Achieving complete reproducibility in science, particularly in research fields such as biodiversity, is challenging due to analytical choices, bias and interpretation. Here, we examine examples of reproducibility in biological systematics, ecology, and molecular biology. Artificial Intelligence (AI) offers potential tools to mitigate the impact of interpretation and analytical choices. In the present work, while emphasizing the need for methodological rigor and transparency, we acknowledge the role of interpretation in activities such as coding biological characters and detecting morphological patterns in nature. We explore the opportunities and limitations associated with the synergy between big data and AI in molecular biology, emphasizing the need for a more comprehensive and integrated approach based on dataset quality and usefulness. We conclude by advocating for AI-based tools to assist biologists, reinforcing consilience as a criterion for scientific validity without hindering scientific progress.
We explore how physical scale and population size shape the emergence of complex behaviors in open-ended ecological environments. In our setting, agents are unsupervised and have no explicit rewards or learning objectives but instead evolve over time according to reproduction, mutation, and selection. As they act, agents also shape their environment and the population around them in an ongoing dynamic ecology. Our goal is not to optimize a single high-performance policy, but instead to examine how behaviors emerge and evolve across large populations due to natural competition and environmental pressures. We use modern hardware along with a new multi-agent simulator to scale the environment and population to sizes much larger than previously attempted, reaching populations of over 60,000 agents, each with their own evolved neural network policy. We identify various emergent behaviors such as long-range resource extraction, vision-based foraging, and predation that arise under competitive and survival pressures. We examine how sensing modalities and environmental scale affect the emergence of these behaviors and find that some of them appear only in sufficiently large environments an
Epistasis occurs when the effect of a mutation depends on its carrier's genetic background. Despite increasing evidence that epistasis for fitness is common, its role during evolution is contentious. Fitness landscapes, mappings of genotype or phenotype to fitness, capture the full extent and complexity of epistasis. Fitness landscape theory has shown how epistasis affects the course and the outcome of evolution. Moreover, by measuring the competitive fitness of sets of tens to thousands of connected genotypes, empirical fitness landscapes have shown that epistasis is frequent and depends on the fitness measure, the choice of mutations for the landscape, and the environment in which it was measured. Here, I review fitness landscape theory and experiments and their implications for the role of epistasis in adaptation. I discuss theoretical expectations in the light of empirical fitness landscapes and highlight open challenges and future directions towards integrating theory and data, and incorporating ecological factors.
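As a minimal illustration of the definition above, the following sketch computes pairwise epistasis on a two-locus, two-allele landscape. The fitness values, function names, and the choice of an additive scale are assumptions made for illustration and do not come from the reviewed studies; a multiplicative (log-fitness) scale is an equally common convention.

```python
def pairwise_epistasis(f_ab, f_Ab, f_aB, f_AB):
    """Deviation from additivity for two mutations a->A and b->B.

    Zero means the effects of the two mutations add up independently;
    a non-zero value means the effect of one depends on the other.
    """
    effect_A = f_Ab - f_ab  # effect of A on the wild-type background
    effect_B = f_aB - f_ab  # effect of B on the wild-type background
    return (f_AB - f_ab) - (effect_A + effect_B)


def is_sign_epistasis(f_ab, f_Ab, f_aB, f_AB):
    """True if a mutation is beneficial on one background and deleterious on the other."""
    a_flips = (f_Ab - f_ab) * (f_AB - f_aB) < 0
    b_flips = (f_aB - f_ab) * (f_AB - f_Ab) < 0
    return a_flips or b_flips


if __name__ == "__main__":
    # Hypothetical fitness values for genotypes ab, Ab, aB, AB.
    landscape = (1.0, 1.2, 1.1, 1.5)
    print("epistasis:", round(pairwise_epistasis(*landscape), 6))
    print("sign epistasis:", is_sign_epistasis(*landscape))
```

Sign epistasis is singled out here because it can make some mutational paths between genotypes inaccessible to selection, which is one way epistasis shapes the course of adaptation.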
Biological evolution is realised through the same mechanisms of birth and death that underlie change in population density. The deep interdependence between ecology and evolution is well-established, and recent models focus on integrating eco-evolutionary dynamics to demonstrate how ecological and evolutionary processes interact and feed back upon each other. Nevertheless, a gap remains between the logical foundations of ecology and evolution. Population ecology and evolution have fundamental equations that define how the size of a population (ecology) and the average characteristic within a population (evolution) change over time. These fundamental equations are a complete and exact description of change for any closed population, but how they are formally linked remains unclear. We link the fundamental equations of population ecology and evolution with an equation that sums how individual characteristics interact with individual fitness in a population. From this equation, we derive the fundamental equations of population ecology and evolutionary biology (the Price equation). We thereby identify an overlooked bridge between ecology and biological evolution. Our unification formal
Although extensive behavioral differences often exist between closely related animal species, our understanding of the genetic basis underlying the evolution of behavior has remained limited. Here, we propose a new framework to study behavioral evolution by computational estimation of ancestral behavioral repertoires. We measured the behaviors of individuals from six species of fruit flies using unsupervised techniques and identified suites of stereotyped movements exhibited by each species. We then fit a Generalized Linear Mixed Model to estimate the suites of behaviors exhibited by ancestral species, as well as the intra- and inter-species behavioral covariances. We found that much of intraspecific behavioral variation is explained by differences between individuals in the status of their behavioral hidden states, what might be called their "mood." Lastly, we propose a method to identify groups of behaviors that appear to have evolved together, illustrating how sets of behaviors, rather than individual behaviors, likely evolved. Our approach provides a new framework for identifying co-evolving behaviors and may provide new opportunities to study the genetic basis of behavioral evolut
Emergent behavior arising in a joint human-robot system cannot be fully predicted based on an understanding of the individual agents. Typically, robot behavior is governed by algorithms that optimize a reward function that should quantitatively capture the joint system's goal. Although reward functions can be updated to better match human needs, this does not guarantee that misalignment with complex and variable human needs will be avoided. Algorithms may learn undesirable behavior when interacting with the human and the intrinsically unpredictable human-inhabited world, thereby producing further misalignment with human users or bystanders. As a result, humans might behave differently than anticipated, causing robots to learn differently and undesirable behavior to emerge. In this short paper, we argue that designing Human-Robot Interaction that mitigates such undesirable emergent behavior requires complementing advancements in human-robot interaction algorithms with human factors knowledge and expertise. More specifically, we advocate a three-pronged approach that we illustrate using a particularly challenging example of safety-critical human-robot interaction: a driver inter
The surging demand for new energy vehicles is driven by the imperative to conserve energy, reduce emissions, and improve the ecological environment. Behavioral analysis and mining of new energy vehicle usage data can reveal characteristic usage patterns. For instance, overloading the battery, operating with low battery power, and driving at excessive speeds can all detrimentally affect the battery's performance. To assess the impact of such driving behavior on urban ecology, an environmental computational modeling method is proposed to simulate the interaction between new energy vehicles and the environment. To extend the time-series data covering the vehicle's entire life cycle and the ecological environment, an LSTM model with a Bayesian optimizer is used for simulation. The analysis reveals the detrimental effects of poor driving behavior on the environment.
A birth-death process is a continuous-time Markov chain that counts the number of particles in a system over time. In the general process with $n$ current particles, a new particle is born with instantaneous rate $\lambda_n$ and a particle dies with instantaneous rate $\mu_n$. Currently no robust and efficient method exists to evaluate the finite-time transition probabilities in a general birth-death process with arbitrary birth and death rates. In this paper, we first revisit the theory of continued fractions to obtain expressions for the Laplace transforms of these transition probabilities and make explicit an important derivation connecting transition probabilities and continued fractions. We then develop an efficient algorithm for computing these probabilities that analyzes the error associated with approximations in the method. We demonstrate that this error-controlled method agrees with known solutions and outperforms previous approaches to computing these probabilities. Finally, we apply our novel method to several important problems in ecology, evolution, and genetics.
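To make the process defined above concrete, the sketch below simulates a general birth-death process with state-dependent rates and estimates a finite-time transition probability by Monte Carlo. This is not the continued-fraction algorithm described in the abstract; it is a plain stochastic simulation (Gillespie-style) that can serve as a slow baseline. All function names and the example rates are assumptions made for illustration.

```python
import random


def simulate_birth_death(n0, birth_rate, death_rate, t_end, rng):
    """Return the particle count at time t_end for one simulated trajectory.

    birth_rate(n) and death_rate(n) play the roles of lambda_n and mu_n.
    """
    n, t = n0, 0.0
    while True:
        lam = birth_rate(n)
        mu = death_rate(n) if n > 0 else 0.0  # no deaths from an empty system
        total = lam + mu
        if total <= 0.0:
            return n  # absorbing state: nothing can happen any more
        t += rng.expovariate(total)  # waiting time to the next event
        if t > t_end:
            return n
        # Choose a birth with probability lam / total, otherwise a death.
        if rng.random() * total < lam:
            n += 1
        else:
            n -= 1


def estimate_transition_probability(n0, n_target, birth_rate, death_rate,
                                    t_end, n_runs=20000, seed=0):
    """Monte Carlo estimate of P(X(t_end) = n_target | X(0) = n0)."""
    rng = random.Random(seed)
    hits = sum(
        simulate_birth_death(n0, birth_rate, death_rate, t_end, rng) == n_target
        for _ in range(n_runs)
    )
    return hits / n_runs


if __name__ == "__main__":
    # Hypothetical linear rates: lambda_n = 0.5 * n, mu_n = 0.3 * n.
    p = estimate_transition_probability(
        10, 12, lambda n: 0.5 * n, lambda n: 0.3 * n, t_end=1.0
    )
    print("estimated P(10 -> 12 at t = 1):", p)
```

The standard error of such an estimate shrinks only as the inverse square root of the number of runs, which is one reason an error-controlled numerical method is attractive for small probabilities.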
We present a pedagogical review of the weak gravitational lensing measurement process and its connection to major scientific questions such as dark matter and dark energy. Then we describe common ways of parametrizing systematic errors and understanding how they affect weak lensing measurements. Finally, we discuss several instrumental systematics and how they fit into this context, and conclude with a perspective on how future progress can be made in understanding the impact of instrumental systematics on weak lensing measurements.
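The abstract does not say which parametrizations the review covers. One widely used convention in weak lensing, assumed here purely for illustration, models the observed shear as a linear function of the true shear, g_obs = (1 + m) g_true + c, with a multiplicative bias m and an additive bias c. The sketch below applies that model to made-up shear values and recovers m and c by ordinary least squares; the function names are hypothetical.

```python
def apply_shear_bias(g_true, m, c):
    """Apply the linear bias model g_obs = (1 + m) * g_true + c to each shear value."""
    return [(1.0 + m) * g + c for g in g_true]


def fit_shear_bias(g_true, g_obs):
    """Recover (m, c) from paired true and observed shears by least squares."""
    n = len(g_true)
    mean_x = sum(g_true) / n
    mean_y = sum(g_obs) / n
    sxx = sum((x - mean_x) ** 2 for x in g_true)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(g_true, g_obs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope - 1.0, intercept  # m is the slope's departure from unity


if __name__ == "__main__":
    # Made-up input shears and bias values, noise-free for clarity.
    g_true = [-0.05, -0.02, 0.0, 0.01, 0.03, 0.06]
    g_obs = apply_shear_bias(g_true, m=0.02, c=0.001)
    m_hat, c_hat = fit_shear_bias(g_true, g_obs)
    print("m:", round(m_hat, 6), "c:", round(c_hat, 6))
```

In practice the true shear is known only in image simulations, so a fit of this kind is typically run on simulated data rather than on survey measurements.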
New technologies for acquiring biological information, such as eDNA and acoustic or optical sensors, make it possible to generate spatial community observations at unprecedented scales. The potential of these novel community data to standardize community observations at high spatial, temporal, and taxonomic resolution and at large spatial scale ('many rows and many columns') has been widely discussed, but so far, there has been little integration of these data with ecological models and theory. Here, we review these developments and highlight emerging solutions, focusing on statistical methods for analyzing novel community data, in particular joint species distribution models; the new ecological questions that can be answered with these data; and the potential implications of these developments for policy and conservation.
Network ecologists investigate the structure, function, and evolution of ecological systems using network models and analyses. For example, network techniques have been used to study community interactions (e.g., food webs, mutualisms), gene flow across landscapes, and the sociality of individuals in populations. The work presented here uses a bibliographic and network approach to (1) document the rise of Network Ecology, (2) identify the diversity of topics addressed in the field, and (3) map the structure of scientific collaboration among contributing scientists. Our aim is to provide a broad overview of this emergent field that highlights its diversity and to provide a foundation for future advances. To do this, we searched the ISI Web of Science database for ecology publications between 1900 and 2012 using the search terms for research areas of Environmental Sciences & Ecology and Evolutionary Biology and the topic tag ecology. From these records we identified the Network Ecology publications using the topic terms network, graph theory, and web while controlling for the usage of misleading phrases. The resulting corpus comprised 29,513 publications between 1936 and 2012. We
Datasets encountered when examining deeper issues in ecology and evolution are often complex. This calls for careful strategies for model building, model selection, and model averaging. Our paper aims at motivating, exhibiting, and further developing focused model selection criteria. In contexts involving precisely formulated interest parameters, these versions of FIC, the focused information criterion, typically lead to better final precision for the most salient estimates, confidence intervals, etc. as compared to estimators obtained from other selection methods. Our methods are illustrated with real case studies in ecology; one related to bird species abundance and another to the decline in body condition for the Antarctic minke whale.
We present optimal measurements of the angular power spectrum of the XDQSOz catalogue of photometric quasars from the Sloan Digital Sky Survey. These measurements rely on a quadratic maximum likelihood estimator that simultaneously measures the auto- and cross-power spectra of four redshift samples, and provides minimum-variance, unbiased estimates even at the largest angular scales. Since photometric quasars are known to be strongly affected by systematics such as spatially-varying depth and stellar contamination, we introduce a new framework of extended mode projection to robustly mitigate the impact of systematics on the power spectrum measurements. This technique involves constructing template maps of potential systematics, decorrelating them on the sky, and projecting out modes which are significantly correlated with the data. Our method is able to simultaneously process several thousand nonlinearly-correlated systematics, and mode projection is performed in a blind fashion. Using our final power spectrum measurements, we find good agreement with theoretical predictions, and no evidence for further contamination by systematics. Extended mode projection not only obviates
Large image collections generated from camera traps offer valuable insights into species richness, occupancy, and activity patterns, significantly aiding biodiversity monitoring. However, the manual processing of these datasets is time-consuming, hindering analytical processes. To address this, deep neural networks have been widely adopted to automate image labelling, but the impact of classification error on key ecological metrics remains unclear. Here, we analyse data from camera trap collections in an African savannah (82,300 labelled images, 47 species) and an Asian sub-tropical dry forest (40,308 labelled images, 29 species) to compare ecological metrics derived from expert-generated species identifications with those generated by deep learning classification models. We specifically assess the impact of deep learning model architecture, proportion of label noise in the training data, and the size of the training dataset on three key ecological metrics: species richness, occupancy, and activity patterns. We found that predictions of species richness derived from deep neural networks closely matched those calculated from expert labels and remained resilient to up to 10% noise in t
As Evolutionary Dynamics moves from the realm of theory into application, algorithms are needed to move beyond simple models. Yet few such methods exist in the literature. Ecological and physiological factors are known to be central to evolution in realistic contexts, but accounting for them generally renders problems intractable to existing methods. We introduce a formulation of evolutionary games which accounts for ecology and physiology by modeling both as computations and use this to analyze the problem of directed evolution via methods from Reinforcement Learning. This combination enables us to develop first-of-their-kind results on the algorithmic problem of learning to control an evolving population of cells. We prove a complexity bound on eco-evolutionary control in situations with limited prior knowledge of cellular physiology or ecology, give the first results on the most general version of the mathematical problem of directed evolution, and establish a new link between AI and biology.
Many living and non-living complex systems can be modeled and understood as collective systems made of heterogeneous components that self-organize and generate nontrivial morphological structures and behaviors. This chapter presents a brief overview of our recent effort that investigated various aspects of such morphogenetic collective systems. We first propose a theoretical classification scheme that distinguishes four complexity levels of morphogenetic collective systems based on the nature of their components and interactions. We conducted a series of computational experiments using a self-propelled particle swarm model to investigate the effects of (1) heterogeneity of components, (2) differentiation/re-differentiation of components, and (3) local information sharing among components, on the self-organization of a collective system. Results showed that (a) heterogeneity of components had a strong impact on the system's structure and behavior, (b) dynamic differentiation/re-differentiation of components and local information sharing helped the system maintain spatially adjacent, coherent organization, (c) dynamic differentiation/re-differentiation contributed to the development
Mitochondrial and nuclear genomes must be co-adapted to ensure proper cellular respiration and energy production. Mito-nuclear incompatibility reduces individual fitness and induces hybrid infertility, suggesting a possible role in reproductive barriers and speciation. Here we develop a birth-death model for evolution in spatially extended populations under selection for mito-nuclear co-adaptation. Mating is constrained by physical and genetic proximity, and offspring inherit nuclear genomes from both parents, with recombination. The model predicts macroscopic patterns including a community's long-term species diversity, its species abundance distribution, speciation and extinction rates, as well as intra- and inter-specific genetic variation. We explore how these long-term outcomes depend upon the microscopic parameters of reproduction: individual fitness governed by mito-nuclear compatibility, constraints on mating compatibility, and ecological carrying capacity. We find that strong selection for mito-nuclear compatibility reduces the equilibrium number of species after a radiation and increases the species' abundances, while simultaneously increasing both speciation and extinction
Understanding crowd behavior in video is challenging for computer vision. There have been a growing number of attempts to model crowded scenes by introducing ever larger property ontologies (attributes) and annotating ever larger training datasets. However, in contrast to still images, manually annotating video attributes needs to consider spatiotemporal evolution, which is inherently much harder and more costly. Critically, the most interesting crowd behaviors captured in surveillance videos (e.g., street fighting, flash mobs) are either rare, and thus have few examples for model training, or previously unseen. Existing crowd analysis techniques are not readily scalable to recognize novel (unseen) crowd behaviors. To address this problem, we investigate and develop methods for recognizing visual crowd behavioral attributes without any training samples, i.e., zero-shot learning for crowd behavior recognition. To that end, we relax the common assumption that each individual crowd video instance is only associated with a single crowd attribute. Instead, our model learns to jointly recognize multiple crowd behavioral attributes in each video instance by exploring multiattribute cooccurrence as conte
Ecological networks such as plant-pollinator systems and food webs vary in space and time. This variability includes fluctuations in global network properties such as total number and intensity of interactions but also in the local properties of individual nodes such as the number and intensity of species-level interactions. Fluctuations of species properties can significantly affect higher-order network features, e.g. robustness and nestedness. Local fluctuations should therefore be controlled for in applications that rely on null models, especially pattern and perturbation detection. By contrast, most randomization methods for null models used by ecologists treat node-level local properties as hard constraints that cannot fluctuate. Here, we synthesise a set of methods that resolves the limit of hard constraints and is based on statistical mechanics. We illustrate the methods with practical examples and make open-source computer code available. We clarify how this approach can be used by experimental ecologists to detect non-random network patterns with null models that not only rewire but also redistribute interaction strengths by allowing fluctuations in the null model cons
We test whether artificial intelligence architectural evolution obeys the same statistical laws as biological evolution. Compiling 935 ablation experiments from 161 publications, we show that the distribution of fitness effects (DFE) of architectural modifications follows a heavy-tailed Student's t-distribution with proportions (68% deleterious, 19% neutral, 13% beneficial for major ablations, n=568) that place AI between compact viral genomes and simple eukaryotes. The DFE shape matches D. melanogaster (normalized KS=0.07) and S. cerevisiae (KS=0.09); the elevated beneficial fraction (13% vs. 1-6% in biology) quantifies the advantage of directed over blind search while preserving the distributional form. Architectural origination follows logistic dynamics (R^2=0.994) with punctuated equilibria and adaptive radiation into domain niches. Fourteen architectural traits were independently invented 3-5 times, paralleling biological convergences. These results demonstrate that the statistical structure of evolution is substrate-independent, determined by fitness landscape topology rather than the mechanism of selection.
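The distribution comparisons above rest on a Kolmogorov-Smirnov (KS) distance. The abstract does not define its "normalized KS", so the sketch below shows only the plain two-sample KS statistic, the largest vertical gap between two empirical cumulative distribution functions. The sample values and the function name are made up for illustration and are not data from the study.

```python
def ks_two_sample(sample_a, sample_b):
    """Two-sample KS statistic: max |ECDF_a(x) - ECDF_b(x)| over all x."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    i = j = 0
    d_max = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step both ECDFs past every observation equal to x (handles ties).
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d_max = max(d_max, abs(i / len(a) - j / len(b)))
    return d_max


if __name__ == "__main__":
    # Hypothetical fitness effects of modifications in two systems.
    effects_system_1 = [-0.9, -0.4, -0.3, -0.1, 0.0, 0.05, 0.2]
    effects_system_2 = [-1.2, -0.5, -0.2, -0.1, 0.0, 0.0, 0.1, 0.3]
    print("KS distance:", ks_two_sample(effects_system_1, effects_system_2))
```

A value of 0 means the two empirical distributions coincide, and a value of 1 means the samples do not overlap at all.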