In this paper, we propose and study several inverse problems of determining unknown parameters in nonlocal nonlinear coupled PDE systems, including the potentials, nonlinear interaction functions and time-fractional orders. In these coupled systems, we enforce non-negativity of the solutions, aligning with realistic scenarios in biology and ecology. There are several salient features of our inverse problem study: the drastic reduction in measurement/observation data due to averaging effects, the nonlinear coupling between multiple equations, and the nonlocality arising from fractional-type derivatives. These factors present significant challenges to our inverse problem, and such inverse problems have never been explored in previous literature. To address these challenges, we develop new and effective schemes. Our approach involves properly controlling the injection of different source terms to obtain multiple sets of mean flux data. This allows us to achieve unique identifiability results and accurately determine the unknown parameters. Finally, we establish a connection between our study and practical applications in biology, further highlighting the relevance of our work in real-
Systems biology relies on mathematical models that often involve complex and intractable likelihood functions, posing challenges for efficient inference and model selection. Generative models, such as normalizing flows, have shown remarkable ability in approximating complex distributions in various domains. However, their application in systems biology for approximating intractable likelihood functions remains unexplored. Here, we present a framework for leveraging normalizing flows to approximate complex likelihood functions inherent to systems biology models. By using normalizing flows in the simulation-based inference setting, we demonstrate a method that not only approximates a likelihood function but also allows for inference in the model selection setting. We showcase the effectiveness of this approach on real-world systems biology problems, providing practical guidance for implementation and highlighting its advantages over traditional computational methods.
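The idea of a flow-based likelihood surrogate can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the simulator `simulate`, the two candidate settings, and the use of a single affine layer in place of a real normalizing flow, which would stack many learnable invertible layers. The sketch fits the affine map to simulator output, evaluates the change-of-variables log-likelihood on observed data, and compares two candidates by their total surrogate log-likelihood.

```python
import math
import random


def simulate(theta, n, rng):
    """Hypothetical stochastic simulator: n noisy outputs centred on theta."""
    return [theta + rng.gauss(0.0, 0.5) for _ in range(n)]


def fit_affine_flow(samples):
    """Fit the one-layer flow z = (x - shift) / scale by maximum likelihood.

    With a standard normal base density the optimum is available in closed
    form: the sample mean and the (biased) sample standard deviation.
    """
    n = len(samples)
    shift = sum(samples) / n
    scale = math.sqrt(sum((x - shift) ** 2 for x in samples) / n)
    return shift, scale


def flow_log_likelihood(x, shift, scale):
    """Change of variables: log p(x) = log N(z; 0, 1) - log(scale)."""
    z = (x - shift) / scale
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi) - math.log(scale)


def surrogate_score(observed, theta, rng, n_sim=2000):
    """Total surrogate log-likelihood of the observations under one candidate."""
    shift, scale = fit_affine_flow(simulate(theta, n_sim, rng))
    return sum(flow_log_likelihood(x, shift, scale) for x in observed)


rng = random.Random(0)
observed = simulate(0.0, 50, rng)              # stand-in for experimental data
score_a = surrogate_score(observed, 0.0, rng)  # hypothetical candidate A
score_b = surrogate_score(observed, 2.0, rng)  # hypothetical candidate B
print("prefer A" if score_a > score_b else "prefer B")
```

The candidate with the higher total surrogate log-likelihood is preferred. In a realistic setting the affine layer would be replaced by a trained flow conditioned on the model parameters, and the comparison would account for parameter uncertainty rather than fixed settings.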
Dynamical systems modeling, particularly via systems of ordinary differential equations, has been used to effectively capture the temporal behavior of different biochemical components in signal transduction networks. Despite the recent advances in experimental measurements, including sensor development and '-omics' studies that have helped populate protein-protein interaction networks in great detail, modeling in systems biology lacks systematic methods to estimate kinetic parameters and quantify associated uncertainties. This is because of multiple reasons, including sparse and noisy experimental measurements, lack of detailed molecular mechanisms underlying the reactions, and missing biochemical interactions. Additionally, the inherent nonlinearities with respect to the states and parameters associated with the system of differential equations further compound the challenges of parameter estimation. In this study, we propose a comprehensive framework for Bayesian parameter estimation and complete quantification of the effects of uncertainties in the data and models. We apply these methods to a series of signaling models of increasing mathematical complexity. Systematic analysis o
In a recent paper, Wilmes et al. demonstrated a qualitative integration of omics data streams to gain a mechanistic understanding of cyclosporine A toxicity. One of their major conclusions was that cyclosporine A strongly activates the nuclear factor (erythroid-derived 2)-like 2 pathway (Nrf2) in renal proximal tubular epithelial cells exposed in vitro. We pursue here the analysis of those data with a quantitative integration of omics data with a differential equation model of the Nrf2 pathway. That was done in two steps: (i) Modeling the in vitro pharmacokinetics of cyclosporine A (exchange between cells, culture medium and vial walls) with a minimal distribution model. (ii) Modeling the time course of omics markers in response to cyclosporine A exposure at the cell level with a coupled PK-systems biology model. Posterior statistical distributions of the parameter values were obtained by Markov chain Monte Carlo sampling. Data were well simulated, and the known in vitro toxic effect EC50 was well matched by model predictions. The integration of in vitro pharmacokinetics and systems biology modeling gives us a quantitative insight into mechanisms of cyclosporine A oxidative-stress
Understanding the mechanisms of interactions within cells, tissues, and organisms is crucial to driving developments across biology and medicine. Mathematical modeling is an essential tool for simulating biological systems and revealing biochemical regulatory mechanisms. Building on experiments, mechanistic models are widely used to describe small-scale intracellular networks and uncover biochemical mechanisms in healthy and diseased states. The rapid development of high-throughput sequencing techniques and computational tools has recently enabled models that span multiple scales, often integrating signaling, gene regulatory, and metabolic networks. These multiscale models enable comprehensive investigations of cellular networks and thus reveal previously unknown disease mechanisms and pharmacological interventions. Here, we review systems biology models from classical mechanistic models to larger, multiscale models that integrate multiple layers of cellular networks. We introduce several examples of models of hypertrophic cardiomyopathy, exercise, and cancer cell proliferation. Additionally, we discuss methods that increase the certainty and accuracy of model predictions. Integrat
With the completion of human genome mapping, the focus of scientists seeking to explain the biological complexity of living systems is shifting from analyzing the individual components (such as a particular gene or biochemical reaction) to understanding the set of interactions amongst the large number of components that results in the different functions of the organism. To this end, the area of systems biology attempts to achieve a "systems-level" description of biology by focusing on the network of interactions instead of the characteristics of its isolated parts. In this article, we briefly describe some of the emerging themes of research in "network" biology, looking at dynamical processes occurring at two different length scales, within the cell and between cells, viz., the intra-cellular signaling network and the nervous system. We show that focusing on the systems-level aspects of these problems allows one to observe surprising and illuminating common themes amongst them.
A number of models in mathematical epidemiology have been developed to account for control measures such as vaccination or quarantine. However, COVID-19 has brought unprecedented social distancing measures, raising the challenge of how to include them in a way that explains the data while avoiding overfitting in parameter inference. We here develop a simple time-dependent model, where social distancing effects are introduced by analogy with coarse-grained models of gene expression control in systems biology. We apply our approach to understand drastic differences in COVID-19 infection and fatality counts observed between Hubei (Wuhan) and other Mainland China provinces. We find that these unintuitive data may be explained through an interplay of differences in transmissibility, effective protection, and detection efficiencies between Hubei and other provinces. More generally, our results demonstrate that regional differences may drastically shape infection outbursts. The obtained results demonstrate the applicability of our developed method to extract key infection parameters directly from publicly available data so that it can be globally applied to outbreaks of COVID-19 in a number
Understanding the biological mechanisms of disease is crucial for medicine, and in particular, for drug discovery. AI-powered analysis of genome-scale biological data holds great potential in this regard. The increasing availability of single-cell RNA sequencing data has enabled the development of large foundation models for disease biology. However, existing foundation models only modestly improve over task-specific models in downstream applications. Here, we explored two avenues for improving single-cell foundation models. First, we scaled the pre-training data to a diverse collection of 116 million cells, which is larger than those used by previous models. Second, we leveraged the availability of large-scale biological annotations as a form of supervision during pre-training. We trained the \model family of models comprising six transformer-based state-of-the-art single-cell foundation models with 70 million, 160 million, and 400 million parameters. We vetted our models on several downstream evaluation tasks, including identifying the underlying disease state of held-out donors not seen during training, distinguishing between diseased and healthy cells for disease conditions and
Quantitative computational models play an increasingly important role in modern biology. Such models typically involve many free parameters, and assigning their values is often a substantial obstacle to model development. Directly measuring \emph{in vivo} biochemical parameters is difficult, and collectively fitting them to other data often yields large parameter uncertainties. Nevertheless, in earlier work we showed in a growth-factor-signaling model that collective fitting could yield well-constrained predictions, even when it left individual parameters very poorly constrained. We also showed that the model had a `sloppy' spectrum of parameter sensitivities, with eigenvalues roughly evenly distributed over many decades. Here we use a collection of models from the literature to test whether such sloppy spectra are common in systems biology. Strikingly, we find that every model we examine has a sloppy spectrum of sensitivities. We also test several consequences of this sloppiness for building predictive models. In particular, sloppiness suggests that collective fits to even large amounts of ideal time-series data will often leave many parameters poorly constrained. Tests over our m
Identification of dynamics underlying biochemical pathways of interest in oncology is a primary goal in current systems biology. Understanding structures and interactions that govern the evolution of such systems is believed to be a cornerstone in this research. Systems theory and systems identification theory are primary resources for this task since they both provide a self-consistent framework for modelling and manipulating models of dynamical systems that are best suited for the problem under investigation. We address herein the issue of obtaining an informative dataset ZN to be used as a starting point for identification of EGFR pathway dynamics. In order to match experimental identifiability criteria, we propose a theoretical framework for input stimulus design based on dynamical properties of the system under investigation. Following the theoretical studies, a feasible optofluidic design has been developed on the basis of the spectral properties of the driving inputs that maximize information content.
Benchmarking the performance of complex systems such as rail networks, renewable generation assets and national economies is central to transport planning, regulation and macroeconomic analysis. Classical frontier methods, notably Data Envelopment Analysis (DEA) and Stochastic Frontier Analysis (SFA), estimate an efficient frontier in the observed input-output space and define efficiency as distance to this frontier, but rely on restrictive assumptions on the production set and only indirectly address heterogeneity and scale effects. We propose Geometric Manifold Analysis (GeMA), a latent manifold frontier framework implemented via a productivity-manifold variational autoencoder (ProMan-VAE). Instead of specifying a frontier function in the observed space, GeMA represents the production set as the boundary of a low-dimensional manifold embedded in the joint input-output space. A split-head encoder learns latent variables that capture technological structure and operational inefficiency. Efficiency is evaluated with respect to the learned manifold, endogenous peer groups arise as clusters in latent technology space, a quotient construction supports scale-invariant benchmarking, and
Recent tumor genome sequencing confirmed that one tumor often consists of multiple cell subpopulations (clones) which bear different, but related, genetic profiles such as mutation and copy number variation profiles. Thus far, one tumor has been viewed as a whole entity in cancer functional studies. With the advances of genome sequencing and computational analysis, we are able to quantify and computationally dissect clones from tumors, and then conduct clone-based analysis. Emerging technologies such as single-cell genome sequencing and RNA-Seq could profile tumor clones. Thus, we should reconsider how to conduct cancer systems biology studies in the genome sequencing era. We will outline new directions for conducting cancer systems biology by considering that genome sequencing technology can be used for dissecting, quantifying and genetically characterizing clones from tumors. Topics discussed in Part 1 of this review include computational quantification of tumor subpopulations, clone-based network modeling, cancer hallmark-based networks and their high-order rewiring principles, and the principles of cell survival networks of fast-growing clones.
We advocate here the use of (mathematical) logic for systems biology, as a unified framework well suited for modeling the dynamic behaviour of biological systems, expressing properties of them, and verifying these properties. The potential candidate logics should have a traditional proof theoretic pedigree (including a sequent calculus presentation enjoying cut-elimination and focusing), and should come with (certified) proof tools. Beyond providing a reliable framework, this allows adequate encodings of our biological systems. We present two candidate logics (two modal extensions of linear logic, called HyLL and SELL), along with biological examples. The examples we have considered so far are very simple ones, coming with completely formal (interactive) proofs in Coq. Future work includes using automatic provers, which would extend existing automatic provers for linear logic. This should enable us to specify and study more realistic examples in systems biology, biomedicine (diagnosis and prognosis), and eventually neuroscience.
Although reproducibility is a core tenet of the scientific method, it remains challenging to reproduce many results. Surprisingly, this also holds true for computational results in domains such as systems biology where there have been extensive standardization efforts. For example, Tiwari et al. recently found that they could only repeat 50% of published simulation results in systems biology. Toward improving the reproducibility of computational systems research, we identified several resources that investigators can leverage to make their research more accessible, executable, and comprehensible by others. In particular, we identified several domain standards and curation services, as well as powerful approaches pioneered by the software engineering industry that we believe many investigators could adopt. Together, we believe these approaches could substantially enhance the reproducibility of systems biology research. In turn, we believe enhanced reproducibility would accelerate the development of more sophisticated models that could inform precision medicine and synthetic biology.
Systems Biology has emerged in recent years as a new holistic approach based on the global understanding of cells instead of only being focused on their individual parts (genes or proteins), to better understand the complexity of human cells. Since Systems Biology still does not provide the most accurate answers to our questions, due to the complexity of cells and the limited quality of available information to perform a good gene/protein map analysis, we have created simpler models to ensure easier analysis of the map that represents the human cell. Therefore, a virtual organism has been designed according to the main physiological rules for humans in order to replicate the human organism and its vital functions. This toy model was constructed by defining the topology of its genes/proteins and the biological functions associated with them. There are several examples of these toy models that emulate natural processes to perform analysis of the virtual life in order to design the best strategy to understand real life. The strategy applied in this study combines topological and functional analysis integrating the knowledge about the relative position of a node among the others in th
A tumor often consists of multiple cell subpopulations (clones). Current chemo-treatments often target one clone of a tumor. Although the drug kills that clone, other clones take over and the tumor recurs. Genome sequencing and computational analysis allow computational dissection of clones from tumors, while single-cell genome sequencing, including RNA-Seq, allows profiling of these clones. This opens a new window for treating a tumor as a system in which clones are evolving. Future cancer systems biology studies should consider a tumor as an evolving system with multiple clones. Therefore, topics discussed in Part 2 of this review include evolutionary dynamics of clonal networks, early-warning signals for formation of fast-growing clones, dissecting tumor heterogeneity, and modeling of clone-clone-stroma interactions for drug resistance. The ultimate goal of future systems biology analysis is to obtain a whole-system understanding of a tumor and thereby provide more efficient and personalized management strategies for cancer patients.
Recently, microRNAs (miRNAs) have emerged as central posttranscriptional regulators of gene expression. miRNAs regulate many key biological processes, including cell growth, death, development and differentiation. This discovery is challenging the central dogma of molecular biology. Genes work together by forming cellular networks. It has become an emerging concept that miRNAs could intertwine with cellular networks to exert their function. Thus, it is essential to understand how miRNAs take part in cellular processes at a systems level. In this review, I will first introduce basic knowledge of miRNAs and their relations to heart diseases and cancer, and highlight recently discovered functions such as filtering out gene expression noise by miRNAs. I will also introduce basic concepts of cellular networks and interpret their biological meaning in such a way that the network concepts are digested in a biological context and are understandable for biologists. Finally, I will summarize the most recent progress in understanding of miRNA biology at a systems level: the principles of miRNA regulation of the major cellular networks including signaling, metabolic, protein interaction and
Composition is a powerful principle for systems biology, focused on the interfaces, interconnections, and orchestration of distributed processes to enable integrative multiscale simulations. Whereas traditional models focus on the structure or dynamics of specific subsystems in controlled conditions, compositional systems biology aims to connect these models, asking critical questions about the space between models: What variables should a submodel expose through its interface? How do coupled models connect and translate across scales? How do domain-specific models connect across biological and physical disciplines to drive the synthesis of new knowledge? This approach requires robust software to integrate diverse datasets and submodels, providing researchers with tools to flexibly recombine, iteratively refine, and collaboratively expand their models. This article offers a comprehensive framework to support this vision, including: a conceptual and graphical framework to define interfaces and composition patterns; standardized schemas that facilitate modular data and model assembly; biological templates that integrate detailed submodels that connect molecular processes to the emerg
Research on fault diagnosis in highly nonlinear dynamic systems such as vehicle engines has garnered huge interest in recent years, especially with the automotive industry heading towards self-driving technologies. This article presents a novel open-source simulation testbed of a turbocharged spark ignited (TCSI) petrol engine system for testing and evaluation of residual generation and fault diagnosis methods. The testbed is designed and developed using Matlab/Simulink, and the user interacts with it through a GUI, where the engine can be realistically simulated using industry-standard driving cycles such as the Worldwide harmonized Light vehicles Test Procedures (WLTP), the New European Driving Cycle (NEDC), the Extra-Urban Driving Cycle (EUDC), and the EPA Federal Test Procedure (FTP-75). The engine is modeled using the mean value engine model (MVEM) and is controlled using a proportional-integral (PI)-based boost controller. The GUI also allows the user to induce one of the 11 faults of interest, so that their effects on the performance of the engine are better understood. This minimizes the risk of causing permanent damage to the engine and shortening its lifes
Biological systems are generally regarded as complicated, complex, or both. In the approach that treats them as complicated, one sets up a model with a large number of parameters to describe the system in detail. The approach that treats them as complex focuses on understanding the universal aspects of biological systems. In this case, an appropriate simple model represents a universality class. The extraction of universal properties is supported by evolutionary robustness and the reduction of dimensionality in high-dimensional states. Integrating the data-driven omics approach with the universality approach is an important step in systems biology.