20 results found
For this Special Collection we invited experts in the area of mathematical and computational biology to share their views on the major problems in their areas of interest and their recent research results - focusing on the development of state-of-the-art modeling approaches and computational techniques applied to problems in the life sciences - and to present their vision of the new directions needed for addressing unsolved problems. Papers in this Special Collection address mathematical and computational problems in several areas of the life sciences, including theoretical neuroscience, cancer modeling, and cell and developmental systems. With respect to methodologies, these papers cover dynamical systems, differential equations, stochastic processes, and modern computational techniques, all with an emphasis on techniques in modern modeling and computational methodologies. This Special Collection is jointly hosted by the Bulletin of Mathematical Biology and the Journal of Mathematical Biology.
We are living in the era of large-scale data in the biological sciences. The completion of the Human Genome Project (HGP) marked a pivotal moment, not only for its monumental achievement but also for introducing the concept of Big Science to the life sciences. This project catalysed a technological revolution, shifting biology from a primarily descriptive and hypothesis-driven discipline to a data-intensive, interdisciplinary field capable of systematic and quantitative exploration of biological systems at an unprecedented scale. This new paradigm is embodied by omics approaches, defined as experimental methods based on technologies that enable the systematic, qualitative, quantitative, and unbiased characterization of all molecular components of a given type within a biological system, or the interactions between them. From genomics and transcriptomics to proteomics and metabolomics, these technologies have moved us from studying individual molecules to capturing global views of biological processes. This has been fundamental for the emergence of Systems Biology, a unifying approach that seeks to understand complex biological systems by identifying their components and interactions and predicting systems behaviour through mathematical and computational models. In this chapter we provide an overview of the rise of the "-omes" and their corresponding omics approaches, their conceptual foundations, and the implications they carry for research in the life sciences, including controversies, challenges, and opportunities. We conclude by discussing how the convergence of omics and artificial intelligence is reshaping the epistemological foundations of biological research, bridging data accumulation and inference of molecular mechanisms.
Natural products (NPs) have historically provided the foundational scaffolds for drug development, yet traditional bioprospecting faces critical limitations: high rediscovery rates, laborious isolation workflows, and substantial attrition during clinical translation. The emergence of big data technologies is fundamentally transforming this landscape, enabling a shift from serendipity-based discovery toward systematic, data-driven approaches. This review examines how the integration of artificial intelligence (AI), machine learning (ML), and multi-omics datasets is accelerating natural product research across three key domains: (1) genome mining for biosynthetic gene cluster identification using platforms such as antiSMASH, (2) cheminformatics-driven prediction of structure-activity relationships and ADMET properties, and (3) metabolomics-guided dereplication to prioritize novel bioactive scaffolds. We evaluate the convergence of genomics, metabolomics, and computational chemistry in enabling in silico lead optimization and the discovery of cryptic metabolites from previously inaccessible microbial taxa. Despite persistent challenges in data standardization, scalability, and equitable benefit-sharing, the synergy between big data and NP research is accelerating clinical translation and is poised to redefine drug development. These advances position computational NP research as a cornerstone of next-generation drug development.
The same dataset can be analysed in different justifiable ways to answer the same research question, potentially challenging the robustness of empirical science [1-3]. In this crowd initiative, we investigated the degree to which research findings in the social and behavioural sciences are contingent on analysts' choices. We examined a stratified random sample of 100 studies published between 2009 and 2018, in which, for one claim per study, at least five reanalysts independently reanalysed the original data. The statistical appropriateness of the reanalyses was assessed in peer evaluations, and the robustness indicators were inspected along a range of research characteristics and study designs. We found that 34% of the independent reanalyses yielded the same result (within a tolerance region of ±0.05 Cohen's d) as the original report; with a four times broader tolerance region, this indicator increased to 57%. Of the reanalyses conducted, 74% reached the same conclusion as the original investigation, 24% yielded no effects or inconclusive results and 2% reported the opposite effect. This exploratory study indicates that the common single-path analyses in social and behavioural research should not be simply assumed to be robust to alternative analyses [4]. Therefore, we recommend the development and use of practices to explore and communicate this neglected source of uncertainty.
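The tolerance-region check described in this abstract can be illustrated with a short sketch. It assumes a pooled-standard-deviation Cohen's d and reads "four times broader" as ±0.20; the function names and the example effect sizes are hypothetical and are not taken from the study.

```python
import math


def cohens_d(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """Standardised mean difference using the pooled standard deviation."""
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)


def within_tolerance(d_original, d_reanalysis, tol=0.05):
    """True if the reanalysis effect lies inside original +/- tol (in units of d)."""
    return abs(d_reanalysis - d_original) <= tol


def share_within(d_original, reanalysis_ds, tol=0.05):
    """Fraction of reanalyses that reproduce the original effect within tol."""
    hits = sum(within_tolerance(d_original, d, tol) for d in reanalysis_ds)
    return hits / len(reanalysis_ds)


# Hypothetical effect sizes for one claim and five reanalyses (not study data).
original_d = 0.30
reanalyses = [0.28, 0.36, 0.41, 0.31, 0.05]

narrow = share_within(original_d, reanalyses, tol=0.05)
broad = share_within(original_d, reanalyses, tol=0.20)  # assumed "four times broader"
print(narrow, broad)
```

Widening the tolerance can only keep or raise the share of matching reanalyses, which is the direction of the change the abstract reports.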
No abstract available.
Prognostic inequity has been identified as a barrier to accessing end-of-life care for underrepresented groups. Artificial intelligence-based clinical prediction models (AIPMs) for prognostication of mortality have the potential to offer rapid, accessible, and accurate predictions that could streamline care. However, they may also exacerbate preexisting inequities in the health care system rather than address accessibility and quality. This can be caused by erroneous outputs from biased training data, outcomes from out-of-scope operationalization, and inexplicability due to opacity. The goal of this study is to synthesize peer-reviewed literature on the creation and application of AIPMs to prognosticate mortality in acute care settings for adult patients, offering new insights into responsible and ethical model development. A transdisciplinary, structured search strategy was developed in consultation with librarians from both health sciences and engineering sciences. The academic databases queried were Medline, Embase, IEEE Xplore, ACM Digital Library, Compendex, and Scopus. The search was conducted in spring 2025, and the results were uploaded to Covidence. A team of reviewers will screen in 2 rounds: titles and abstracts, then full texts. Eligibility will be determined by publication in academic journals or as full-length conference proceedings, language, model output, and AI use. Data will be charted using adapted charting tools and then analyzed by descriptive, summary, and qualitative synthesis. The search was completed on March 25, 2025, with screening starting in May 2025. Results are anticipated for January 2026. This review will provide a comprehensive summary of AIPMs that predict mortality, highlighting the specific elements included in their development. 
Informed by the responsible research and innovation (RRI) framework, we will consider interest-holder engagement, interdisciplinary collaboration, and computational and clinical ethics in the context of the four RRI dimensions: anticipation, reflexivity, inclusion, and responsiveness.
The advent of affordable whole-genome sequencing has spurred numerous large-scale projects aimed at inferring the tree of life, yet achieving a complete species-level phylogeny remains a distant goal due to significant costs and computational demands. Traditional species tree inference methods, though effective, are hampered by the need for high-coverage sequencing, high-quality genomic alignments, and extensive computational resources. To address these challenges, this study introduces WASTER, a novel de novo tool for inferring shallow phylogenies directly from short-read sequences. WASTER employs a k-mer based approach for identifying variable sites, circumventing the need for genome assembly and alignment. Using simulations, we demonstrate that WASTER achieves accuracy comparable to that of traditional alignment-based methods, even at low sequencing depth, and has substantially higher accuracy than other alignment-free methods. We validate WASTER's efficacy on real data, where it accurately reconstructs phylogenies of eukaryotic species at depths as low as 1.5X. WASTER provides a fast and efficient solution for phylogeny estimation in cases where genome assembly and/or alignment may bias analyses or is challenging, for example due to low sequencing depth. It also provides a method for generating guide trees for tree-based alignment algorithms. WASTER's ability to accurately estimate shallow phylogenies from low-coverage sequencing data without relying on assembly and alignment will lead to substantially reduced sequencing and computational costs in phylogenomic projects.
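To make the general idea of finding variable sites from k-mers without an alignment more concrete, here is a toy sketch. It is not WASTER's algorithm, which the abstract does not specify. It assumes a simple heuristic in which k-mers that share both flanks but differ at the middle base mark a candidate variable site; the function names and reads are hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def candidate_variable_sites(reads_by_sample, k=5):
    """Group k-mers by their flanks (everything except the middle base).

    A flank pair observed with more than one middle base across the reads
    is reported as a candidate variable site. Toy heuristic only: it ignores
    sequencing errors, reverse complements, and coverage.
    """
    mid = k // 2
    middles = defaultdict(set)
    for reads in reads_by_sample.values():
        for read in reads:
            for kmer in kmers(read, k):
                flank = (kmer[:mid], kmer[mid + 1:])
                middles[flank].add(kmer[mid])
    return {flank: sorted(bases) for flank, bases in middles.items() if len(bases) > 1}


# Hypothetical short reads from two samples differing at one position.
reads = {"sample_A": ["ACGTACC"], "sample_B": ["ACGAACC"]}
print(candidate_variable_sites(reads, k=5))
```

A real tool would also need to handle sequencing errors and both strands, which this sketch deliberately leaves out.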
Chiral cyclophanes represent a unique supramolecular architecture that integrates chiral cavities with luminescent properties, enabled by their precisely controlled interchromophoric orientation, distance, and asymmetry. These features confer exceptional chiral recognition capabilities and distinctive chiroptical behaviors. Over the past decade, advances in synthetic strategies, alongside improved computational and analytical tools, have greatly accelerated the development of diverse chiral cyclophane structures and deepened mechanistic understanding. Currently, the field is demonstrating expanding functionalities, showing significant promise in areas such as circularly polarized luminescence, asymmetric catalysis, supramolecular assembly, and biomedicine. This review systematically summarizes recent progress in chiral cyclophanes, offering a timely overview of the historical context, current achievements, and future directions. Based on the location of chiral elements and overall molecular geometry, the discussed cyclophanes are categorized into four subtypes: those containing chiral chromophores, those with chiral linkers, systems exhibiting planar chirality, and chiral cyclophanes with stacked multilayer chromophores. By organizing the literature according to these structural classes, this review aims to offer clear guidance for the rational design and functional exploration of chiral cyclophanes, thereby fostering interdisciplinary innovation across chemistry, materials science, life sciences and other related fields.
In many countries, lifespan has been increasing faster than healthspan, leading to more years spent with late-life disease and highlighting the need for reliable biomarkers to measure biological aging. We used data from the Berlin Aging Study II (BASE-II, 60-80 years of age at baseline, average follow-up 7.4 ± 1.5 years, range 3.9-10.4, n = 1,083) to compare 14 biomarkers of aging recently agreed on by an expert panel for use as outcome measures in intervention studies: physiological (insulin-like growth factor 1 (IGF-1), growth differentiation factor 15 (DNA methylation derived, DNAmGDF15)), inflammatory (high sensitivity C-reactive protein (CRP), interleukin-6 (IL-6)), functional (muscle mass, muscle strength, hand grip strength (HGS), Timed-Up-and-Go (TUG), gait speed, standing balance test, frailty phenotype (FP), cognitive health, blood pressure), and epigenetic (epigenetic clock, DunedinPACE). Cox proportional hazards regression analyses were performed to investigate their role in prediction of all-cause as well as cause-specific mortality. Results were adjusted for age, sex, lifestyle factors, and genetic ancestry. In adjusted models of all-cause mortality, HGS, IL-6, standing balance, cognitive health, and the epigenetic clock (DunedinPACE) statistically significantly predicted mortality, with the epigenetic clock (DunedinPACE) emerging as the strongest predictor. CRP, gait speed, IGF-1, blood pressure, muscle mass, DNAmGDF15, FP and TUG were not associated with mortality in this study. These results were corroborated in subgroup analyses stratified by cause of death. Feature selection identified a minimal biomarker set consisting of muscle mass, standing balance, and epigenetic clock (DunedinPACE) that predicted mortality with nearly the same discriminative accuracy (C-index = 0.63) as the full model including all biomarkers (C-index = 0.65).
Among the fourteen investigated biomarkers of aging, DunedinPACE emerged with the strongest and most consistent association with mortality.
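The C-index values quoted in this abstract measure how well predicted risk ranks participants by survival time. The sketch below shows one common definition, Harrell's concordance index for right-censored data, written in plain Python. Whether the study used exactly this estimator is an assumption, and the function name and example data are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C for right-censored data.

    A pair (i, j) is comparable when subject i had the event and a shorter
    observed time than subject j. The pair is concordant when i also has
    the higher predicted risk; tied risk scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up times (years), event flags (1 = died), and risk scores.
times = [2.0, 4.0, 6.0, 8.0]
events = [1, 1, 0, 1]
risk = [0.9, 0.1, 0.5, 0.7]
print(concordance_index(times, events, risk))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported 0.63 and 0.65 in context.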
RNA-binding proteins (RBPs) play critical roles in the regulation of gene expression. Recent studies have begun to detail the RNA recognition mechanisms of diverse RBPs. However, given the array of RBPs studied so far, it is implausible to experimentally profile RBP-binding peaks for hundreds of RBPs in multiple non-model organisms. Here, we introduce MuSIC (Multi-Species RBP-RNA Interactions using Conservation), a deep learning-based framework for predicting cross-species RBP-RNA interactions by leveraging label smoothing and evolutionary conservation of RBPs across 11 phylogenetically diverse species ranging from human to yeast. MuSIC outperforms state-of-the-art computational methods, and achieves highly accurate prediction of RBP-binding peaks across species. The prediction confidence is higher in the metazoan species, partially reflecting differences in RBP conservation patterns. Finally, the effects of homologous genetic variants on RBP binding can be computationally quantified across species, followed by experimental validations. The target transcripts with disrupted binding events are enriched in the ubiquitination-associated pathways. To summarize, MuSIC provides a useful computational framework for predicting RBP-RNA interactions across species and quantifying the effects of genetic variants on RBP binding, offering insights into the RBP-mediated regulatory mechanisms implicated in human diseases.
Acute infectious diseases, particularly many neglected tropical diseases (NTDs), pose significant public health challenges, especially in resource-limited settings where diagnostic and surveillance capacities are often inadequate. This scoping review systematically explores methodologies for estimating the burden of acute infectious NTDs, focusing on metrics such as incidence, mortality, and disability-adjusted life years (DALYs). We identified 60 studies, predominantly on malaria and dengue, with a growing emphasis on advanced computational approaches like machine learning and Bayesian geospatial modeling. Key findings highlight the evolution from traditional surveillance-based methods to integrated frameworks incorporating environmental, demographic, and health system covariates. However, challenges persist, including data sparsity, underreporting, and methodological uncertainties. The review underscores the need for improved data integration, standardized frameworks, and interdisciplinary collaboration to enhance the accuracy and utility of burden estimates.
Breast cancer is a leading cause of mortality and morbidity among females worldwide. As part of the Global Burden of Diseases, Injuries, and Risk Factors Study (GBD) 2023, we provided an updated comprehensive assessment of the epidemiological trends, disease burden, and risk factors associated with breast cancer globally, regionally, and nationally from 1990 to 2023. Breast cancer incidence, mortality, prevalence, years lived with disability (YLDs), years of life lost (YLLs), and disability-adjusted life-years (DALYs) were estimated by age and sex for 204 countries and territories from 1990 to 2023. Mortality estimates were generated using GBD Cause of Death Ensemble models, leveraging data from population-based cancer registration systems, vital registration systems, and verbal autopsies. Mortality-to-incidence ratios were calculated to derive both mortality and incidence estimates. Prevalence was calculated by combining incidence and modelled survival estimates. YLLs were established by multiplying age-specific deaths with the GBD standard life expectancy at the age of death. YLDs were estimated by applying disability weights to prevalence estimates. The sum of YLLs and YLDs equalled the number of DALYs. Breast cancer burden attributable to seven risk factors was examined through the comparative risk assessment framework. The GBD forecasting framework was used to forecast breast cancer incidence and mortality from 2024 to 2050. Age-standardised rates were calculated for each metric using the GBD 2023 world standard population. In 2023, there were an estimated 2·30 million (95% uncertainty interval [UI] 2·01 to 2·61) breast cancer incident cases, 764 000 deaths (672 000 to 854 000), and 24·1 million (21·3 to 27·5) DALYs among females globally. 
In the World Bank low-income group, where a low age-standardised incidence rate (ASIR) was estimated (44·2 per 100 000 person-years [31·2 to 58·4]), the age-standardised mortality rate (ASMR) was the highest (24·1 per 100 000 [16·8 to 31·9]). The highest ASIR was in the high-income group (75·7 per 100 000 [67·1 to 84·0]), and the lowest ASMR was in the upper-middle-income group (11·2 per 100 000 [10·2 to 12·3]). Between 1990 and 2023, the ASIR in the low-income group increased by 147·2% (38·1 to 271·7), compared with a 1·2% (-11·5 to 17·2) change in the high-income group. The ASMR decreased in the high-income group, changing by -29·9% (-33·6 to -25·9), but increased by 99·3% (12·5 to 202·9) in the low-income group. The increase in age-standardised DALY rates followed that of ASMRs. Risk factors such as dietary risks, tobacco use, and high fasting plasma glucose contributed to 28·3% (16·6 to 38·9) of breast cancer DALYs in 2023. The risk factors with a decrease in attributable DALYs between 1990 and 2023 were high alcohol use and tobacco. By 2050, the global incident cases of breast cancer among females were forecast to reach 3·56 million (2·29 to 4·83), with 1·37 million (0·841 to 2·02) deaths. The stable incidence and declining mortality rates of female breast cancer in high-income nations reflect success in screening, diagnosis, and treatment. In contrast, the concurrent rise in incidence and mortality in other regions signals health system deficits. Without effective interventions, many countries will fall short of the WHO Global Breast Cancer Initiative's ambitious target of achieving an annual reduction of 2·5% in age-standardised mortality rates by 2040. The mounting breast cancer burden, disproportionately affecting some of the world's most vulnerable populations, will further exacerbate health inequalities across the globe without decisive immediate action. Funding: Gates Foundation, St Jude Children's Research Hospital.
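The YLL, YLD, and DALY definitions given in this abstract reduce to simple arithmetic, sketched below. The age groups, health states, life expectancies, and disability weights are hypothetical placeholders rather than GBD values, and the function names are illustrative only.

```python
def years_of_life_lost(deaths_by_age, life_expectancy_by_age):
    """YLLs: age-specific deaths times standard life expectancy at age of death."""
    return sum(deaths_by_age[age] * life_expectancy_by_age[age] for age in deaths_by_age)


def years_lived_with_disability(prevalence_by_state, disability_weights):
    """YLDs: prevalent cases in each health state times its disability weight."""
    return sum(prevalence_by_state[s] * disability_weights[s] for s in prevalence_by_state)


def disability_adjusted_life_years(ylls, ylds):
    """DALYs are the sum of YLLs and YLDs."""
    return ylls + ylds


# Hypothetical inputs (placeholders, not GBD estimates).
deaths = {"50-54": 10, "70-74": 20}
life_expectancy = {"50-54": 35.0, "70-74": 17.0}
prevalence = {"controlled": 100, "metastatic": 10}
weights = {"controlled": 0.05, "metastatic": 0.45}

ylls = years_of_life_lost(deaths, life_expectancy)
ylds = years_lived_with_disability(prevalence, weights)
print(ylls, ylds, disability_adjusted_life_years(ylls, ylds))
```

Because YLLs weight each death by remaining life expectancy, deaths at younger ages contribute more to the DALY total than the same number of deaths at older ages.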
Seascape genomics has rapidly evolved into an integrative field, merging genomics, oceanography, ecology, climate science, and computational modeling to assess the mechanisms that shape marine biodiversity distribution and adaptation. This review traces the evolution of seascape genomics from its roots in population genetics to an interdisciplinary and increasingly integrative science that supports marine management and conservation. This systematic synthesis of 93 empirical studies published between 2006 and 2025 highlights an expansion in methods and international collaboration within seascape genomic studies, while also exposing persistent inequities in geographic representation and gender diversity. Seascape genomics is characterized by a high proportion of women in lead-author roles, signaling a more inclusive trajectory than many related genomic disciplines, even as gender imbalances persist in senior last-author positions. While most studies achieved full methodological and analytical integration, only a few generate decision-support tools for conservation and climate adaptation, and explicit participatory frameworks and stakeholder engagement are still lacking. The continued development of seascape genomics depends on expanding beyond analytical integration to incorporate participatory, inclusive, and co-designed research practices. Advancing transdisciplinary literacy, equitable leadership and participation, especially in low-and-middle-income countries, and open data infrastructures will be key to realizing the full potential of seascape genomics as a decision-support science and a model for integrative ocean research.
Accurate free energy landscape (FEL) construction is vital for deciphering protein function but is hindered by computational bottlenecks and sampling inefficiencies, often leading to unreliable state characterization. To address this bottleneck, we introduce a conditional variational autoencoder (CVAE)-based framework integrating dimensionality reduction, clustering, and data balancing. By learning conformational distributions in low-dimensional latent space and adjusting weights for undersampled data, our approach enables more accurate FEL estimation and rapid optimization. Our method significantly reduces computational time compared to traditional enhanced sampling, while maintaining comparable accuracy and offering greater generalizability across different protein systems. Validation on four systems, chignolin, adenylate kinase, ribose-binding protein, and c-Abl tyrosine kinase, representing a diverse range of molecular complexity, demonstrates robust FEL resolution across scales. This adaptable, component-based approach makes FEL analysis more accessible for complex molecular systems, facilitating rapid investigation of protein dynamics and creating new opportunities for therapeutic development.
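As background for the free energy estimation this abstract refers to, the sketch below applies the standard Boltzmann inversion F = -kT ln P to a weighted histogram along one coordinate. It is a generic illustration, not the CVAE framework itself. The per-sample weights stand in for the abstract's "adjusting weights for undersampled data", and all names and values are hypothetical.

```python
import math


def free_energy_profile(values, weights=None, n_bins=10, kT=1.0):
    """Estimate F = -kT * ln(P) per bin from (optionally weighted) samples.

    Returns bin centres and free energies shifted so the minimum is zero.
    Empty bins get infinite free energy.
    """
    if weights is None:
        weights = [1.0] * len(values)
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width if all values are equal
    mass = [0.0] * n_bins
    for value, weight in zip(values, weights):
        index = min(int((value - lo) / width), n_bins - 1)
        mass[index] += weight
    total = sum(mass)
    free = [-kT * math.log(m / total) if m > 0 else math.inf for m in mass]
    f_min = min(free)
    centres = [lo + (i + 0.5) * width for i in range(n_bins)]
    return centres, [f - f_min for f in free]


# Hypothetical samples along one collective variable: the second state is undersampled.
samples = [0.0, 0.0, 0.0, 1.0]
_, unweighted = free_energy_profile(samples, n_bins=2)
_, reweighted = free_energy_profile(samples, weights=[1.0, 1.0, 1.0, 3.0], n_bins=2)
print(unweighted, reweighted)
```

In the unweighted case the rarely visited state appears higher in free energy only because it was sampled less; giving it a larger weight removes that artefact in this toy example.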
Lung cancer is one of the leading causes of cancer deaths. While low-dose computed tomography (CT) screening improves survival, radiological detection is increasingly challenged by a shortage of radiologists. This study aimed to develop and evaluate a novel, precise, and computationally efficient AI-based algorithm for lung cancer diagnosis using chest CT scans. A total of 156 patient chest CT scans were utilized to form Databases I and II. We then conducted extensive feature extraction [statistics, histograms, Fast Fourier Transform (FFT), Discrete Cosine Transform (DCT), Walsh-Hadamard Transform (WHT)] and optimized classifiers [Multi Layer Perceptron (MLP), Generalized Feed Forward Neural Network (GFF-NN), Modular Neural Network (MNN), Support Vector Machine (SVM)] with genetic algorithms. Performance evaluation measures employed were classification accuracy, Mean Squared Error (MSE), Area under the ROC curve (AUC), and computational efficiency. The MNN (Topology II) classifier employing FFT-based features with momentum learning achieved 100% average classification accuracy during cross-validation for both Database I and Database II. The genetically optimized MNN (Topology II) classifier shows remarkable performance in lung cancer diagnosis from CT scan images. Its ability to achieve perfect classification accuracy suggests strong potential for clinical application, offering both diagnostic precision and, by acting as a triage tool, workload reduction in healthcare settings.
Long Coronavirus disease 2019 (COVID-19), also termed post-acute sequelae of severe acute respiratory syndrome coronavirus 2 infection (PASC), has emerged as a complex multisystem condition in children and adolescents worldwide. It can occur even after mild or asymptomatic acute infections, with symptoms that may persist, fluctuate, or relapse over time. This review aims to comprehensively explore the characteristic manifestations, management and current therapeutic possibilities of pediatric Long COVID-19 (L-C19). A systematic search was conducted in multiple databases such as PubMed, Scopus, Web of Science, and Google Scholar, for literature published between January 2020 and October 2025. Diagnosing pediatric L-C19 is challenging due to the heterogeneity of symptoms and lack of specific diagnostic biomarkers. Most young patients experience gradual improvement over months, but a significant subset remains symptomatic for >1 year with substantial disability, underscoring the need for timely diagnosis and intervention. Current clinical consensus emphasizes an individualized, multidisciplinary management approach focused on symptom relief and functional rehabilitation. No definitive cure exists for L-C19; thus, care is tailored to each patient's predominant issues. Therapeutic strategies combine supportive self-management (e.g. energy conservation and pacing) with both non-pharmacological and pharmacological interventions. Multimodal rehabilitation programs - including graded exercise therapy and cognitive behavioral therapy - have shown promise in improving fatigue, mental health, and overall quality of life. Targeted treatments for specific sequelae (such as autonomic dysfunction or chronic pain) are applied on a case-by-case basis, although high-quality evidence for medications remains limited. 
Globally, interdisciplinary collaborations have been established to provide harmonized diagnostic and treatment protocols, and major research initiatives are underway to evaluate novel therapies and include children in L-C19 clinical trials. Ongoing international efforts to develop standardized diagnostic tools, outcome measures, and evidence-based interventions are crucial to optimize care and long-term outcomes for children and adolescents affected by L-C19.
The analysis of single-cell RNA sequencing (scRNA-seq) data is beset by formidable hurdles, including large feature space, widespread sparsity, noise contamination, and inter-batch variability, which collectively compromise the accuracy of cell clustering and subsequent downstream analyses. To overcome these obstacles, we present scCMA, a novel computational framework that synergistically combines a discriminative representation learning scheme with a masked reconstruction autoencoder architecture to generate stable and biologically meaningful cell embeddings. The contrastive module sharpens the distinction between cell types by maximizing similarities within types while minimizing them across types, thereby implicitly mitigating batch effects without requiring prior dataset information. Concurrently, the masked autoencoder learns to reconstruct randomly masked gene expression profiles, enabling the model to capture global transcriptional dependencies and identify rare biological features while diminishing the influence of noise and sparsity. Comprehensive evaluations on a diverse array of public datasets reveal that scCMA achieves superior clustering precision, effectively corrects for batch differences without sacrificing biological variance, and exhibits remarkable proficiency in recognizing rare cellular subsets. Moreover, the embeddings generated by scCMA accurately reflect the temporal progression of cell development, facilitating the faithful modeling of cellular lineage progression.
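The masked-reconstruction objective mentioned in this abstract can be illustrated without any model. The sketch below hides a random subset of genes in one expression profile and scores a reconstruction only on the hidden entries. It is a generic illustration, not scCMA's implementation; the mask rate, zero-filling, squared-error loss, and all names are assumptions.

```python
import random


def mask_profile(expression, mask_rate=0.3, seed=0):
    """Zero out a random subset of genes; return the corrupted profile and masked indices."""
    rng = random.Random(seed)
    n_mask = max(1, round(mask_rate * len(expression)))
    masked_idx = sorted(rng.sample(range(len(expression)), n_mask))
    corrupted = list(expression)
    for i in masked_idx:
        corrupted[i] = 0.0
    return corrupted, masked_idx


def masked_mse(original, reconstructed, masked_idx):
    """Mean squared error computed only on the masked (hidden) genes."""
    return sum((original[i] - reconstructed[i]) ** 2 for i in masked_idx) / len(masked_idx)


# Hypothetical expression values for ten genes in one cell.
profile = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
corrupted, hidden = mask_profile(profile, mask_rate=0.3, seed=1)

# An autoencoder would be trained to map `corrupted` back to `profile`;
# here the corrupted input itself serves as a trivial baseline reconstruction.
print(hidden, masked_mse(profile, corrupted, hidden))
```

Scoring only the hidden genes forces a model to infer them from the visible ones, which is the intuition behind capturing dependencies between genes.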
Grass inflorescence morphology displays remarkable diversity across species and is a key determinant of crop yield. Here, to elucidate how developmental morphodynamics shapes inflorescence architecture, we conducted a comparative analysis of early inflorescence development in bread wheat and rice. Computational modelling revealed that meristem fate transition and primordium initiation modes collectively contribute to the observed architecture diversity. Furthermore, the model elucidates the formation of distinct supernumerary spikelet types in wheat and predicts two independent developmental pathways for generating paired spikelets, a specialized form of inflorescence branching. We also identified a mutant allele, duo2, that results in accelerated developmental progression and demonstrated significant yield improvement in duo2 plants under field conditions. The causal gene RA2-D, an orthologue of maize RAMOSA2 (RA2), was found to regulate floral transition. This study elucidates how perturbations in developmental dynamics drive the diversification of grass inflorescence morphologies.
The structural information on protein-ligand complexes is crucial for small-molecule design and drug discovery. Yet primary resources often have heterogeneous annotations, lack machine-ready ligand categorization, and require substantial postprocessing before large-scale modeling. Here, we present LigandExplorer, an open-source, automated postprocessing pipeline that identifies and extracts covalent and noncovalent ligands from biomolecular complex structures and standardizes outputs for downstream use. Using residue-level graphs built solely from atomic coordinates, LigandExplorer is robust to missing or inconsistent metadata and integrates LightGBM models to classify ligands (peptides, nucleic acids, phospholipids, carbohydrates, organics, and ions) and assess interaction relevance. Because the pipeline is rerunnable, it can be applied to each new database release to keep derived, categorized data sets current without altering source records. On the PDBbind v2020 refined set, LigandExplorer achieved a 98.38% raw structural agreement under harmonized comparison criteria prior to any manual reconciliation; the remaining discrepancies were analyzed separately and were dominated by divergences between raw RCSB entries and curated PDBbind records. On the PepBDB, LigandExplorer successfully processed 4881 of 5005 complexes, achieving a 97.52% success rate. Most failures reflected upstream record errors, while complex cyclic peptides constituted the primary algorithmic boundary. LigandExplorer thus mitigates data-cleaning burdens and enables rapidly refreshed, standardized data sets for computational modeling and molecular design.
Genomic promoters are crucial gene regulatory elements [1,2]. Yet, comparative analyses of promoter architecture have been constrained by the limited resolution of GC-rich regions in short-read-based genome resources [3-6]. The Vertebrate Genomes Project (VGP) provides more complete long-read-based assemblies [7], which further detect 5-methylcytosine signals directly from PacBio HiFi circular consensus reads [8,9]. Here, we developed a scalable computational framework to characterize DNA methylomes from HiFi data on high-quality Phase I VGP assemblies with RefSeq gene annotations for 82 vertebrate species spanning seven major taxonomic classes: mammals, birds, reptiles, amphibians, lobe-finned fishes, ray-finned fishes, and cartilaginous fishes. We observed a conserved, transcription start site-centered hypomethylation signature in promoters across all vertebrates, and an unexpected hypermethylation signature near gene boundaries that is discordant with transcripts. In addition to this conserved pattern, there were lineage-specific differences in promoter methylation profiles, with birds showing the most diverse patterns. These epigenetic landscapes track phylogenetic relationships more closely than tissue-type methylation differences and infer lineage-dependent widths of core promoters and broader promoters across major vertebrate classes. Our findings establish a comparative epigenomic framework for profiling promoter methylomes from long-read sequencing data.