PURPOSE OF REVIEW: Few areas in oncology have witnessed the major paradigm shift that has been noted in the understanding and management of gastrointestinal stromal tumors. This review highlights the progress made over the last 2 years. RECENT FINDINGS: Population-based studies have provided insight into the true incidence of gastrointestinal stromal tumors. Improved understanding of the molecular biology has provided prognostic implications and may guide treatment in the future. More mature follow-up data from phase III trials have proven that the targeted tyrosine kinase inhibitor imatinib mesylate is a dramatically effective agent, but the duration of its benefits is finite, and drug resistance is an increasingly common phenomenon. Adjuvant and neoadjuvant trials of imatinib are currently underway. A second targeted tyrosine kinase inhibitor, sunitinib malate, has been approved for the treatment of imatinib-resistant gastrointestinal stromal tumors after recent encouraging results. Finally, the success with imatinib and sunitinib has encouraged investigators to reevaluate the role of surgery in advanced gastrointestinal stromal tumors. SUMMARY: The multidisciplinary management of gastrointestinal stromal tumors serves as a model of how new targeted molecular therapies can be combined with traditional treatment modalities to improve survival in advanced malignancies.
Comprehensive analysis of omics data, such as genome, transcriptome, proteome, metabolome, and interactome, is a crucial technique for elucidating the complex mechanism of cancer onset and progression. Recently, a variety of new findings have been reported based on multi-omics analysis in combination with various clinical information. However, integrated analysis of multi-omics data is extremely labor intensive, making the development of new analysis technology indispensable. Artificial intelligence (AI), which has been under development in recent years, is quickly becoming an effective approach to reduce the labor involved in analyzing large amounts of complex data and to obtain valuable information that is often overlooked in manual analysis and experiments. The use of AI, such as machine learning approaches and deep learning systems, allows for the efficient analysis of massive omics data combined with accurate clinical information and can lead to comprehensive predictive models that will be desirable for further developing individual treatment strategies of immunotherapy and molecular target therapy. Here, we aim to review the potential of AI in the integrated analysis of omics data and clinical information with a special focus on recent advances in the discovery of new biomarkers and the future direction of personalized medicine in non-small cell lung cancer.
The WHO classification of digestive system tumours presented in the first volume of the WHO classification of tumours series, 5th edition, reflects important advancements in our understanding of tumours of the digestive system (Table 1). For the first time, certain tumour types are defined as much by their molecular phenotype as their histological characteristics; however, in most instances histopathological classification remains the gold standard for diagnosis. The WHO classification of tumours series is designed to be used worldwide, including those settings where a lack of tissue samples or of specific technical facilities limits the pathologist's ability to rely on molecular testing. Since the publication of the 4th-edition digestive system tumours volume in 2010,1 there have been important developments in our understanding of the aetiology and pathogenesis of many tumours. However, the extent to which this new information has altered clinical practice has been quite variable. For some of the tumours described in this volume there is little molecular pathology in clinical use, despite the fact that we now have a more detailed understanding of their molecular pathogenesis. A tumour's molecular pathology, as defined for the purposes of this publication, concerns the molecular markers that are relevant to the tumour's diagnosis, biological behaviour, outcome and treatment, rather than its molecular pathogenesis. However, the role of molecular pathology is expanding; for some tumour entities, molecular analysis is now essential for establishing an accurate diagnosis. Some of these analyses require investigation of somatic (acquired) genetic alterations, gene or protein expression, or even circulating tumour markers. For certain tumour types, specific analytical tests are needed to predict prognosis or tumour progression, and these tests are carefully outlined in this volume. 
In the following paragraphs, we have summarised some of the more notable changes since the 4th edition. In instances where the new WHO classification of tumours editorial board determined that there was insufficient evidence of the diagnostic or clinical relevance of new information about a particular tumour entity, the position held in the 4th edition has been maintained as the standard in the new volume. There has been substantial progress in our understanding of the development of glandular oesophageal neoplasia and the sequential neoplastic progression from inflammation to metaplasia (Barrett's oesophagus), dysplasia and, ultimately, adenocarcinoma. This process is initially driven by gastro-oesophageal reflux disease, which leads to reprogramming of cell differentiation and proliferation in the oesophagus. There is evidence that TP53 mutation in proliferating epithelium leads to high-grade dysplasia, while SMAD4 mutation precedes the development of invasive carcinoma. While demonstration of these mutations is not required clinically, testing oesophageal and gastric adenocarcinomas for ERBB2 [human epidermal growth factor receptor 2 (HER2)] is recommended, as this influences treatment decisions. The pathogenesis of precursor lesions is less clear in oesophageal squamous carcinogenesis than in gastric carcinogenesis. Environmental factors are believed to play an important role, but the mechanisms of neoplastic change as a result of specific factors, such as tobacco use and alcohol consumption, are poorly understood. For example, human papillomavirus (HPV) infection was initially believed to play a key role in squamous carcinogenesis, but recent evidence suggests that there is no such association in most cases of oesophageal squamous cell carcinoma. The molecular pathway of cancer progression in the stomach is less clear. 
Most epidemic gastric cancers are now considered inflammation-driven, and their aetiology is characteristically environmental – usually related to Helicobacter pylori infection. It is because of this infectious aetiology that gastric cancer is included among the limited number of highly lethal, but preventable, cancers. Chronic gastric inflammation leads to changes in the microenvironment (including the microbiome) that result in mucosal atrophy/metaplasia, which may then progress to neoplasia after further molecular alterations. Metaplastic changes in the upper gastrointestinal tract are well-recognised as early cancer precursors, but their precise molecular mechanisms and the exact role of progenitor cells in the oncogenic cascade remain a subject of intense investigation. For some rare tumours, distinctive driver mutations have been identified; for example, the characteristic MALAT1–GLI1 fusion gene in gastroblastoma and EWSR1 fusions in gastrointestinal clear cell sarcoma and malignant gastrointestinal neuroectodermal tumour. In both examples, demonstration of the fusion gene is now required for the diagnosis. The pathogenesis of adenocarcinomas of the intestines (the small and large bowel and the appendix) is now much better delineated than it was a decade ago. The introduction of population-based screening for colorectal cancer has laid the foundation for a better understanding of neoplastic precursor lesions and the molecular pathways associated with each type of tumour. For example, our knowledge of the molecular pathways and biological behaviour of conventional adenomas and serrated precursor lesions, including the recently renamed sessile serrated lesion (formerly called sessile serrated polyp/adenoma), has grown rapidly in the past decade, and this has enabled clinicians to provide tailored, evidence-driven screening and surveillance programmes. 
Where the result will make a difference to patient treatment, colorectal cancers should undergo molecular testing for microsatellite instability and extended RAS testing for mutations in KRAS, NRAS and BRAF. Our understanding of appendiceal tumours has also improved. For example, we now know that many tumours of the appendix develop via neoplastic precursor lesions similar to those in the small and large intestines, and the biological potential and molecular pathways of appendiceal tumours are therefore much better appreciated. The recently renamed goblet cell adenocarcinoma (formerly called goblet cell carcinoid/carcinoma) of the appendix is a prime example of a tumour whose biological potential and histological characteristics have been better described, resulting in improvements in the pathological approach to these tumours. Studies of the aetiology and pathogenesis of anal squamous lesions suggest that HPV infection plays an important aetiological role, driving genetic alterations similar to those in cervical cancer. p16 and HPV testing are recommended for such lesions. One particularly important change in the 5th edition is in the classification of neuroendocrine neoplasms (NENs), which occur in multiple sites throughout the body. In this volume, NENs are covered within each organ-specific chapter, including the chapter on tumours of the pancreas, where detailed sections describing each functioning and non-functioning subtype are provided. Previously, these neoplasms were covered only in the volume on tumours of endocrine organs.2 The general principles guiding the classification of all NENs are presented in a separate introduction to this topic (Table 2). 
To consolidate our increased understanding of the genetics of these neoplasms, a group of experts met for a consensus conference at the International Agency for Research on Cancer (IARC) in November 2017 and subsequently published a paper in which they proposed distinguishing between well-differentiated neuroendocrine tumours (NETs) and poorly differentiated neuroendocrine carcinomas (NECs) in all sites where these neoplasms arise.3 NENs are divided into NETs and NECs, based on their molecular differences. Mutations in MEN1, DAXX and ATRX are entity-defining for well-differentiated NETs, whereas NECs usually have TP53 or RB1 mutations. In some cases, these mutations can be of diagnostic benefit. Genomic data have also led to a change in the classification of mixed NENs, which are now grouped into the conceptual category of ‘mixed neuroendocrine–non-neuroendocrine neoplasms (MiNENs)’. Mixed adenoneuroendocrine carcinomas (MANECs), which show genomic alterations similar to those of adenocarcinomas or NECs rather than NETs, probably reflect clonal evolution within the tumours, which is a rapidly growing area of interest. The study of these mixed carcinomas may also lead to an improved understanding of other facets of clonality in tumours of the digestive system and other parts of the body. Another important change concerns the recognition that well-differentiated NETs may be high grade (G3 in the WHO grading system, defined as having a mitotic rate >20 per 2 mm² or Ki67 >20%), but these neoplasms remain well-differentiated genetically and distinct from poorly differentiated NECs. G3 NETs were first recognised and are most common in the pancreas, but they can occur throughout the GI tract. Thus, the current WHO classification includes three grades (G1, G2 and G3) for NETs. NECs are no longer graded, as they are recognised to be uniformly high grade by definition, but continue to be separated into small- and large-cell types. 
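The three-grade scheme for well-differentiated NETs can be illustrated with a minimal Python sketch. Only the G3 cut-offs (mitotic rate >20 per 2 mm² or Ki67 >20%) are stated in the text above. The G1/G2 boundaries used here (fewer than 2 mitoses per 2 mm² and Ki67 below 3% for G1), the rule that the higher of the two indices determines the grade, and the function name grade_net are assumptions made for illustration. The sketch is not a diagnostic tool and does not apply to NECs, which are not graded.

```python
from typing import Optional


def grade_net(mitoses_per_2mm2: Optional[float] = None,
              ki67_percent: Optional[float] = None) -> str:
    """Return 'G1', 'G2' or 'G3' for a well-differentiated NET (illustrative only)."""

    def grade_from_mitoses(m: float) -> int:
        if m > 20:        # G3 cut-off as stated in the text
            return 3
        if m >= 2:        # assumed G1/G2 boundary, not given in the text
            return 2
        return 1

    def grade_from_ki67(k: float) -> int:
        if k > 20:        # G3 cut-off as stated in the text
            return 3
        if k >= 3:        # assumed G1/G2 boundary, not given in the text
            return 2
        return 1

    grades = []
    if mitoses_per_2mm2 is not None:
        grades.append(grade_from_mitoses(mitoses_per_2mm2))
    if ki67_percent is not None:
        grades.append(grade_from_ki67(ki67_percent))
    if not grades:
        raise ValueError("at least one proliferation index is required")

    # Assumption: when the two indices disagree, the higher grade is reported.
    return f"G{max(grades)}"
```

Under these assumptions, a tumour with 5 mitoses per 2 mm² and a Ki67 index of 25% would be reported as G3, because either index exceeding its cut-off is sufficient.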
There are certain terms in current day-to-day use about which many pathologists continue to disagree. The editorial board carefully considered our current understanding of carcinogenetic pathways when considering the use of specific terms and definitions. In general, the overall consensus was that established terms, definitions and criteria should not be changed unless there was strong evidence to support doing so and the proposed changes had clinical relevance. For some tumours, our understanding of the progression from normal epithelium to metastatic carcinoma remains inadequate. For example, in certain tumours the line between benign and malignant can be ambiguous, and in some cases the distinction is more definitional than biological. These are some of the many areas of tumour biology that need to be more fully investigated in the future. In the 5th edition, the terminology for precursors to invasive carcinoma in the digestive system has been standardised somewhat, although the terms ‘dysplasia’ and ‘intra-epithelial neoplasia’ are both still considered acceptable for lesions in certain anatomical locations, in acknowledgement of their ongoing clinical acceptance. For example, the term ‘dysplasia’ is preferred for lesions in the tubular gut, whereas ‘intra-epithelial neoplasia’ is preferred for those in the pancreas, gallbladder and biliary tree. For all anatomical sites, however, a two-tiered system (low- versus high-grade) is considered the standard grading system for neoplastic precursor lesions. This has replaced the three-tiered grading scheme previously used for lesions in the pancreatobiliary system.4 The term ‘carcinoma in situ’ continues to be strongly discouraged in clinical practice for a variety of reasons, most notably its clinical ambiguity. This term is encompassed by the category of high-grade dysplasia/intraepithelial neoplasia. 
Many refinements of the 4th-edition classification have been made concerning liver tumours, supported by novel molecular findings. For example, a comprehensive picture of the molecular changes that occur in common hepatocellular carcinoma has recently emerged from large-scale molecular profiling studies. Meanwhile, several rarer hepatocellular carcinoma subtypes, which together may account for 20–30% of cases, have been defined by consistent morphomolecular and clinical features, with fibrolamellar carcinoma and its diagnostic DNAJB1–PRKACA translocation being one prime example. Intrahepatic cholangiocarcinoma is now understood to be an anatomically defined entity with two different major subtypes: a large duct type, which resembles extrahepatic cholangiocarcinoma, and a small duct type, which shares significant aetiological, pathogenetic and imaging characteristics with hepatocellular carcinoma. The two subtypes have very different aetiologies, molecular alterations, growth patterns and clinical behaviours, exemplifying the conflict between anatomically and histogenetically/pathogenetically based classifications. Clinical research and study protocols will need to incorporate these findings in the near future. Also supported by molecular findings, the definition of combined hepatocellular–cholangiocarcinoma and its distinction from other entities has recently become clearer. Cholangiolocellular carcinoma is no longer considered a subtype of combined hepatocellular–cholangiocarcinoma, but rather a subtype of small duct intrahepatic cholangiocarcinoma, renamed cholangiolocarcinoma, meaning that all intrahepatic carcinomas with a ductal or tubular phenotype are now included within the category of intrahepatic cholangiocarcinoma. 
A classic example of morphology-based molecular profiling leading to a new classification based on a combination of biological and molecular factors is the classification of hepatocellular adenomas, which has gained a high degree of clinical relevance and has fuelled the implementation of refined morphological criteria and molecular testing in routine diagnostics. Most of the classification of pancreatic neoplasms in the 5th edition remains unchanged from the last volume. As highlighted above, precursor lesions including pancreatic intraepithelial neoplasia, intraductal papillary mucinous neoplasms and mucinous cystic neoplasms are now classified into two tiers of dysplasia, based on the highest grade of dysplasia detected, rather than the three-tier system used in the last edition of the WHO classification. Intraductal oncocytic papillary neoplasm and intraductal tubulopapillary neoplasms are now separated from the other subtypes of intraductal papillary mucinous neoplasm based on their distinct genomic and morphological features. The prior entity of acinar cell cystadenoma, which has recently been demonstrated to be non-neoplastic by molecular clonality analysis, is now termed ‘acinar cystic transformation of the pancreas’. Also, the entire spectrum of pancreatic neuroendocrine neoplasms is now included in this volume; previously, details concerning the individual functional types were presented in the WHO classification of tumours of the endocrine organs. Mixed tumours in several anatomical sites (e.g. oesophageal adenosquamous carcinoma and mucoepidermoid carcinoma, as well as hepatic carcinomas with mixed hepatocellular and cholangiocellular differentiation), remain subjects of some uncertainty. The relative importance of the various lineages of differentiation within these neoplasms remains unknown. It is also uncertain how these neoplasms develop and how they should be treated. 
These issues are a matter of debate because hard evidence is lacking, but there are improvements in the pathological criteria and classification of these neoplasms that should help to standardise the diagnostic approach and facilitate better clinical and genomic research. Each of these tumour types is grouped together in separate chapters. This ensures consistency and avoids duplication. The term ‘EBV positive inflammatory follicular dendritic cell sarcoma of the digestive tract’ has been adopted to replace the entity previously known as ‘inflammatory pseudotumour-like fibroblastic/follicular dendritic cell tumour’. New in this book is the chapter on genetic tumour syndromes of the digestive system, the introduction to which contains a table that lists each of the major syndromes and summarises key information about the disease/phenotype, pattern of inheritance, causative gene(s) and normal function of the encoded protein(s). Common syndromes, including Lynch syndrome and familial adenomatous polyposis 1 (FAP), are covered in detail, as well as several other adenomatous polyposes defined since the last volume and the GAPPS (gastric adenocarcinoma and proximal polyposis of the stomach) syndrome, now recognised as a FAP variant, with a unique phenotype. A number of other genetic tumour predisposition syndromes that confer a raised risk of various gastrointestinal tumours are also described, including Li–Fraumeni syndrome, hereditary haemorrhagic telangiectasia, syndromes associated with gastroenteropancreatic NETs and multilocus inherited neoplasia alleles syndrome. This should be helpful to many involved in the diagnosis of such syndromes, as well as those researching the mechanisms involved. The format of the books has been updated to reflect the new edition of the classification: the move from three to two columns has allowed larger illustrations, and the use of set headings for each tumour type shows very clearly where evidence is lacking. 
The content of this article represents the personal views of the authors and does not represent the views of the authors' employers and associated institutions. Where authors are identified as personnel of the International Agency for Research on Cancer/World Health Organization, the authors alone are responsible for the views expressed in this article and do not necessarily represent the decisions, policy or views of the International Agency for Research on Cancer/World Health Organization. I.D.N. reports that her institute benefits from research funding from the Dutch Cancer Society (KWF) and the Dutch Digestive Foundation (MLDS). No other authors report any conflicts of interest to IARC that would affect their participation in forming the classification.
Reproducibility is a current concern for everyone involved in the conduct and publication of biomedical research. Recent attempts to test reproducibility, particularly the reproducibility project in cancer biology published in eLife (https://elifesciences.org/collections/9b1e83d1/reproducibility-project-cancer-biology), have exposed major difficulties in repeating published preclinical experimental work. It is thought that some of these difficulties relate to uncertainty about the provenance of tools, lack of clarity in methodology and use of inappropriate approaches for analysis; the latter particularly related to untoward manipulation of images. In the past, some of these so-called untoward practices were considered the ‘norm’; however, today, the landscape is different. The expectations, not only of the readers of the published scientific word but also of the publishers and funders of research, have changed. This collective group now expects that any published data should be reproducible; but for this to be possible, experimental detail, confirmation of selectivity and quality of reagents/tools, and the analytical and statistical methods used need to be described adequately. Two powerful methodologies often used to support researchers' findings allow the detection of changes in protein expression, that is, immunoblotting (widely known as Western blotting) and immunohistochemistry. Undeniably, as a result of unintentional mistakes (often related to lack of antibody specificity; Baker, 2015), but, in some cases, deliberate alterations and questionable interpretations of results, the use of these two methods has led to many high profile retractions. Indeed, such images have driven the retractions that have occurred in BJP over the last two years. Today, immunoblotting and immunohistochemistry serve as primary methodologies for the detection and quantification of molecular signalling pathways and identification of therapeutic targets. 
This necessitates clear guidance for the application of these techniques, the need for controls (both positive and negative) and the most appropriate methods for quantification. Indeed, this need has spawned a number of initiatives to support researchers in assessing the validity of antibody resources including antibodypedia (Bjorling and Uhlen, 2008) and the resources available within ‘The Human Protein Atlas’ (Thul et al., 2017). The aim of this article is to outline the rationale for, and the expectations of, the BJP with respect to work published in the Journal that includes immunoblotting or immunohistochemical data. In creating these guidelines, our aim is to reduce potential misinterpretations and to maximise the communication and transparency of essential information, particularly with respect to the methodologies employed. We have generated the guidelines below for the benefit of authors, editors and reviewers. While we recognise other recently published guidelines (Uhlen et al., 2016) and indeed we have incorporated some of the advice provided in such reports, we focus, here, on the evidence required for publication in BJP. These guidelines join a series published in BJP regarding the reporting of animal experiments through adoption of the ARRIVE guidelines (McGrath et al., 2015; McGrath and Lilley, 2015), experimental design and analysis (Curtis et al., 2015) and data sharing and presentation (George et al., 2017) in preclinical pharmacology. We would be delighted if other journals were to also use these guidelines. If an antibody has been characterised previously, a citation must be included. This characterisation should fulfil the requirements detailed in the recent guidelines generated by ‘The International Working Group for Antibody Validation’ and should take the form of at least, but preferably more than, one of the ‘five conceptual pillars of antibody validation’ described in the guidelines (Uhlen et al., 2016). 
We also ask authors to check currently available databases for known issues with selectivity. Currently, we highly recommend the NIH-established Research Resource Identification Protocol (RRID) at https://scicrunch.org/resources/Cell%20Lines/search, which provides a unique identifier for the antibody that enables reproducibility studies through clear indication of provenance. Although encouraged, the inclusion of an RRID is not currently mandatory for BJP; we envisage that a similar level of individualization will become essential for publication in the near future. To support the validity of conclusions emanating from immunoblotting or immunohistochemical observations (as with the majority of biological assays), it is important to conduct both positive and negative controls – although for some proteins, certain positive and negative controls are unknown or such materials are not available, thereby precluding such assessments. However, there are a number of other ‘method’ controls that are always possible and which should be included in assay protocols. We recognise that there may be no perfect control that provides confidence beyond doubt (Torlakovic et al., 2015). In this digital age, a growing concern amongst the scientific community is that images for publication are not being accurately presented, that is, the problem of unintentional and/or inappropriate manipulation of images (Cromey, 2010). For example, while it is usually acceptable to crop an image to simplify the information, choosing to crop out oversaturated regions and/or areas/bands displaying non-specific immunoreactivity is not acceptable. BJP requires submission of full immunoblot scans and immunohistochemical/fluorescent images, from which figures have been generated. These scans/images should be included as an additional file for the review process and will be used by the Editors and reviewers when assessing the manuscript, but need not be published. 
In instances where uncertainty occurs regarding image compilation and assembly during the review process, the BJP Editorial Office makes use of freely available Office of Research Integrity Forensic tools (https://ori.hhs.gov/forensic-tools), developed by the US Department of Health and Human Services. Conforming to these stipulations will ensure that reviewers and readers can confirm that the band identified is at the correct molecular size (or allow some interpretation of post-translational modifications, for example) and can determine the selectivity of the antibody used. For any antibody (including secondary antibodies) used, the Methods section should include the following:
INTRODUCTION: Breast cancer remains a significant scientific, clinical and societal challenge. This gap analysis has reviewed and critically assessed enduring issues and new challenges emerging from recent research, and proposes strategies for translating solutions into practice. METHODS: More than 100 internationally recognised specialist breast cancer scientists, clinicians and healthcare professionals collaborated to address nine thematic areas: genetics, epigenetics and epidemiology; molecular pathology and cell biology; hormonal influences and endocrine therapy; imaging, detection and screening; current/novel therapies and biomarkers; drug resistance; metastasis, angiogenesis, circulating tumour cells, cancer 'stem' cells; risk and prevention; living with and managing breast cancer and its treatment. The groups developed summary papers through an iterative process which, following further appraisal from experts and patients, were melded into this summary account. RESULTS: The 10 major gaps identified were: (1) understanding the functions and contextual interactions of genetic and epigenetic changes in normal breast development and during malignant transformation; (2) how to implement sustainable lifestyle changes (diet, exercise and weight) and chemopreventive strategies; (3) the need for tailored screening approaches including clinically actionable tests; (4) enhancing knowledge of molecular drivers behind breast cancer subtypes, progression and metastasis; (5) understanding the molecular mechanisms of tumour heterogeneity, dormancy, de novo or acquired resistance and how to target key nodes in these dynamic processes; (6) developing validated markers for chemosensitivity and radiosensitivity; (7) understanding the optimal duration, sequencing and rational combinations of treatment for improved personalised therapy; (8) validating multimodality imaging biomarkers for minimally invasive diagnosis and monitoring of responses in primary and metastatic disease; 
(9) developing interventions and support to improve the survivorship experience; (10) a continuing need for clinical material for translational research derived from normal breast, blood, primary, relapsed, metastatic and drug-resistant cancers with expert bioinformatics support to maximise its utility. The proposed infrastructural enablers include enhanced resources to support clinically relevant in vitro and in vivo tumour models; improved access to appropriate, fully annotated clinical samples; extended biomarker discovery, validation and standardisation; and facilitated cross-discipline working. CONCLUSIONS: With resources to conduct further high-quality targeted research focusing on the gaps identified, increased knowledge translating into improved clinical care should be achievable within five years.
Achieving complete reproducibility in science, particularly in research fields such as biodiversity, is challenging due to analytical choices, bias and interpretation. Here, we examine examples of reproducibility in biological systematics, ecology, and molecular biology. To mitigate the impact of interpretation and analytical choices, Artificial Intelligence (AI) has provided potential tools. In the present work, while emphasizing the need for methodological rigor and transparency, we acknowledge the role of interpretation in activities such as coding biological characters and detecting morphological patterns in nature. We explore the opportunities and limitations associated with the synergy between big data and AI in molecular biology, emphasizing the need for a more comprehensive and integrated approach based on dataset quality and usefulness. We conclude by advocating for AI-based tools to assist biologists, reinforcing consilience as a criterion for scientific validity without hindering scientific progress.
The field of dendritic cell (DC) biology is robust, with several new approaches to analyze their role in vivo and many newly recognized functions in the control of immunity and tolerance. There also is no shortage of mysteries and challenges. To introduce this volume, I would like to summarize four interfaces of DC research with other lines of investigation and highlight some current issues. One interface is with hematopoiesis. DCs constitute a distinct lineage of white blood cell development with some unique features, such as their origin from both lymphoid and myeloid progenitors, the existence of several distinct subsets, and an important final stage of differentiation termed "maturation," which occurs in response to inflammation and infection, and is pivotal for determining the subsequent immune response. A second interface is with lymphocyte biology. DCs are now known to influence many different classes of lymphocytes (B, NK, NKT) and many types of T cell responses (Th1/Th2, regulatory T cells, peripheral T cell deletion), not just the initial priming or induction of T cell-mediated immunity, which was the first function to be uncovered. DCs are sentinels, controlling many of the afferent or inductive limbs of immune function, alerting the immune system and controlling its early decisions. A third interface is with cell biology. This is a critical discipline to understand at the subcellular and molecular levels the distinct capacities of DCs to handle antigens, to move about the body in a directed way, to bind and activate lymphocytes, and to exert many quality controls on the type of responses, for both tolerance and immunity. A fourth interface is with medicine. Here DCs are providing new approaches to disease pathogenesis and therapy. This interface is perhaps the most demanding, because it requires research with humans. 
Human research currently is being slowed by the need to deal with many challenges in the design of such studies, and the need to excite, attract and support the young scientists who are essential to move human investigation forward. Nonetheless, DCs are providing new opportunities to study patients and the many clinical conditions that involve the immune system.
Understanding the biological mechanisms of disease is crucial for medicine, and in particular, for drug discovery. AI-powered analysis of genome-scale biological data holds great potential in this regard. The increasing availability of single-cell RNA sequencing data has enabled the development of large foundation models for disease biology. However, existing foundation models only modestly improve over task-specific models in downstream applications. Here, we explored two avenues for improving single-cell foundation models. First, we scaled the pre-training data to a diverse collection of 116 million cells, which is larger than those used by previous models. Second, we leveraged the availability of large-scale biological annotations as a form of supervision during pre-training. We trained the \model family of models comprising six transformer-based state-of-the-art single-cell foundation models with 70 million, 160 million, and 400 million parameters. We vetted our models on several downstream evaluation tasks, including identifying the underlying disease state of held-out donors not seen during training, distinguishing between diseased and healthy cells for disease conditions and
In this paper, we propose and study several inverse problems of determining unknown parameters in nonlocal nonlinear coupled PDE systems, including the potentials, nonlinear interaction functions and time-fractional orders. In these coupled systems, we enforce non-negativity of the solutions, aligning with realistic scenarios in biology and ecology. There are several salient features of our inverse problem study: the drastic reduction in measurement/observation data due to averaging effects, the nonlinear coupling between multiple equations, and the nonlocality arising from fractional-type derivatives. These factors present significant challenges to our inverse problem, and such inverse problems have never been explored in previous literature. To address these challenges, we develop new and effective schemes. Our approach involves properly controlling the injection of different source terms to obtain multiple sets of mean flux data. This allows us to achieve unique identifiability results and accurately determine the unknown parameters. Finally, we establish a connection between our study and practical applications in biology, further highlighting the relevance of our work in real-
Molecular conduction channels between two ferromagnetic electrodes can produce strong exchange coupling and a dramatic effect on spin transport, thus enabling the realization of novel logic and memory devices. However, fabrication of molecular spintronics devices is extremely challenging and inhibits insightful experimental studies. Recently, we produced Multilayer Edge Molecular Spintronics Devices (MEMSDs) by bridging organometallic molecular clusters (OMCs) across a ~2 nm thick insulator of a magnetic tunnel junction (MTJ), along its exposed side edges. These MEMSDs exhibited an unprecedented increase in exchange coupling between ferromagnetic films and dramatic changes in spin transport. This paper focuses on the dramatic current suppression phenomenon exhibited by MEMSDs at room temperature. In the event of current suppression, the effective MEMSD current dropped by as much as six orders of magnitude compared to the leakage current level of an MTJ test bed. In the suppressed current state, the MEMSD's transport could be affected by temperature, light radiation, and magnetic field. In the suppressed current state the MEMSD also showed a photovoltaic effect. This study
BACKGROUND: The goal of the first BioCreAtIvE challenge (Critical Assessment of Information Extraction in Biology) was to provide a set of common evaluation tasks to assess the state of the art for text mining applied to biological problems. The results were presented in a workshop held in Granada, Spain, March 28-31, 2004. The articles collected in this BMC Bioinformatics supplement entitled "A critical assessment of text mining methods in molecular biology" describe the BioCreAtIvE tasks, systems, results and their independent evaluation. RESULTS: BioCreAtIvE focused on two tasks. The first dealt with extraction of gene or protein names from text, and their mapping into standardized gene identifiers for three model organism databases (fly, mouse, yeast). The second task addressed issues of functional annotation, requiring systems to identify specific text passages that supported Gene Ontology annotations for specific proteins, given full text articles. CONCLUSION: The first BioCreAtIvE assessment achieved a high level of international participation (27 groups from 10 countries). The assessment provided state-of-the-art performance results for a basic task (gene name finding and normalization), where the best systems achieved a balanced 80% precision / recall or better, which potentially makes them suitable for real applications in biology. The results for the advanced task (functional annotation from free text) were significantly lower, demonstrating the current limitations of text-mining approaches where knowledge extrapolation and interpretation are required. In addition, an important contribution of BioCreAtIvE has been the creation and release of training and test data sets for both tasks. There are 22 articles in this special issue, including six that provide analyses of results or data quality for the data sets, including a novel inter-annotator consistency assessment for the test set used in task 2.
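To make the "balanced 80% precision / recall" figure concrete, the following minimal sketch computes precision, recall and F1 for a set of extracted gene names against a gold standard. The gene lists and the function name `precision_recall_f1` are hypothetical and chosen only for illustration; the exact scoring protocol used in the challenge is not described here and may differ.

```python
def precision_recall_f1(predicted, gold):
    """Score a collection of predicted items against a gold-standard collection."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: fraction of predictions that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of gold items that were found.
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gene mentions, for illustration only.
gold = {"BRCA1", "TP53", "EGFR", "KRAS", "MYC"}
predicted = {"BRCA1", "TP53", "EGFR", "KRAS", "ACTB"}

p, r, f = precision_recall_f1(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

In this toy case four of five predictions are correct and four of five gold names are recovered, so precision and recall are both 0.8.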
Modulation of the properties of membrane ion channels is of fundamental importance for the regulation of neuronal electrical activity and of higher neural functions. Among the many potential molecular mechanisms for modulating the activity of membrane proteins such as ion channels, protein phosphorylation has been chosen by cells to play a particularly prominent part. This is not surprising given the central role of protein phosphorylation in a wide variety of cellular, metabolic, and signaling processes (26, 27, 48). As summarized here, regulation by phosphorylation is not restricted to one or another class of ion channel; rather, many, and perhaps all, ion channels are subject to modulation by phosphorylation. Similarly, a number of different protein kinase signaling pathways can participate in the regulation of ion channel properties, and it is not unusual to find that a particular channel is modulated by several different protein kinases, each influencing channel activity in a unique way. Finally, the biophysical mechanisms of modulation also exhibit a striking diversity that ranges from changes in desensitization rates to shifts in the voltage dependence and kinetics of channel activation and inactivation. The convergence of channel molecular biology with patch-clamp technology has been spectacularly productive, even allowing the identification of particular amino acid residues in ion channel proteins that participate in specific modulatory changes in channel biophysical properties. This task is far from complete, and no doubt there remain surprises in store for us, but nevertheless it is appropriate to ask where we go from here. One important direction will be to relate functional modulation, produced by phosphorylation, to changes in the three-dimensional structure of the ion channel protein. Unfortunately, structural studies of membrane proteins are extremely difficult, and to date there is no high resolution structure available for any ion channel protein. 
A complementary strategy that is more feasible with current technology is to investigate the ways in which channel modulation contributes to the regulation of cellular physiology. Novel computational approaches are being brought to bear on this complex issue, and their combination with channel molecular biology and biophysics should significantly advance our understanding of molecular mechanisms of neuronal plasticity.
Systems biology relies on mathematical models that often involve complex and intractable likelihood functions, posing challenges for efficient inference and model selection. Generative models, such as normalizing flows, have shown remarkable ability in approximating complex distributions in various domains. However, their application in systems biology for approximating intractable likelihood functions remains unexplored. Here, we elucidate a framework for leveraging normalizing flows to approximate complex likelihood functions inherent to systems biology models. By using normalizing flows in the simulation-based inference setting, we demonstrate a method that not only approximates a likelihood function but also allows for model inference in the model selection setting. We showcase the effectiveness of this approach on real-world systems biology problems, providing practical guidance for implementation and highlighting its advantages over traditional computational methods.
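For orientation, the density that a normalizing flow assigns to data follows from the standard change-of-variables identity. The notation below is an assumption introduced for illustration, not taken from the abstract: $f_\phi$ is an invertible map conditioned on model parameters $\theta$, $p_Z$ is a simple base density such as a standard normal, and $(\theta_n, x_n)$ are parameter and simulation pairs drawn from the simulator.

```latex
% Approximate likelihood of data x given parameters \theta under a conditional flow
\log q_\phi(x \mid \theta)
  = \log p_Z\bigl(f_\phi(x;\theta)\bigr)
  + \log \left| \det \frac{\partial f_\phi(x;\theta)}{\partial x} \right|

% Training objective: maximize the average log-density over N simulated pairs
\hat{\phi} = \arg\max_\phi \frac{1}{N} \sum_{n=1}^{N} \log q_\phi(x_n \mid \theta_n)
```

Once trained, $q_{\hat{\phi}}(x \mid \theta)$ can stand in for the intractable likelihood in downstream inference. Whether the framework described above uses exactly this conditional form is not stated in the abstract.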
The understanding of molecular cell biology requires insight into the structure and dynamics of networks that are made up of thousands of interacting molecules of DNA, RNA, proteins, metabolites, and other components. One of the central goals of systems biology is the unraveling of the as yet poorly characterized complex web of interactions among these components. This work is made harder by the fact that new species and interactions are continuously discovered in experimental work, necessitating the development of adaptive and fast algorithms for network construction and updating. Thus, the "reverse-engineering" of networks from data has emerged as one of the central concerns of systems biology research. A variety of reverse-engineering methods have been developed, based on tools from statistics, machine learning, and other mathematical domains. In order to effectively use these methods, it is essential to develop an understanding of the fundamental characteristics of these algorithms. With that in mind, this chapter is dedicated to the reverse-engineering of biological systems. Specifically, we focus our attention on a particular class of methods for reverse-engineering, namely th
The molecular machinery of life is largely created via self-organisation of individual molecules into functional assemblies. Minimal coarse-grained models, where a whole macromolecule is represented by a small number of particles, can be of great value in identifying the main driving forces behind self-organisation in cell biology. Such models can incorporate data from both molecular and continuum scales, and their results can be directly compared to experiments. Here we review the state of the art of models for studying the formation and biological function of macromolecular assemblies in cells. We outline the key ingredients of each model and their main findings. We illustrate the contribution of this class of simulations to identifying the physical mechanisms behind life and diseases, and discuss their future developments.
In a recent paper, Wilmes et al. demonstrated a qualitative integration of omics data streams to gain a mechanistic understanding of cyclosporine A toxicity. One of their major conclusions was that cyclosporine A strongly activates the nuclear factor (erythroid-derived 2)-like 2 pathway (Nrf2) in renal proximal tubular epithelial cells exposed in vitro. We pursue here the analysis of those data with a quantitative integration of omics data with a differential equation model of the Nrf2 pathway. That was done in two steps: (i) Modeling the in vitro pharmacokinetics of cyclosporine A (exchange between cells, culture medium and vial walls) with a minimal distribution model. (ii) Modeling the time course of omics markers in response to cyclosporine A exposure at the cell level with a coupled PK-systems biology model. Posterior statistical distributions of the parameter values were obtained by Markov chain Monte Carlo sampling. Data were well simulated, and the known in vitro toxic effect EC50 was well matched by model predictions. The integration of in vitro pharmacokinetics and systems biology modeling gives us a quantitative insight into mechanisms of cyclosporine A oxidative-stress
Dynamical systems modeling, particularly via systems of ordinary differential equations, has been used to effectively capture the temporal behavior of different biochemical components in signal transduction networks. Despite the recent advances in experimental measurements, including sensor development and '-omics' studies that have helped populate protein-protein interaction networks in great detail, modeling in systems biology lacks systematic methods to estimate kinetic parameters and quantify associated uncertainties. There are multiple reasons for this, including sparse and noisy experimental measurements, lack of detailed molecular mechanisms underlying the reactions, and missing biochemical interactions. Additionally, the inherent nonlinearities with respect to the states and parameters associated with the system of differential equations further compound the challenges of parameter estimation. In this study, we propose a comprehensive framework for Bayesian parameter estimation and complete quantification of the effects of uncertainties in the data and models. We apply these methods to a series of signaling models of increasing mathematical complexity. Systematic analysis o
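As a generic illustration of Bayesian parameter estimation for a kinetic model, the sketch below uses a random-walk Metropolis sampler to infer the rate constant of a one-state decay model, dx/dt = -k x. Everything here is an assumption made for illustration: the toy model, the noise-free synthetic data, the flat prior on k > 0, the noise level `sigma`, and all function names. It is not the framework proposed in the study.

```python
import math
import random


def simulate(k, times):
    """Analytic solution of dx/dt = -k * x with x(0) = 1."""
    return [math.exp(-k * t) for t in times]


def log_likelihood(k, times, observed, sigma):
    """Gaussian log-likelihood, up to an additive constant."""
    predicted = simulate(k, times)
    sse = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return -sse / (2 * sigma ** 2)


def metropolis(times, observed, sigma, n_steps=5000, step=0.05, k0=1.0, seed=0):
    """Random-walk Metropolis sampler for k with a flat prior on k > 0."""
    rng = random.Random(seed)
    k = k0
    current = log_likelihood(k, times, observed, sigma)
    samples = []
    for _ in range(n_steps):
        proposal = k + rng.gauss(0.0, step)
        if proposal > 0:  # proposals outside the prior support are rejected
            candidate = log_likelihood(proposal, times, observed, sigma)
            accept_prob = math.exp(min(0.0, candidate - current))
            if rng.random() < accept_prob:
                k, current = proposal, candidate
        samples.append(k)
    return samples


# Synthetic, noise-free observations generated with a hypothetical true k = 0.5.
times = [0.5 * i for i in range(11)]
observed = simulate(0.5, times)

samples = metropolis(times, observed, sigma=0.05)
posterior = samples[1000:]  # discard burn-in
mean_k = sum(posterior) / len(posterior)
print(f"posterior mean of k: {mean_k:.3f}")
```

The spread of the retained samples gives a direct, if crude, measure of parameter uncertainty. Realistic signaling models would need a numerical ODE solver and a more careful choice of priors and proposal scale.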
The central dogma of molecular biology, formulated more than five decades ago, compartmentalized information exchange in the cell into the DNA, RNA and protein domains. This formalization has served as an implicit thematic distinguisher for cell biological research ever since. However, a clear account of the distribution of research across this formalization over time does not exist. Abstracts of >3.5 million publications focusing on the cell from 1975 to 2011 were analyzed for the frequency of 100 single-word DNA-, RNA- and protein-centric search terms and amalgamated to produce domain- and subdomain-specific trends. A preponderance of protein- over DNA- and in turn over RNA-centric terms as a percentage of the total word count is evident until the early 1990s, at which point the trends for protein and DNA begin to coalesce while RNA percentages remain relatively unchanged. This term-based census provides a yearly snapshot of the distribution of research interests across the three domains of the central dogma of molecular biology. A frequency chart of the most dominantly-studied elements of the periodic table is provided as an addendum.
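A term-based census of this kind can be sketched as a per-year count of domain-specific words, expressed as a percentage of the total word count. The term lists below are hypothetical and heavily abbreviated (the study used 100 search terms that are not listed here), and the tokenization rule and function name are likewise assumptions.

```python
import re
from collections import Counter, defaultdict

# Hypothetical, abbreviated term lists, for illustration only.
DOMAIN_TERMS = {
    "DNA": {"dna", "genome", "chromatin"},
    "RNA": {"rna", "mrna", "transcript"},
    "protein": {"protein", "kinase", "enzyme"},
}


def domain_percentages(records):
    """Map each year to {domain: percent of that year's total word count}.

    `records` is an iterable of (year, abstract_text) pairs.
    """
    word_totals = Counter()
    domain_hits = defaultdict(Counter)
    for year, text in records:
        # Lowercase and split into alphanumeric tokens.
        words = re.findall(r"[a-z0-9]+", text.lower())
        word_totals[year] += len(words)
        for word in words:
            for domain, terms in DOMAIN_TERMS.items():
                if word in terms:
                    domain_hits[year][domain] += 1
    return {
        year: {d: 100.0 * domain_hits[year][d] / total for d in DOMAIN_TERMS}
        for year, total in word_totals.items()
        if total
    }


# Toy input: two made-up abstracts from the same year.
records = [
    (1980, "The protein kinase binds DNA"),
    (1980, "RNA protein"),
]
for year, shares in domain_percentages(records).items():
    print(year, {d: round(v, 1) for d, v in shares.items()})
```

Plotting these percentages year by year would give the kind of domain-specific trend lines the abstract describes.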
A number of models in mathematical epidemiology have been developed to account for control measures such as vaccination or quarantine. However, COVID-19 has brought unprecedented social distancing measures, raising the challenge of how to include these in a manner that can explain the data but avoid overfitting in parameter inference. We here develop a simple time-dependent model, where social distancing effects are introduced analogously to coarse-grained models of gene expression control in systems biology. We apply our approach to understand drastic differences in COVID-19 infection and fatality counts, observed between Hubei (Wuhan) and other Mainland China provinces. We find that these unintuitive data may be explained through an interplay of differences in transmissibility, effective protection, and detection efficiencies between Hubei and other provinces. More generally, our results demonstrate that regional differences may drastically shape infection outbursts. The obtained results demonstrate the applicability of our developed method to extract key infection parameters directly from publicly available data so that it can be globally applied to outbreaks of COVID-19 in a number
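The idea of a time-dependent transmission term can be illustrated with a generic SIR model in which the transmission rate decays once distancing begins. This is a minimal sketch under stated assumptions, not the authors' model: the exponential decay form, all parameter values, and the function names are hypothetical, and compartments are expressed as population fractions.

```python
import math


def transmission_rate(t, beta0, onset, decay):
    """Transmission rate that decays exponentially once distancing starts."""
    if t < onset:
        return beta0
    return beta0 * math.exp(-decay * (t - onset))


def run_sir(beta0=0.4, gamma=0.1, onset=20.0, decay=0.2, days=150, dt=0.1, i0=1e-4):
    """Forward-Euler integration of an SIR model in population fractions."""
    s, i, r = 1.0 - i0, i0, 0.0
    t = 0.0
    peak = i
    for _ in range(round(days / dt)):
        beta = transmission_rate(t, beta0, onset, decay)
        new_infections = beta * s * i * dt
        recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - recoveries
        r += recoveries
        t += dt
        peak = max(peak, i)
    return {"S": s, "I": i, "R": r, "peak_I": peak}


with_distancing = run_sir()
without_distancing = run_sir(decay=0.0)  # transmission rate stays at beta0
print("recovered fraction with distancing:   ", round(with_distancing["R"], 3))
print("recovered fraction without distancing:", round(without_distancing["R"], 3))
```

Comparing the two runs shows how the onset time and strength of distancing alone can change the final outbreak size, which is the kind of regional difference the abstract refers to.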
The study of population genetics of invasive species offers opportunities to investigate rapid evolutionary processes at work, and while the ecology of biological invasions has enjoyed extensive attention in the past, the recentness of molecular techniques makes their application in invasion ecology a fairly new approach. Despite this, molecular biology has already proved powerful in inferring aspects not only relevant to the evolutionary biologist but also to those concerned with invasive species management. Here, we review the different molecular markers routinely used in such studies and their application(s) in addressing different questions in invasion ecology. We then review the current literature on molecular genetic studies aimed at improving management and the understanding of invasive species by resolving taxonomic issues, elucidating geographical sources of invaders, detecting hybridisation and introgression, tracking dispersal and spread, and assessing the importance of genetic diversity in invasion success. Finally, we make some suggestions for future research efforts in molecular ecology of biological invasions.