Clavibacter nebraskensis (Cn) is a Gram-positive bacterial plant pathogen that infects the vascular tissue of maize, causing Goss's Wilt and Leaf Blight. Detection of highly virulent strains since the mid-2000s suggests that genetic divergence has contributed to the re-emergence of the disease following its effective management and subsequent disappearance in the 1980s. However, few genomic resources exist for this pathogen, and little is known about genomic variation in the species. Our study aimed to characterize genomic diversity across a set of differentially virulent strains isolated from both historic epidemics and recent outbreaks. We sequenced 17 Cn strains of varying virulence isolated from epidemics spanning their earliest detection to subsequent outbreaks in the mid-2010s across various US states and Manitoba, Canada. Contrary to previous studies, we found no strong population structure associated with geographic origin or year of isolation based on haplotype analysis. We detected five heterogeneous plasmids in strains CDK032, CDK039, CDK044, CDK045, and CDK046, whereas to date, only one plasmid has been sequenced in a single Cn strain. Genomic islands were detected in all strains, and the putative virulence gene celA was found to be encoded in one such island. Secreted CAZymes, hypothesized to be involved in pathogenicity, are well conserved across all Cn genomes, indicating that allelic or expression diversity, and not gene copy number, might be implicated in virulence diversity. Importantly, we identified previously unreported deletions in a secreted cellulase, located within the linker peptide region between the carbohydrate-binding domain and the cellulase domain. Our analyses provide insight into genomic diversity within the species, revealing five novel plasmids and genomic islands harboring CAZymes, including the celA gene.
Genes encoding cell wall degrading enzymes are well-conserved across strains of all virulence phenotypes. Deletions within a secreted cellulase not previously reported have been described for future functional analysis.
Population genomic workflows frequently rely on fragmented command-line utilities, custom conversion scripts, and programming language-specific environments, complicating computational reproducibility and obscuring data provenance. As analytical workflows become increasingly automated and computationally intensive, dependence on disparate preprocessing tools can introduce friction between raw genotype files, quality-control decisions, statistical analyses, and downstream workflows. We developed SNPio, a Python-native framework that consolidates single nucleotide polymorphism data parsing, filtering, visualization, numerical genotype encoding, and population genomic summary-statistic calculation within a unified software architecture. VCF file parsing and filtering benchmarks were compared against vcfR and SNPfiltR. SNPio demonstrated faster execution times but used more memory than its R-based comparators, reflecting SNPio's retention of genotype arrays, metadata, and provenance-tracking attributes. Pairwise Weir and Cockerham's FST and Nei's genetic distance estimates aligned with HierFstat expectations based on Pearson correlations and aggregate error metrics. D-statistics conformed to theoretical expectations across eleven simulated datasets spanning a range of introgression signal strengths. SNPio provides a reproducible Python-native workflow for processing, filtering, encoding, visualizing, and analyzing SNP datasets. It integrates common early-stage population genomic operations into a transparent, scriptable framework, which ultimately promotes workflow provenance and reduces reliance on disjointed software tools, unsaved terminal commands, and custom scripts. SNPio is particularly suited for population genomic studies of non-model organisms in ecological, evolutionary, and conservation contexts, where reproducible preprocessing and interoperability with downstream analyses are becoming increasingly important.
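As an illustration of the D-statistic mentioned above, the following is a minimal sketch of Patterson's D (the ABBA-BABA test) computed from biallelic site patterns. It is not SNPio's API; the function name `patterson_d` and the 0/1 allele coding (0 = ancestral, 1 = derived) are assumptions made for this example only.

```python
from typing import Iterable, Tuple

Site = Tuple[int, int, int, int]  # alleles for (P1, P2, P3, outgroup)


def patterson_d(sites: Iterable[Site]) -> float:
    """Return D = (ABBA - BABA) / (ABBA + BABA) for haploid site patterns.

    Hypothetical helper: alleles are coded 0 (ancestral) or 1 (derived),
    and only sites where the outgroup carries the ancestral allele count.
    """
    abba = 0
    baba = 0
    for p1, p2, p3, outgroup in sites:
        if outgroup != 0:
            continue  # the outgroup defines the ancestral state
        if (p1, p2, p3) == (0, 1, 1):
            abba += 1  # P2 and P3 share the derived allele
        elif (p1, p2, p3) == (1, 0, 1):
            baba += 1  # P1 and P3 share the derived allele
    informative = abba + baba
    if informative == 0:
        raise ValueError("no ABBA or BABA sites found")
    return (abba - baba) / informative


# Three ABBA sites, one BABA site, one uninformative site.
example_sites = [(0, 1, 1, 0)] * 3 + [(1, 0, 1, 0)] + [(0, 0, 1, 0)]
d_value = patterson_d(example_sites)
```

Under this coding, a D near zero is expected without introgression, while a positive value indicates excess allele sharing between P2 and P3. Real analyses usually work from allele frequencies and add a block jackknife for significance, which this sketch omits.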
The Finnish Ayrshire cattle belong to the Nordic Red breeds. The basis of selection in Nordic Red breeds shifted from traditional pedigree-based breeding values to genomic breeding values between 2011 and 2014. Joint genetic evaluation and admixture among the Nordic Red breeds have led to the formation of a composite Nordic Red population; consequently, contemporary Finnish Ayrshire represents an admixed population. We identified recent selection signatures in the Finnish Ayrshire genome using two complementary approaches: the Hudson estimator of Wright's fixation index (FST) and generation proxy selection mapping (GPSM). Hudson FST quantifies population differentiation between groups, whereas GPSM detects selection signatures within a single population by regressing birth year on SNP genotypes. The aim of this study was to identify temporal allele frequency changes in SNPs consistent with selection during the genomic era and to evaluate their associations with milk production and fertility traits in Finnish Ayrshire. Genotypes were available for 64,160 cows across 46,914 SNPs, and phenotypic data on milk production and fertility traits were available for 49,417 genotyped individuals. Based on Hudson FST, 56 SNPs showed genetic differentiation between cows selected using pedigree-based and genomic information. In addition, 54 SNPs exhibited temporal allele frequency changes consistent with selection according to GPSM. Overall, 11 SNPs were identified by both methods. Of the 54 SNPs, 13 were associated with the interval from first to last insemination in Finnish Ayrshire heifers. These results suggest that a substantial proportion of SNPs exhibiting temporal allele frequency changes during genomic selection are associated with heifer fertility.
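To make the Hudson FST estimator concrete, here is a minimal sketch of its commonly used sample-size-corrected form. The function names are hypothetical, and the inputs are assumptions for illustration: `p1` and `p2` are sample allele frequencies in the two groups, and `n1` and `n2` are the numbers of sampled allele copies (twice the number of diploid animals). The abstract does not describe the authors' implementation.

```python
from typing import Iterable, Tuple


def hudson_fst_site(p1: float, n1: int, p2: float, n2: int) -> Tuple[float, float]:
    """Return (numerator, denominator) of Hudson's FST for one biallelic SNP."""
    numerator = (
        (p1 - p2) ** 2
        - p1 * (1.0 - p1) / (n1 - 1)  # sampling correction, group 1
        - p2 * (1.0 - p2) / (n2 - 1)  # sampling correction, group 2
    )
    denominator = p1 * (1.0 - p2) + p2 * (1.0 - p1)
    return numerator, denominator


def hudson_fst(sites: Iterable[Tuple[float, int, float, int]]) -> float:
    """Combine SNPs as a ratio of summed numerators to summed denominators."""
    total_num = 0.0
    total_den = 0.0
    for p1, n1, p2, n2 in sites:
        num, den = hudson_fst_site(p1, n1, p2, n2)
        total_num += num
        total_den += den
    if total_den == 0.0:
        raise ValueError("all sites are monomorphic for the same allele")
    return total_num / total_den


# Fixed difference between groups versus identical frequencies.
fst_fixed = hudson_fst([(1.0, 10, 0.0, 10)])
fst_same = hudson_fst([(0.5, 100, 0.5, 100)])
```

Summing numerators and denominators across SNPs before dividing tends to be more stable than averaging per-SNP ratios. Slightly negative values, as in `fst_same`, are a normal consequence of the sampling correction when the groups do not differ.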
Banana (Musa spp.) is a critical global fruit crop whose production is severely threatened by sheath rot disease. Klebsiella variicola has been identified as one of the main causative agents of this disease, while the pathogenicity of Herbaspirillum spp. strains isolated from sheath rot-infected banana tissues remains to be clearly defined. We isolated five Herbaspirillum strains from infected banana sheaths, namely Herbaspirillum huttiense Musa1 (hereafter abbreviated as Her1), Herbaspirillum huttiense Musa2 (hereafter Her2), Herbaspirillum sp. Musa3 (hereafter Her3), Herbaspirillum huttiense Musa4 (hereafter Her4), and Herbaspirillum huttiense Musa5 (hereafter Her5), and performed whole-genome sequencing on each isolate. Comparative genomics revealed genomes of similar size and stable GC content across strains. Functional annotation uncovered a core set of carbohydrate-active enzymes (CAZymes) and antibiotic resistance genes (ARGs), indicating conserved metabolic capabilities. Notably, two pectate lyase genes, potentially involved in plant cell wall degradation, were uniquely identified in strain Her3. We observed marked variation in the repertoire of putative virulence determinants across the isolates: strains Her1 and Her4 each encoded 17 predicted type III secretion system (T3SS) effector proteins, a notably higher number than that of the other strains. All strains also harbored a substantial number of predicted type VI secretion system (T6SS) effectors. This study provides the first genomic resource for Herbaspirillum species associated with banana sheath rot. Our comparative analysis highlights key genomic differences, particularly in secreted effector profiles, that likely underlie strain-specific pathogenic mechanisms. These results enhance our fundamental understanding of the phytopathogenicity of Herbaspirillum in plants.
Predicting how populations will respond to climate change is an urgent priority in evolutionary and conservation biology. Genetic and environmental influences have long been known to impact responses to shifting climates, giving rise to the development of genomic offset methods that attempt to predict the degree of maladaptation following environmental change. Although these methods have the potential to improve our understanding of populations' responses to climate change, questions remain about how well they predict fitness in a diversity of systems. Here, we pair genomic offsets with a large-scale common garden experiment to investigate how fitness of a marine mollusc, Crassostrea virginica, varies across environmental gradients and to validate offset predictions against empirical fitness. We find that at two common garden sites in the Chesapeake Bay, Northern oysters had the lowest survival and growth, whereas Southern oysters had similar survival and growth to local wild oysters and selection lines. We compared survival and length to offset predictions, finding higher method performance at the lower salinity, moderate disease site than at the higher salinity, higher disease site. The inclusion of disease in the offset model training additionally impacted offset method performance, though the effect varied between common garden sites, fitness proxies and across methods. Despite a nearly 50% difference in survival across sites, the magnitude of offset predictions to both common gardens was similar, highlighting issues with extrapolating offset predictions across species ranges. Performance of genomic offset methods in this system was not adequate to make real-world predictions to environmental stressors.
Elective genomic sequencing (EGS) returns monogenic disease findings in multiple genes, including potentially novel variants, and may also provide participants with carrier status, pharmacogenomic and other health-related information. The PeopleSeq Study assessed participants' motivations for and concerns about EGS and the associated clinical and psychosocial outcomes across diverse EGS providers. We administered a shared questionnaire to participants who chose to undergo EGS via 18 academic, clinical, or commercial EGS platforms. We enrolled 1575 participants, of whom 1147 (72.8%) completed a questionnaire after receiving their EGS results. A majority (60.3%) of the participants who completed a post-result questionnaire self-reported receiving results they assessed as important, including negative findings, and 75.9% reported a form of health-related utility. Among a subset (19.4%) who shared their EGS reports, 16.6% (37 of n = 223) received a monogenic finding, and self-reported results deemed "important" were consistent with EGS reports. Most participants (74.1%) discussed their results with their family, but fewer discussed their results with a healthcare provider other than the site team (41.7%) or had one or more medical visits as a direct result of their EGS testing (23.1%). Participants expressed diverse motivations for EGS, with 91.4% expressing interest in their personal disease risk and 54% expressing quasi-indication-based motivations related to family medical history. Individuals motivated by family history reported important results at a significantly higher rate. Early adopters of EGS are motivated by general interest in their health as well as quasi-indication-based considerations such as family history. A majority of participants learned results they considered medically important, but a much smaller segment engaged healthcare providers with their results.
Research on regional circulation and evolution of influenza A viruses before and after the COVID-19 pandemic is crucial for informing vaccine updates and antiviral drug development. This study generated 260 new genomic sequences of influenza A(H1N1)pdm09 viruses collected in Yunnan province, China, between 2018 and 2023. Comparative genomic analyses elucidated their evolutionary characteristics and dynamics. Epidemiological analysis identified key risk factors (sex, age, occupation) for influenza infection. Phylogenetic analyses revealed sequence divergence between the vaccine strains and the strains circulating in Yunnan, especially in the 2020-2024 influenza seasons. Subclade reassortment events were extremely limited among the sequenced Yunnan strains, suggesting that reassortment may not be a major contributor to the circulation and evolution of influenza A(H1N1)pdm09 viruses in Yunnan during these influenza seasons. We detected elevated evolutionary pressures acting on specific gene segments, reflected in increased dN/dS ratios, particularly for envelope proteins. Furthermore, numerous amino acid substitutions (e.g., S185I/T) within HA antigenic epitopes and receptor binding sites were identified in most Yunnan strains, indicating potential roles of antigenic drift in modulating viral antigenicity and host adaptation. Notably, 17 amino acid substitutions in HA and NA (including HA: N156K) accumulated to higher frequencies during the 2022-2023 and 2023-2024 seasons. These changes likely represent the molecular signature of contemporary A(H1N1)pdm09 viruses in Yunnan. Collectively, this study explored the molecular evolutionary dynamics of A(H1N1)pdm09 viruses in Yunnan province across multiple influenza seasons, providing new regional data for studying the molecular characterization and evolution of A(H1N1)pdm09 within the global surveillance framework.
Autoimmune diseases are chronic and heterogeneous disorders resulting from the breakdown of immune tolerance and subsequent tissue damage. Beyond genetic predisposition, viral infections are increasingly recognized as pivotal environmental contributors to disease onset. In this study, we performed comprehensive viral metagenomic profiling of blood samples from 205 patients with systemic lupus erythematosus (SLE), Sjögren's syndrome (SS), ankylosing spondylitis (AS), and undifferentiated connective tissue disease (UCTD). A total of approximately 103.98 million sequencing reads were analyzed, revealing 44 viral families, including 30 DNA and 14 RNA families. RNA viruses dominated the virome composition, accounting for 71% of total reads, with Picobirnaviridae being consistently prevalent and abundant across all disease groups. Alpha and beta diversity analyses revealed significant heterogeneity in viral community structures among different disease groups, with a marked diversity skew observed in the SS group. Disease-specific viral composition patterns were prominent, and the number of core viral species shared across the four groups was limited. Of particular note, Anelloviridae was significantly enriched in the AS and UCTD groups, suggesting its potential as a biomarker for immunosuppressive states. Furthermore, bacteriophages such as Microviridae exhibited differential abundance across groups, reflecting the potential role of virus-microbe-host immune interactions in disease pathogenesis. In conclusion, this study provides a comprehensive profile of the blood virome in four autoimmune diseases, highlighting the potential role of viral communities in immune regulation and offering new perspectives for the development of related biomarkers.
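The alpha and beta diversity analyses mentioned above can be illustrated with two widely used measures. This is a minimal sketch under assumptions: the abstract does not state which indices were used, so the Shannon index and Bray-Curtis dissimilarity are chosen here as common examples, and the read counts are invented purely for demonstration.

```python
import math
from typing import Sequence


def shannon_index(counts: Sequence[float]) -> float:
    """Alpha diversity: Shannon index H = -sum(p_i * ln p_i) over taxa."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no reads")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(sample_a: Sequence[float], sample_b: Sequence[float]) -> float:
    """Beta diversity: Bray-Curtis dissimilarity between two count vectors."""
    if len(sample_a) != len(sample_b):
        raise ValueError("samples must list the same taxa in the same order")
    total = sum(sample_a) + sum(sample_b)
    if total <= 0:
        raise ValueError("both samples are empty")
    return sum(abs(a - b) for a, b in zip(sample_a, sample_b)) / total


# Invented read counts for three viral families in two samples.
sample_one = [50, 30, 20]
sample_two = [90, 10, 0]
alpha_one = shannon_index(sample_one)
alpha_two = shannon_index(sample_two)
beta = bray_curtis(sample_one, sample_two)
```

A more even sample (`sample_one`) yields a higher Shannon index than one dominated by a single family (`sample_two`), and Bray-Curtis ranges from 0 for identical profiles to 1 for profiles with no shared taxa.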
Accurate species delimitation is crucial for biodiversity research and conservation, yet it remains challenging in taxa undergoing recent radiation. Gentiana is a species-rich genus that has undergone recent radiation. In this study, we assessed the phylogenetic relationships within the G. flexicaulis complex, which is endemic to the Qinghai-Tibet Plateau, and explored the underlying evolutionary causes. By combining a large set of single-copy orthologous genes, complete chloroplast genome sequences, and morphological data covering its distribution range, we clarified the taxonomic positions of G. flexicaulis, G. complexa and G. subuniflora. Both nuclear and chloroplast phylogenies revealed robust phylogenetic relationships within the complex. The results clarified the delimitation of G. flexicaulis and identified three sub-lineages within the species. The East and West sub-lineages are geographically isolated by the Qionglai Mountains and differ morphologically in calyx and stem leaf length. The third sub-lineage represents a new subspecies proposed in this study, named G. flexicaulis subsp. brevicorolla, which is diagnostically distinct from the autonym in having a shorter corolla and calyx, as well as smaller leaves. G. flexicaulis subsp. brevicorolla is closely related to G. complexa in morphology, sharing short corolla and calyx traits, but it shows clear phylogenetic independence in both nuclear and chloroplast phylogenies. Evidence from orthologous genes and genomic SNPs indicated that incomplete lineage sorting and hybridization in the G. flexicaulis complex occur mainly within species rather than among species. In summary, our study refines the delimitation of G. flexicaulis and demonstrates the utility of population-level morphological statistics across geographical variation in recently radiated taxa.
R-loops and D-loops are three-stranded nucleic acid structures that have emerged as central regulators of genome stability, gene expression, and DNA metabolism. R-loops form co-transcriptionally or post-transcriptionally when nascent RNA re-anneals with the template DNA strand, generating an RNA:DNA hybrid that displaces the non-template strand into a single-stranded state. These structures are enriched at CpG island promoters, transcription termination sites, and immunoglobulin class-switch regions, where they coordinate transcription regulation, chromatin remodeling, and DNA damage signaling. D-loops are formed when a single-stranded DNA segment pairs with one strand of a duplex and displaces the other, arising through context-dependent mechanisms that include RAD51- or DMC1-mediated strand invasion in homologous recombination, shelterin-assisted invasion at telomeres, and replication-coupled strand displacement at the mitochondrial DNA origin. They serve as indispensable intermediates in double-strand break repair, telomere maintenance, and mitochondrial DNA replication. Recent cryo-electron microscopy studies have resolved the stepwise RAD51-mediated strand exchange mechanism at near-atomic resolution, substantially advancing structural understanding of D-loop biogenesis. Despite their differences in molecular composition, both structures remodel Watson-Crick base pairing and, when dysregulated, are associated with replication fork stalling, transcription-replication conflicts, and aberrant recombination. This review systematically compares the structural features, formation mechanisms, regulatory networks, and biological functions of R-loops and D-loops, with emphasis on their convergent roles in safeguarding genome integrity. We further discuss rapidly evolving detection technologies and emerging therapeutic strategies targeting these structures in cancer and neurodegeneration, identifying key unresolved questions for future investigation.
Understanding the proteins implicated in the pathogenesis of ischemic stroke is important for elucidating disease mechanisms and informing prevention strategies. In this study, we aim to identify plasma proteins with a potentially causal effect on risk of ischemic stroke by integrating the largest available genetic data sources for plasma proteins, ischemic stroke and its risk factors. We use genome-wide association (GWA) statistics to identify cis protein quantitative trait loci for over 3,500 proteins across two proteomic platforms. Subsequently, we perform two-sample Mendelian randomization (MR) to assess the potentially causal relationships between plasma proteins and a) risk of ischemic stroke and its subtypes, and b) well-established cardiovascular risk factors. Downstream analyses include Bayesian colocalization, phenome-wide associations and interrogation of biological databases. We identify 21 proteins with evidence of potentially causal associations with risk of ischemic stroke or its subtypes at 5% false discovery rate, with 16 supported further by colocalization. Four proteins (CEP85, KNG1, MMUT, SPATA20) represent findings not previously implicated in ischemic stroke through MR, or through cognate genes in GWA studies. Integration of evidence from phenome-wide MR, animal models and tissue-specific gene expression highlights agonists of MMUT, CEP85 and GRK5, and inhibitors of F11 and KNG1 as the most promising for further consideration as targets for prevention of ischemic stroke. Our study provides the most comprehensive data integration to date supporting the identification and causal relevance of plasma proteins for ischemic stroke and implicating a number of potential therapeutic targets. Ischemic stroke occurs when the blood supply to a part of the brain is interrupted. It is a major cause of death and long-term disability worldwide, but we still do not fully understand the underlying biological processes. 
Proteins circulating in the blood may help explain why some people are at higher risk. They may also point to new treatments. In this study, we use large-scale genetic data from over a million people. We aim to identify specific proteins in the blood that may play a role in ischemic stroke. After examining more than 3500 proteins, we identified 11 proteins relevant to ischemic stroke. After considering additional existing knowledge, 4 of them appear to be the most promising as potential drug targets. In summary, our findings improve our understanding of stroke biology and highlight new opportunities for preventing ischemic stroke.
Rutaceae encompasses numerous medicinal plants with unclear genetic relationships and mitochondrial genome (mitogenome) characteristics, which hinders understanding of their evolutionary adaptation and medicinal trait development. Here, we aimed to investigate the conserved and divergent features of mitogenomes among closely related medicinal genera of Rutaceae, and to explore how mitogenomic variations correlate with their phylogenetic positions. Citrus aurantium, Toddalia asiatica, and Zanthoxylum nitidum (Roxb.) DC are medicinally valuable but understudied Rutaceae species belonging to different genera. We sequenced, assembled, and annotated their complete mitogenomes: C. aurantium (504,387 bp, 45.17% GC content) contained 62 genes (36 protein-coding genes (PCGs), 3 rRNAs, and 23 tRNAs), T. asiatica (566,784 bp, 45.37% GC content) harbored 58 genes (35 PCGs, 3 rRNAs, and 20 tRNAs), and Z. nitidum (539,013 bp, 45.48% GC content) possessed 61 genes (34 PCGs, 3 rRNAs, and 24 tRNAs). Comparative analyses with 12 published Rutaceae mitogenomes showed that the three species exhibited the highest number of RNA editing sites in PCGs among the examined species. Evaluation of selective pressure and gene clusters revealed fewer differences between C. aurantium and C. maxima cultivar Hirado Buntan than with the other two species, while C. sinensis displayed greater dissimilarities with other Citrus species (notably, C. maxima and C. maxima cultivar Hirado Buntan are conspecific and would be expected to show the least divergence). Phylogenetic trees constructed using 24 conserved mitochondrial PCGs clarified evolutionary relationships: C. maxima was closest to C. sinensis, followed by C. maxima cultivar Hirado Buntan, and C. aurantium formed an unexpected separate clade, highlighting the need to include mitogenomes from more Rutaceae species in future phylogenetic analyses.
This study expands the pool of Rutaceae mitogenomic resources, provides insights into features that may be associated with ecological adaptation, and offers valuable genomic data for future phylogenetic and comparative studies of Rutaceae medicinal plants.
Microbial iron cycling regulates nutrient availability and redox balance in global ecosystems, yet its pathways remain underexplored in ice-free Antarctic terrestrial ecosystems. This study reports the enrichment of a psychrotolerant microbial consortium from penguin-impacted soils on Beaufort Island, Antarctica, capable of reducing Fe(III) to Fe(II) at 4 °C via an anaerobic (likely fermentative) iron-reducing pathway. The consortium was dominated by Clostridium sensu stricto 13, completely reduced 230 mg L⁻¹ Fe(III) citrate within three months, and drove the biogenic formation of magnetite (Fe₃O₄). Metagenomic binning yielded four high-quality Clostridium genomes harboring multiple hydrogenases and cold-shock proteins (csp), revealing genomic strategies for energy conservation and psychrotolerance. Hydrogen production was strongly suppressed in the presence of Fe(III) citrate, indicating an intimate coupling of fermentation-derived electron flow to Fe(III) reduction. Our findings reveal a previously unrecognized low-temperature iron reduction mechanism and highlight the ecological significance of anaerobic (likely fermentative) iron reducers in ornithogenic soils, microhabitats enriched in organic matter and metals by penguin guano. This work expands the known diversity of Fe(III)-reducing microorganisms, demonstrates their role in magnetite biomineralization under extreme conditions, and provides insights into microbial modulation of iron speciation in Antarctic ornithogenic soils.
K-mer-based analysis of genomic data is ubiquitous, but the presence of repetitive k-mers continues to pose problems for the accuracy of many methods. For example, the Mash tool (Ondov et al. 2016) can accurately estimate the substitution rate between two low-repetitive sequences from their k-mer sketches; however, it is inaccurate on repetitive sequences such as the centromere of a human chromosome. Follow-up work by Blanca et al. (2021) has attempted to model how mutations affect k-mer sets based on strong assumptions that the sequence is non-repetitive and that mutations do not create spurious k-mer matches. However, the theoretical foundations for extending an estimator like Mash to work in the presence of repeat sequences have been lacking. In this work, we relax the non-repetitive assumption and propose a novel estimator for the mutation rate. We derive theoretical bounds on our estimator's bias. Our experiments show that it remains accurate for repetitive genomic sequences, such as the alpha satellite higher order repeats in centromeres. We demonstrate our estimator's robustness across diverse datasets and a range of substitution rates and k-mer sizes. Finally, we show how sketching can be used to avoid dealing with large k-mer sets while retaining accuracy. Our software is available at https://github.com/medvedevgroup/Repeat-Aware_Substitution_Rate_Estimator.
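For context, the baseline that this work builds on can be sketched briefly. The example below computes the Mash distance of Ondov et al. from an exact Jaccard index of k-mer sets, using the published formula D = -(1/k) ln(2j / (1 + j)). It is not the repeat-aware estimator proposed in the abstract, it skips MinHash sketching and canonical k-mers, and the helper names are hypothetical.

```python
import math
from typing import Set


def kmer_set(sequence: str, k: int) -> Set[str]:
    """Collect the distinct k-mers of a sequence (copy counts are discarded)."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index of two k-mer sets."""
    union = a | b
    if not union:
        raise ValueError("both k-mer sets are empty")
    return len(a & b) / len(union)


def mash_distance(j: float, k: int) -> float:
    """Mash distance from Jaccard index j and k-mer size k."""
    if j <= 0.0:
        return 1.0  # no shared k-mers: report the maximum distance
    return abs(math.log(2.0 * j / (1.0 + j)) / k)


reference = "ACGTTGCATGCCGATAGCTA"
mutated = "ACGTTGCATGACGATAGCTA"  # one substitution relative to reference
k = 5
estimate = mash_distance(jaccard(kmer_set(reference, k), kmer_set(mutated, k)), k)
```

Because `kmer_set` keeps each k-mer once regardless of how often it occurs, a highly repetitive sequence contributes far fewer distinct k-mers than its length suggests, which is one way to see why set-based estimates can degrade on repeats.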
Inborn errors of metabolism (IEMs) are among the most clinically actionable groups of rare genetic diseases, yet therapeutic knowledge remains distributed across multiple databases, complicating consistent identification of treatable IEMs. In this study, we performed a cross-database analysis to characterize the current therapeutic landscape of IEM genes using GeneReviews as a clinical reference. Gene symbols from GeneReviews, Metabolic Treatabolome, Treatable ID, and the Drug Database for Inborn Errors of Metabolism (DDIEM) were harmonized to HGNC nomenclature and mapped to the International Classification of Inherited Metabolic Disorders (ICIMD). IEM genes were categorized as associated with targeted or supportive treatment based on a GeneReviews-derived benchmark. Among 656 ICIMD-defined IEM genes in GeneReviews, 155 (23.6%) were associated with targeted therapies and 501 (76.4%) with supportive treatment. Cross-database comparison demonstrated limited concordance, with only 36 genes associated with targeted therapy shared across Metabolic Treatabolome, Treatable ID, and DDIEM, reflecting heterogeneous representation of treatable IEMs across databases. Targeted therapies were more common in intermediary metabolic pathways, including fatty acid, tetrapyrrole, and vitamin/cofactor metabolism, whereas mitochondrial and complex cellular disorders were predominantly associated with supportive management. Moreover, 53% of the ICIMD-defined IEM genes remain unrepresented in structured databases. Our study defines a set of IEM genes for which targeted therapies are available and demonstrates that treatment information is largely fragmented across databases. This set of IEM genes may facilitate the interpretation of genomic findings and aid the prioritization of IEMs for genomic screening and clinical decision-making.
This Data Note reports the complete genome sequence and associated functional-genomic data of Methylomonas sp. strain 2F7, a methane-oxidizing bacterium isolated from rice paddy soil in South Korea. These data were generated to support comparative genomic analyses of methanotrophic bacteria and to document the genome-based taxonomic position and C1 metabolism gene repertoire of strain 2F7. The dataset includes Oxford Nanopore Technologies MinION raw reads, a two-replicon genome assembly, annotation summaries, functional classifications, a phylogenetic analysis, average nucleotide identity (ANI) and digital DNA-DNA hybridization (dDDH) comparisons, and a curated C1 metabolism gene table. The genome consists of a 4,979,984-bp chromosome and a 23,603-bp plasmid, with an overall G+C content of 51.4%. Annotation identified 4,520 genes, including 4,429 coding sequences (CDSs), nine rRNA genes, 46 tRNA genes, and four ncRNAs. The 16S rRNA gene sequence showed the highest similarity to Methylomonas fluvii EbBᵀ, whereas ANI/dDDH analysis identified Methylomonas rosea WSC-7ᵀ as the closest related type strain by genome-distance criteria, with ANI/dDDH values of 89.6% and 38.0%, respectively. Additional ANI/dDDH comparisons with M. fluvii EbBᵀ and M. defluvii OY6ᵀ yielded values of 89.3%/37.4% and 89.4%/36.9%, respectively. These genome-based values fall below accepted species-boundary thresholds and support placement of strain 2F7 as a putative novel species within the genus Methylomonas. The curated gene table documents pMMO-like and sMMO gene clusters and genes involved in methanol oxidation, formaldehyde oxidation, and ribulose monophosphate (RuMP)-linked C1 assimilation.
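The species-boundary comparison described above can be written out as a short check. This is a sketch under stated assumptions: the abstract does not give the threshold values, so the commonly cited cutoffs of 95% ANI and 70% dDDH are assumed here, and the ANI/dDDH pairs are copied from the text. The names `ANI_THRESHOLD`, `DDDH_THRESHOLD`, and `same_species` are hypothetical.

```python
from typing import Dict, Tuple

# Assumed cutoffs (commonly cited values, not stated in the abstract).
ANI_THRESHOLD = 95.0   # percent average nucleotide identity
DDDH_THRESHOLD = 70.0  # percent digital DNA-DNA hybridization

# (ANI %, dDDH %) of strain 2F7 against each type strain, from the text.
comparisons: Dict[str, Tuple[float, float]] = {
    "Methylomonas rosea WSC-7T": (89.6, 38.0),
    "Methylomonas fluvii EbBT": (89.3, 37.4),
    "Methylomonas defluvii OY6T": (89.4, 36.9),
}


def same_species(ani: float, dddh: float) -> bool:
    """True only if both genome-distance values reach the assumed cutoffs."""
    return ani >= ANI_THRESHOLD and dddh >= DDDH_THRESHOLD


# Closest type strain by ANI, and whether any comparison reaches the cutoffs.
closest = max(comparisons, key=lambda name: comparisons[name][0])
any_conspecific = any(same_species(ani, dddh) for ani, dddh in comparisons.values())

for name, (ani, dddh) in comparisons.items():
    print(f"{name}: ANI {ani}%, dDDH {dddh}%, same species: {same_species(ani, dddh)}")
```

With the reported values, no comparison reaches either cutoff, which matches the abstract's conclusion that strain 2F7 represents a putative novel species.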
Copy number variation (CNV) is a major source of genomic diversity that shapes gene family evolution and may contribute to ecological differentiation, yet its genome-wide ecological relevance remains poorly understood. Here, we analyzed CNV across four Brassicaceae species (Arabidopsis thaliana, A. lyrata, A. halleri, and Arabis alpina) to identify gene families undergoing rapid expansion or contraction. Using a birth-death model, we identified 231 rapidly evolving gene families spanning diverse functional categories. We then characterized CNV within populations of A. thaliana and A. lyrata using population-scale long-read assemblies. CNV exhibited strong heterogeneity across gene families and species, with contrasting evolutionary outcomes: A. thaliana showed greater retention of duplicated copies, whereas A. lyrata exhibited higher pseudogenization and turnover. CNV profiles were strongly structured geographically, reflecting known demographic and phylogeographic patterns in both species. Environmental association analyses revealed species-specific architectures: CNV in A. thaliana showed a diffuse, polygenic association with climate, whereas in A. lyrata associations were more strongly coupled to population structure. Despite the functional diversity of rapidly evolving families, environmentally associated CNVs were significantly enriched in defense- and stress-related functions. These results demonstrate that CNV-environment relationships emerge at the level of gene family networks and are shaped by genomic architecture and lineage history, highlighting CNV as a context-dependent driver of genome evolution and ecological differentiation.
Sex hormones and HIV infection both influence cardiovascular health. However, the association between sex hormones and subclinical atherosclerosis is not fully understood, especially in the context of HIV. Among 321 men (65% with HIV) from the MACS/WIHS Combined Cohort Study, we measured 14 serum sex hormones and sex hormone-binding globulin (SHBG), assessed carotid artery plaque (IMT > 1.5 mm) using high-resolution B-mode ultrasound, and performed metagenomic sequencing on stool samples. In 312 men, we measured 986 plasma metabolites via liquid chromatography-tandem mass spectrometry and 2883 plasma proteins using the Olink Explore 3072 platform. In stratified analyses of men with (MWH) and without HIV (MWOH) and adjusting for covariates and multiple testing, we (1) examined associations of sex hormones with plaque; (2) characterized multi-omics profiles related to sex hormones; and (3) generated sex hormone-related omics scores via linear combination of related species, metabolites, and proteins, respectively, to explore whether these sex hormone-related multi-omics profiles were associated with plaque. Median age of participants was 62 years (interquartile range: 58-68), and 31.5% had carotid artery plaque. Sex hormones were differentially associated with plaque in MWH and MWOH. In MWH, an inverse association was observed between SHBG and plaque (OR = 0.60 per 1-SD increase, 95% CI: 0.41, 0.90). Furthermore, higher SHBG levels were associated with overall gut microbial composition, lower abundance of species from genera Prevotella, Fibrobacter and Coprococcus, higher levels of certain metabolites (primarily lipid and carnitine metabolites) and proteins enriched in the cell-cell adhesion pathway. Some SHBG-related species (e.g., Mediterranea massiliensis), metabolites (e.g., phosphatidylcholine-based lipids) and proteins (e.g., enriched in immune response pathway) were also associated with plaque in MWH. 
All three SHBG-related omics scores were inter-correlated and inversely associated with plaque in MWH. In MWOH, estrone-sulfate was positively associated with plaque (OR = 3.80, 95% CI: 1.41, 10.22) but was not associated with any species, metabolites, or proteins. Higher SHBG and its related microbial species, circulating metabolites, and proteins were inversely associated with carotid artery plaque. These findings suggest that SHBG may play a protective role in subclinical atherosclerosis in MWH.
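The odds ratios above are reported per 1-SD increase with 95% confidence intervals. As a minimal sketch, assuming the estimates come from a logistic regression on standardized hormone levels with Wald-type intervals (the abstract does not state the exact model), the following shows how a log-odds coefficient and its standard error map to an OR and CI, and how to recover them approximately from a reported OR. The function names are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def odds_ratio_ci(beta, se, z=Z_95):
    """Convert a log-odds coefficient and its standard error
    into (odds ratio, lower bound, upper bound)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def beta_se_from_or(odds_ratio, lower, upper, z=Z_95):
    """Approximately recover the log-odds coefficient and standard error
    from a reported odds ratio and its Wald-type confidence interval."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


# Reported SHBG estimate in MWH: OR = 0.60 (95% CI: 0.41, 0.90)
beta, se = beta_se_from_or(0.60, 0.41, 0.90)
or_, lo, hi = odds_ratio_ci(beta, se)
print(f"beta={beta:.3f}, se={se:.3f}, OR={or_:.2f}, CI=({lo:.2f}, {hi:.2f})")
```

Because published bounds are rounded, the reconstructed interval matches the reported one only approximately.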
Chromatin accessibility is important for genome architecture and gene expression in plants. In this study, ATAC-seq was used to identify accessible chromatin regions (ACRs) across the genome of chickpea (Cicer arietinum), an important legume crop cultivated worldwide. A total of 11,555 ACRs were identified in the chickpea genome; they were enriched at gene transcription start sites (TSS) and positively correlated with gene expression. Furthermore, as expected, a number of known transcription factor (TF) binding motifs were enriched in these ACRs. By integrating histone modification data, we found that ACRs were closely associated with the H3K27ac and H3K4me3 modifications that regulate gene expression. In addition, the gain and loss of ACRs was shown to have significant effects on the expression of homologous genes. Collectively, this study provides a comprehensive understanding of the genomic function of ACRs in chickpea.
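One simple way to quantify the TSS enrichment described above is to count the fraction of ACRs that lie within a fixed distance of any annotated TSS. The sketch below is illustrative only: the 1 kb window, the function name, and the toy coordinates are assumptions, not values or methods taken from the study.

```python
from bisect import bisect_left


def fraction_near_tss(acrs, tss_by_chrom, window=1000):
    """Return the fraction of ACRs within `window` bp of a TSS.

    acrs: list of (chrom, start, end) tuples.
    tss_by_chrom: dict mapping chrom -> sorted list of TSS positions.
    window: assumed flanking distance in bp (hypothetical default).
    """
    if not acrs:
        return 0.0
    hits = 0
    for chrom, start, end in acrs:
        positions = tss_by_chrom.get(chrom, [])
        # Index of the first TSS at or beyond the left edge of the padded ACR
        i = bisect_left(positions, start - window)
        # It counts as a hit if that TSS also falls before the padded right edge
        if i < len(positions) and positions[i] <= end + window:
            hits += 1
    return hits / len(acrs)


# Toy data, not from the chickpea genome
toy_acrs = [("chr1", 100, 200), ("chr1", 5000, 5200), ("chr2", 50, 80)]
toy_tss = {"chr1": [150, 9000]}
print(fraction_near_tss(toy_acrs, toy_tss))
```

A real enrichment test would compare this fraction against a background, such as randomly placed regions of matched size.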
Sesame (Sesamum indicum L., 2n = 26) is one of the oldest oilseed crops and is often called the 'queen of oilseeds' due to its high content of unsaturated fatty acids and natural antioxidants. Despite its long history, the origin and global spread of cultivated sesame remain unresolved. We assembled a telomere-to-telomere (T2T), high-quality reference genome of sesame (cv. Yuzhi11) to investigate sequence differences between genomes, the origin of the crop, and the evolution of local adaptation in flowering time (DF). We generated a 305 Mb T2T sesame reference genome (cv. Yuzhi11) with > 99.99% base-level accuracy, identifying 31 063 protein-coding genes. Repetitive elements accounted for 52.03% of the genome. Population genomic analysis of 927 accessions from 14 regions identified four major groups. Integrative analyses of linkage disequilibrium (LD) decay, nucleotide diversity (π), and fixation index (FST) support East Africa as the center of origin, with subsequent migration through the Middle East to South Asia, South-East Asia, East Asia, and ultimately other parts of the world. Genome-wide association studies (GWAS) and selection scans identified 30 genes associated with flowering time. SiUBP16 is a candidate gene associated with 7.6% of DF variation. Early-flowering accessions carried up to 225 favourable alleles. A flowering time prediction model for high-latitude regions achieved 96% accuracy. We present a high-quality T2T reference genome for cultivated sesame, shedding light on its origin, evolutionary history, and regional flowering time adaptation. This genome provides valuable tools for breeding programs aimed at improving yield and environmental adaptation in sesame and related crops.
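Nucleotide diversity (π), one of the statistics used above to infer the center of origin, is the average number of pairwise differences per site among sampled sequences. The following is a minimal sketch on aligned, equal-length haplotype strings; the function name and toy sequences are hypothetical, and population-scale analyses normally compute π in windows from variant calls rather than from raw strings.

```python
from itertools import combinations


def nucleotide_diversity(haplotypes):
    """Average pairwise differences per site among aligned haplotypes."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    length = len(haplotypes[0])
    if length == 0 or any(len(h) != length for h in haplotypes):
        raise ValueError("haplotypes must be non-empty and of equal length")
    # Sum mismatching sites over every unordered pair of haplotypes
    total_diffs = sum(
        sum(a != b for a, b in zip(h1, h2))
        for h1, h2 in combinations(haplotypes, 2)
    )
    n_pairs = n * (n - 1) // 2
    return total_diffs / (n_pairs * length)


# Toy alignment: 3 haplotypes, 4 sites, one variable site
print(nucleotide_diversity(["ACGT", "ACGA", "ACGT"]))
```

Under the usual reasoning, a population with higher π is a more plausible source population, since diversity tends to be lost along migration routes.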