Search results: Genomic

20 results found

Genomic Next-Token Predictors are In-Context Learners

arXiv

In-context learning (ICL) -- the capacity of a model to infer and apply abstract patterns from examples provided within its input -- has been extensively studied in large language models trained for next-token prediction on human text. In fact, prior work often attributes this emergent behavior to distinctive statistical properties in human language. This raises a fundamental question: can ICL arise organically in other sequence domains purely through large-scale predictive training? To explore this, we turn to genomic sequences, an alternative symbolic domain rich in statistical structure. Specifically, we study the Evo2 genomic model, trained predominantly on next-nucleotide (A/T/C/G) prediction, at a scale comparable to mid-sized LLMs. We develop a controlled experimental framework comprising symbolic reasoning tasks instantiated in both linguistic and genomic forms, enabling direct comparison of ICL across genomic and linguistic models. Our results show that genomic models, like their linguistic counterparts, exhibit log-linear gains in pattern induction as the number of in-context demonstrations increases. To the best of our knowledge, this is the first evidence of organically

Genome-Factory: A Library for Tuning, Deploying, and Interpreting Genomic Foundation Models

arXiv, 2025-09-13. Authors: Weimin Wu, Xuefeng Song, Yibo Wen

We introduce Genome-Factory, the first integrated Python library for tuning, deploying, and interpreting genomic foundation models. Our core contribution is to simplify and unify the workflow for genomic model development: data collection, model tuning, inference, benchmarking, and interpretability. For data collection, Genome-Factory offers an automated pipeline to download genomic sequences and preprocess them. For model tuning, Genome-Factory supports both full and parameter-efficient fine-tuning across diverse genomic models. For inference, Genome-Factory enables both embedding extraction and DNA sequence generation. For benchmarking, we include two existing benchmarks and provide a flexible interface to incorporate additional benchmarks. For interpretability, Genome-Factory introduces an open-source biological interpreter based on a sparse auto-encoder. We validate the utility of Genome-Factory across three dimensions: (i) Compatibility with diverse models and fine-tuning methods; (ii) Benchmarking downstream performance using two open-source benchmarks; (iii) Biological interpretation of learned representations with DNABERT-2. These results highlight its practical value for r

Genomic Next-Token Predictors are In-Context Learners

Genome-Factory: A Library for Tuning, Deploying, and Interpreting Genomic Foundation Models

Genome-on-Diet: Taming Large-Scale Genomic Analyses via Sparsified Genomics

Genomic data processing with GenomeFlow

Genomic reproducibility in the bioinformatics era

Genomic Interpreter: A Hierarchical Genomic Deep Neural Network with 1D Shifted Window Transformer

Multi-modal Imaging Genomics Transformer: Attentive Integration of Imaging with Genomic Biomarkers for Schizophrenia Classification

The Impact of Genomic Variation on Function (IGVF) Consortium

A Multi-Evidence Framework Rescues Low-Power Prognostic Signals and Rejects Statistical Artifacts in Cancer Genomics

The genomic architecture and evolutionary fates of supergenes

Robust Fingerprinting of Genomic Databases

Genomics as a Service: a Joint Computing and Networking Perspective

Blockchain for Genomics: A Systematic Literature Review

Deep Learning for Genomics: A Concise Overview

An Embarrassingly Simple Approach to Enhance Transformer Performance in Genomic Selection for Crop Breeding

The Genomic Code: The genome instantiates a generative model of the organism

Diversifying the Genomic Data Science Research Community

First large-scale genomic prediction in the honey bee

An investigation into inter- and intragenomic variations of graphic genomic signatures

Tasks, Techniques, and Tools for Genomic Data Visualization