Natural Language Processing (NLP) has transformed various fields beyond linguistics by applying techniques originally developed for human language to the analysis of biological sequences. This review explores the application of NLP methods to biological sequence data, focusing on genomics, transcriptomics, and proteomics. We examine how various NLP methods, from classic approaches like word2vec to advanced models employing transformers and Hyena operators, are being adapted to analyze DNA, RNA, protein sequences, and entire genomes. We also discuss tokenization strategies and model architectures, evaluating their strengths, limitations, and suitability for different biological tasks. We further cover recent advances in NLP applications for biological data, such as structure prediction, gene expression, and evolutionary analysis, highlighting the potential of these methods for extracting meaningful insights from large-scale genomic data. As language models continue to advance, their integration into bioinformatics holds immense promise for deepening our understanding of biological processes in all domains of life.
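To make the tokenization discussion concrete, below is a minimal sketch of one common strategy for DNA sequences, overlapping k-mer tokenization; the function names, the choice of k, and the toy sequence are illustrative assumptions rather than details of any specific model covered in the review.

```python
# Illustrative sketch: overlapping k-mer tokenization of a DNA sequence,
# one common way of turning genomic data into "words" for a language model.
def kmer_tokenize(sequence: str, k: int = 6) -> list[str]:
    """Split a DNA sequence into overlapping k-mer tokens."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

def build_vocab(tokens: list[str]) -> dict[str, int]:
    """Map each distinct k-mer to an integer id for model input."""
    return {tok: idx for idx, tok in enumerate(sorted(set(tokens)))}

if __name__ == "__main__":
    seq = "ATGCGTACGTTAG"
    tokens = kmer_tokenize(seq, k=6)          # ['ATGCGT', 'TGCGTA', ...]
    vocab = build_vocab(tokens)
    ids = [vocab[t] for t in tokens]          # integer ids fed to an embedding layer
    print(tokens)
    print(ids)
```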
The effective visualization of genomic data is crucial for exploring and interpreting complex relationships within and across genes and genomes. Despite advances in developing dedicated bioinformatics software, common visualization tools often fail to efficiently integrate the diverse datasets produced in comparative genomics, lack intuitive interfaces for constructing complex plots, and miss functionalities for inspecting the underlying data iteratively and at scale. Here, we introduce gggenomes, a versatile R package designed to overcome these challenges by extending the widely used ggplot2 framework for comparative genomics. gggenomes is available from CRAN and GitHub, accompanied by detailed and user-friendly documentation (https://thackl.github.io/gggenomes).
Biology is perhaps the most complex of the sciences, given the incredible variety of chemical species that are interconnected in spatial and temporal pathways that are daunting to understand. Their interconnections lead to emergent properties such as memory, consciousness, and recognition of self and non-self. To understand how these interconnected reactions lead to cellular life characterized by activation, inhibition, regulation, homeostasis, and adaptation, computational analyses and simulations are essential, a fact recognized by the biological community. At the same time, students struggle to understand and apply binding and kinetic analyses even for the simplest reactions, such as the irreversible first-order conversion of a single reactant to a product. This likely results from cognitive difficulties in combining structural, chemical, mathematical, and textual descriptions of binding and catalytic reactions. To help students better understand dynamic reactions and their analyses, we have introduced two kinds of interactive graphs and simulations into the online educational resource Fundamentals of Biochemistry, a multivolume biochemistry textbook that is part of the LibreTexts collection.
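As a small worked example of the kind of kinetic analysis students are asked to master, the sketch below evaluates the irreversible first-order conversion A -> P, for which d[A]/dt = -k[A] and [A](t) = [A]0 e^(-kt); the rate constant and initial concentration are arbitrary illustrative values, and the code is not part of the textbook's interactive materials.

```python
# Minimal sketch of irreversible first-order kinetics A -> P:
# d[A]/dt = -k[A], with analytic solution [A](t) = [A]0 * exp(-k t).
import numpy as np

k = 0.5          # first-order rate constant (1/s), illustrative value
A0 = 1.0         # initial reactant concentration (mM), illustrative value
t = np.linspace(0.0, 10.0, 101)

A = A0 * np.exp(-k * t)      # reactant decays exponentially
P = A0 - A                   # product accumulates as A is consumed

half_life = np.log(2) / k    # t_1/2 = ln(2)/k for first-order kinetics
print(f"half-life = {half_life:.2f} s")
```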
Artificial intelligence (AI) is transforming scientific research, including proteomics. Advances in mass spectrometry (MS)-based proteomics data quality, diversity, and scale, combined with groundbreaking AI techniques, are opening up new challenges and opportunities in biological discovery. Here, we highlight key areas where AI is driving innovation, from data analysis to new biological insights. These include developing an AI-friendly ecosystem for proteomics data generation, sharing, and analysis; improving peptide and protein identification and quantification; characterizing protein-protein interactions and protein complexes; advancing spatial and perturbation proteomics; integrating multi-omics data; and ultimately enabling AI-empowered virtual cells.
Large-scale comparison of whole genome sequences has not been achieved with conventional methods based on pairwise base-to-base comparison; moreover, little attention has been paid to handling in one sitting a set of genomes that cross genetic categories (chromosome, plasmid, and phage), span wide divergences (with little or no homology), and cover large size ranges (from kbp to Mbp). We created a new method, GenomeFingerprinter, to unambiguously produce three-dimensional coordinates from a sequence, followed by one three-dimensional plot and six two-dimensional trajectory projections to illustrate whole genome fingerprints. We further developed a set of concepts and tools and thereby established a new method, universal genome fingerprint analysis. We demonstrated their applications through case studies on over a hundred genome sequences. In particular, we defined the total genetic component configuration (TGCC) (i.e., chromosome, plasmid, and phage) for describing a strain as a system, the universal genome fingerprint map (UGFM) of TGCC for differentiating a strain as a universal system, and systematic comparative genomics (SCG) for comparing a number of genomes in one sitting.
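The abstract does not specify how GenomeFingerprinter maps bases to coordinates, so the following is only a hypothetical sketch of the general idea: assign each nucleotide a fixed three-dimensional displacement, accumulate the displacements along the sequence, and project the resulting trajectory onto coordinate planes. The displacement vectors and function names are invented for illustration and do not reproduce the published algorithm.

```python
# Hypothetical sketch (not the published GenomeFingerprinter method):
# each base contributes a fixed 3D step; the cumulative sum of steps turns a
# genome into a 3D trajectory that can be plotted or projected onto 2D planes.
import numpy as np

BASE_STEP = {
    "A": np.array([1.0, 0.0, 0.0]),
    "C": np.array([0.0, 1.0, 0.0]),
    "G": np.array([0.0, 0.0, 1.0]),
    "T": np.array([-1.0, -1.0, -1.0]),
}

def fingerprint_coordinates(sequence: str) -> np.ndarray:
    """Return an (N, 3) array of cumulative 3D coordinates for a sequence."""
    steps = [BASE_STEP.get(b, np.zeros(3)) for b in sequence.upper()]
    return np.cumsum(np.array(steps), axis=0)

coords = fingerprint_coordinates("ATGCGTACGTTAGCCGTA")
xy_projection = coords[:, :2]   # one possible 2D trajectory projection
```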
Introduction: While the origin and evolution of proteins remain mysterious, advances in evolutionary genomics and systems biology are facilitating the historical exploration of the structure, function and organization of proteins and proteomes. Molecular chronologies are series of time events describing the history of biological systems and subsystems and the rise of biological innovations. Together with time-varying networks, these chronologies provide a window into the past. Areas covered: Here, we review molecular chronologies and networks built with modern methods of phylogeny reconstruction. We discuss how chronologies of structural domain families uncover the explosive emergence of metabolism, the late rise of translation, the co-evolution of ribosomal proteins and rRNA, and the late development of the ribosomal exit tunnel; events that coincided with a tendency to shorten folding time. Evolving networks described the early emergence of domains and a late big bang of domain combinations. Expert opinion: Two processes, folding and recruitment, appear central to the evolutionary progression. The former increases protein persistence. The latter fosters diversity. Chronologically,
Understanding the spatial architecture of the tumor microenvironment (TME) is critical to advance precision oncology. We present ProteinPNet, a novel framework based on prototypical part networks that discovers TME motifs from spatial proteomics data. Unlike traditional post-hoc explainability models, ProteinPNet directly learns discriminative, interpretable, and faithful spatial prototypes through supervised training. We validate our approach on synthetic datasets with ground truth motifs, and further test it on a real-world lung cancer spatial proteomics dataset. ProteinPNet consistently identifies biologically meaningful prototypes aligned with different tumor subtypes. Through graphical and morphological analyses, we show that these prototypes capture interpretable features pointing to differences in immune infiltration and tissue modularity. Our results highlight the potential of prototype-based learning to reveal interpretable spatial biomarkers within the TME, with implications for mechanistic discovery in spatial omics.
Rare diseases are collectively common, affecting approximately one in twenty individuals worldwide. In recent years, rapid progress has been made in rare disease diagnostics due to advances in DNA sequencing, development of new computational and experimental approaches to prioritize genes and genetic variants, and increased global exchange of clinical and genetic data. However, more than half of individuals suspected to have a rare disease lack a genetic diagnosis. The Genomics Research to Elucidate the Genetics of Rare Diseases (GREGoR) Consortium was initiated to study thousands of challenging rare disease cases and families and apply, standardize, and evaluate emerging genomics technologies and analytics to accelerate their adoption in clinical practice. Further, all data generated, currently representing ~7500 individuals from ~3000 families, is rapidly made available to researchers worldwide via the Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL) to catalyze global efforts to develop approaches for genetic diagnoses in rare diseases (https://gregorconsortium.org/data). The majority of these families have undergone prior clinical genetic testing.
Proteomics is the large-scale study of protein structure and function in biological systems through protein identification and quantification. "Shotgun proteomics" or "bottom-up proteomics" is the prevailing strategy, in which proteins are hydrolyzed into peptides that are analyzed by mass spectrometry. Proteomics can be applied to diverse questions, ranging from simple protein identification to studies of proteoforms, protein-protein interactions, protein structural alterations, absolute and relative protein quantification, post-translational modifications, and protein stability. To enable this range of different experiments, there are diverse strategies for proteome analysis. The nuances of how proteomic workflows differ may be challenging to understand for new practitioners. Here, we provide a comprehensive overview of different proteomics methods to aid the novice and experienced researcher. We cover topics ranging from biochemistry basics and protein extraction to biological interpretation and orthogonal validation. We expect this work to serve as a basic resource for new practitioners in the field of shotgun or bottom-up proteomics.
The molecular responses of macrophages to copper-based nanoparticles have been investigated via a combination of proteomic and biochemical approaches, using the RAW264.7 cell line as a model. Both metallic copper and copper oxide nanoparticles have been tested, with copper ion and zirconium oxide nanoparticles used as controls. Proteomic analysis highlighted changes in proteins implicated in oxidative stress responses (superoxide dismutases and peroxiredoxins), glutathione biosynthesis, the actomyosin cytoskeleton, and mitochondrial proteins (especially oxidative phosphorylation complex subunits). Validation studies employing functional analyses showed that the increases in glutathione biosynthesis and in mitochondrial complexes observed in the proteomic screen were critical to cell survival upon stress with copper-based nanoparticles; pharmacological inhibition of these two pathways enhanced cell vulnerability to copper-based nanoparticles, but not to copper ions. Furthermore, functional analyses using primary macrophages derived from bone marrow showed a decrease in reduced glutathione levels, a decrease in the mitochondrial transmembrane potential, and inhibition of phagocytosis.
The applications of large language models (LLMs) are promising for biomedical and healthcare research. Despite the availability of open-source LLMs trained using a wide range of biomedical data, current research on the applications of LLMs to genomics and proteomics is still limited. To fill this gap, we propose a collection of finetuned LLMs and multimodal LLMs (MLLMs), known as Geneverse, for three novel tasks in genomic and proteomic research. The models in Geneverse are trained and evaluated based on domain-specific datasets, and we use advanced parameter-efficient finetuning techniques to achieve model adaptation for tasks including the generation of descriptions of gene functions, protein function inference from structure, and marker gene selection from spatial transcriptomic data. We demonstrate that adapted LLMs and MLLMs perform well for these tasks and may outperform closed-source large-scale models in our evaluations, which focus on both truthfulness and structural correctness. All of the training strategies and base models we used are freely accessible.
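As a hedged illustration of parameter-efficient finetuning of the kind mentioned above, the sketch below wraps an open-source causal language model with LoRA adapters via the Hugging Face peft library; the base model name, target modules, and hyperparameters are placeholders and are not claimed to match the Geneverse configuration.

```python
# Hedged sketch of parameter-efficient finetuning with LoRA (peft library).
# The base model and hyperparameters are placeholders, not Geneverse's setup.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

base_model_name = "facebook/opt-350m"  # placeholder open-source base model
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = AutoModelForCausalLM.from_pretrained(base_model_name)

# LoRA injects small trainable low-rank matrices into the attention projections,
# so only a small fraction of parameters are updated during adaptation.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# The wrapped model could then be finetuned on domain-specific prompt/response
# pairs (e.g., gene symbols paired with curated function descriptions).
```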
Two-dimensional gel electrophoresis has been instrumental in the birth and development of proteomics, although it is no longer the exclusive separation tool used in the field. In this review, a historical perspective is provided, starting from the days when two-dimensional gels were used and the word proteomics did not even exist. The events that led to the birth of proteomics are also recalled, ending with a description of the now well-known limitations of two-dimensional gels in proteomics. However, the often-underestimated advantages of two-dimensional gels are also underlined, leading to a description of how and when to use two-dimensional gels to best effect in a proteomics approach. Building on these advantages (robustness, resolution, and the ability to separate entire, intact proteins), possible future applications of this technique in proteomics are also mentioned.
Taking the opportunity of the 20th anniversary of the word "proteomics", this young adult age is a good time to remember how proteomics came from enormous progress in protein separation and protein microanalysis techniques, and from the combination of these advances into a high-performance, streamlined working setup. However, in the history of the almost three decades that span from the first attempts to perform large-scale analysis of proteins to the current high-throughput proteomics that we can enjoy now, it is also interesting to underline and to recall how difficult the first decade was. Indeed, when the word was cast, the battle was already won. This recollection is mostly devoted to the almost forgotten period during which proteomics was being conceived and brought to birth, as this collective scientific work will never appear when searched through the keyword "proteomics". BIOLOGICAL SIGNIFICANCE: The significance of this manuscript is to recall and review the two decades that separated the first attempts at performing large-scale analysis of proteins from the solid technical corpus that existed when the word "proteomics" was coined twenty years ago. This recollection is made within
In anticipation of the completion of the High-Luminosity Large Hadron Collider (HL-LHC) programme by the end of 2041, CERN is preparing to launch a new major facility in the mid-2040s. According to the 2020 update of the European Strategy for Particle Physics (ESPP), the highest-priority next collider is an electron-positron Higgs factory, followed in the longer term by a hadron-hadron collider at the highest achievable energy. The CERN directorate established a Future Colliders Comparative Evaluation working group in June 2023. This group brings together project leaders and domain experts to conduct a consistent evaluation of the Future Circular Collider (FCC) and alternative scenarios based on shared assumptions and standardized criteria. This report presents a comparative evaluation of proposed future collider projects submitted as input for the Update of the European Strategy for Particle Physics. These proposals are compared considering main performance parameters, environmental impact and sustainability, technical maturity, cost of construction and operation, required human resources, and realistic implementation timelines. An overview of the international collider projects w
Traditionally, studies in experimental physiology have been conducted in small groups of human participants, animal models or cell lines. Identifying optimal study designs that achieve sufficient power for drawing proper statistical inferences to detect group-level effects with small sample sizes has been challenging. Moreover, average effects derived from traditional group-level inference do not necessarily apply to individual participants. Here, we introduce N-of-1 trials as an innovative study design that can be used to draw valid statistical inference about the effects of interventions on individual participants and can be aggregated across multiple study participants to provide population-level inferences more efficiently than standard group randomized trials. N-of-1 trials have been used since the late 1980s, but without large-scale adoption and with few applications in experimental physiology research settings. In this manuscript, we introduce the key components and design features of N-of-1 trials, describe statistical analysis and interpretation of the results, and present some available digital tools that facilitate their use, drawing on examples from experimental physiology.
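A minimal sketch of how measurements from a single N-of-1 trial might be analyzed is shown below, assuming an alternating control/treatment (ABAB) design with simulated daily outcomes; the simple two-sample t-test is only one of several possible analyses and is not drawn from the manuscript itself.

```python
# Hedged sketch: analyze one N-of-1 (single-participant) trial with an ABAB
# design by comparing outcomes under treatment vs. control periods within the
# same individual. Data are simulated purely for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Four alternating periods (A = control, B = treatment), 7 daily measurements each.
periods = ["A", "B", "A", "B"]
true_effect = 1.5  # simulated treatment benefit in outcome units
data = {
    "A": np.concatenate([rng.normal(10.0, 1.0, 7) for p in periods if p == "A"]),
    "B": np.concatenate([rng.normal(10.0 + true_effect, 1.0, 7) for p in periods if p == "B"]),
}

t_stat, p_value = stats.ttest_ind(data["B"], data["A"])
print(f"within-person treatment effect: {data['B'].mean() - data['A'].mean():.2f}")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```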
Long-range dependencies are critical for understanding genomic structure and function, yet most conventional methods struggle with them. Widely adopted transformer-based models, while excelling at short-context tasks, are limited by the attention module's quadratic computational complexity and inability to extrapolate to sequences longer than those seen in training. In this work, we explore State Space Models (SSMs) as a promising alternative by benchmarking two SSM-inspired architectures, Caduceus and Hawk, on long-range genomics modeling tasks under conditions parallel to a 50M parameter transformer baseline. We discover that SSMs match transformer performance and exhibit impressive zero-shot extrapolation across multiple tasks, handling contexts 10 to 100 times longer than those seen during training, indicating more generalizable representations better suited for modeling the long and complex human genome. Moreover, we demonstrate that these models can efficiently process sequences of 1M tokens on a single GPU, allowing for modeling entire genomic regions at once, even in labs with limited compute. Our findings establish SSMs as efficient and scalable for long-context genomic analysis.
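To illustrate why state space models scale linearly with context length, here is a toy diagonal linear recurrence of the kind that underlies SSM layers; it is not the Caduceus or Hawk architecture, and all parameter values are arbitrary illustrative choices.

```python
# Toy diagonal state-space recurrence: the hidden state is updated once per
# token, so cost grows linearly with sequence length (vs. quadratic attention).
import numpy as np

def ssm_scan(x: np.ndarray, a: np.ndarray, b: np.ndarray, c: np.ndarray) -> np.ndarray:
    """Run h_t = a * h_{t-1} + b * x_t ; y_t = c . h_t over a 1D input signal.

    x: (T,) input; a, b, c: (D,) diagonal state parameters. Returns y: (T,).
    """
    h = np.zeros_like(a)
    y = np.empty(x.shape[0])
    for t, x_t in enumerate(x):          # O(T) in sequence length
        h = a * h + b * x_t              # elementwise (diagonal) state update
        y[t] = np.dot(c, h)              # readout
    return y

T, D = 100_000, 16                        # long toy sequence, small state size
rng = np.random.default_rng(0)
x = rng.standard_normal(T)
a = np.full(D, 0.999)                     # decay close to 1 retains long-range memory
b = rng.standard_normal(D) * 0.01
c = rng.standard_normal(D)
y = ssm_scan(x, a, b, c)
```

Real SSM implementations replace this Python loop with parallel scans or fused GPU kernels, which is what makes million-token contexts practical on a single GPU.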
Two-dimensional electrophoresis of proteins has preceded, and accompanied, the birth of proteomics. Although it is no longer the only experimental scheme used in modern proteomics, it still has distinct features and advantages. The purpose of this tutorial paper is to guide the reader through the history of the field, then through the main steps of the process, from sample preparation to in-gel detection of proteins, commenting on the constraints and caveats of the technique. Then the limitations and positive features of two-dimensional electrophoresis are discussed (e.g. its unique ability to separate complete proteins and its easy interfacing with immunoblotting techniques), so that the optimal types of applications of this technique in current and future proteomics can be identified. This is illustrated by an example taken from the literature, which is commented on in detail. This Tutorial is part of the International Proteomics Tutorial Programme (IPTP 2).
Meloidogyne root knot nematodes (RKN) can infect most of the world's agricultural crop species and are among the most important of all plant pathogens. As yet, however, we have little understanding of their origins or the genomic basis of their extreme polyphagy. The most damaging pathogens reproduce by mitotic parthenogenesis and are suggested to originate from interspecific hybridizations between unknown parental taxa. We sequenced the genome of the diploid meiotic parthenogen Meloidogyne floridensis and used a comparative genomic approach to test the hypothesis that it was involved in the hybrid origin of the tropical mitotic parthenogen M. incognita. Phylogenomic analysis of gene families from M. floridensis, M. incognita and an outgroup species, M. hapla, was used to trace the evolutionary history of these species' genomes, demonstrating that M. floridensis was one of the parental species in the hybrid origins of M. incognita. Analysis of the M. floridensis genome revealed many gene loci present in divergent copies, as they are in M. incognita, indicating that it too had a hybrid origin. The triploid M. incognita is shown to be a complex double-hybrid between M. floridensis and a third species.
The combination of multiple classifiers using ensemble methods is increasingly important for making progress in a variety of difficult prediction problems. We present a comparative analysis of several ensemble methods through two case studies in genomics, namely the prediction of genetic interactions and protein functions, to demonstrate their efficacy on real-world datasets and draw useful conclusions about their behavior. These methods include simple aggregation, meta-learning, cluster-based meta-learning, and ensemble selection using heterogeneous classifiers trained on resampled data to improve the diversity of their predictions. We present a detailed analysis of these methods across four genomics datasets and find that the best of these methods offer statistically significant improvements over the state of the art in their respective domains. In addition, we establish a novel connection between ensemble selection and meta-learning, demonstrating how both of these disparate methods establish a balance between ensemble diversity and performance.
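A minimal sketch of the simplest of these strategies, simple aggregation by averaging predicted probabilities from heterogeneous classifiers, is given below; the synthetic dataset and the particular base learners are illustrative assumptions, not the genomics benchmarks analyzed in the study.

```python
# Hedged sketch of simple aggregation: average class probabilities from
# heterogeneous base classifiers trained on the same (synthetic) data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=50, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

base_models = [
    LogisticRegression(max_iter=1000),
    RandomForestClassifier(n_estimators=200, random_state=0),
    GradientBoostingClassifier(random_state=0),
]

# Train each heterogeneous classifier and average their class-1 probabilities.
probs = []
for model in base_models:
    model.fit(X_tr, y_tr)
    probs.append(model.predict_proba(X_te)[:, 1])
ensemble_prob = np.mean(probs, axis=0)

print("ensemble AUC:", roc_auc_score(y_te, ensemble_prob))
```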
Classifying genome sequences based on metadata has been an active area of research in comparative genomics for decades, with many important applications across the life sciences. Established methods for classifying genomes can be broadly grouped into sequence alignment-based and alignment-free models. Conventional alignment-based models rely on genome similarity measures calculated based on local sequence alignments or consistent ordering among sequences. However, such methods are computationally expensive when dealing with large ensembles of even moderately sized genomes. In contrast, alignment-free (AF) approaches measure genome similarity based on summary statistics in an unsupervised setting and are efficient enough to analyze large datasets. However, both alignment-based and AF methods typically assume fixed scoring rubrics that lack the flexibility to assign varying importance to different parts of the sequences based on prior knowledge. In this study, we integrate AI and network science approaches to develop a comparative genomic analysis framework that addresses these limitations. Our approach, termed the Genome Misclassification Network Analysis (GMNA), simultaneously leverages AI and network science approaches.
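To ground the alignment-free idea, the sketch below compares two toy genomes through their k-mer frequency profiles (a typical summary statistic) with cosine similarity; this illustrates generic AF comparison, not the GMNA scoring itself, and all names and sequences are invented for illustration.

```python
# Hedged sketch of alignment-free genome comparison: represent each genome by
# a normalized k-mer frequency vector and compare genomes by cosine similarity.
from collections import Counter
from itertools import product

import numpy as np

def kmer_profile(sequence: str, k: int = 4) -> np.ndarray:
    """Normalized k-mer frequency vector over the fixed DNA k-mer alphabet."""
    alphabet = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    vec = np.array([counts.get(kmer, 0) for kmer in alphabet], dtype=float)
    return vec / vec.sum() if vec.sum() > 0 else vec

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy "genomes"; real analyses would load assembled sequences from FASTA files.
genome_a = "ATGCGTACGTTAGCCGTAATGCGT" * 50
genome_b = "ATGCGTACGTTGGCCGTAATGCGA" * 50
print(cosine_similarity(kmer_profile(genome_a), kmer_profile(genome_b)))
```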