{"title":"A foundation language model to decipher diverse regulation of RNAs","authors":"Hanwen Zhou, Yue Hu, Yulong Zheng, Jiefu Li, Jielong Peng, Jiang Hu, Yun Yang, Wei Chen, Guoqing Zhang, Zefeng Wang","doi":"10.1186/s13059-025-03752-x","DOIUrl":"https://doi.org/10.1186/s13059-025-03752-x","url":null,"abstract":"RNA metabolism is tightly regulated by cis-elements and trans-acting factors. Most information guiding such regulation is encoded in RNA sequences. Deciphering the regulatory rules is critical for RNA biology and therapeutics; however, the prediction of diverse regulation from RNA sequences remains a formidable challenge. Considering the similarities in semantic and syntactic features between RNAs and human language, we present LAMAR, a transformer-based foundation LAnguage Model for RNA Regulation, to decipher general rules underlying RNA processing. The model is pretrained on approximately 15 million sequences from both genome and transcriptome of 225 mammals and 1569 viruses, and further fine-tuned with labeled datasets for various tasks. The resulting fine-tuned models outperform the state-of-the-art methods in predicting mRNA translation efficiency and mRNA half-life, while achieving comparable accuracy to specifically designed methods in predicting splice sites of pre-mRNAs and internal ribosome entry sites (IRESs). The fine-tuned LAMAR is further applied to predict mutational effects of cis-regulatory elements and reveals known and novel regulatory elements that modulate RNA degradation. The fine-tuned LAMAR is also applied in an in silico screen of novel IRESs, resulting in the identification of highly active IRESs that promote circRNA translation. Our results indicate that a single foundation language model is applicable in the comprehensive analysis of different aspects of RNA regulation and predictive identification of novel regulatory elements, providing new insight into the design and optimization of RNA drugs.","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"38 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2025-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145133500","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome Biology | Pub Date: 2025-09-24 | DOI: 10.1186/s13059-025-03779-0
Haohuai He, Zhenchao Tang, Guanxing Chen, Fan Xu, Yao Hu, Yinglan Feng, Jibin Wu, Yu-An Huang, Zhi-An Huang, Kay Chen Tan
{"title":"scKAN: interpretable single-cell analysis for cell-type-specific gene discovery and drug repurposing via Kolmogorov-Arnold networks","authors":"Haohuai He, Zhenchao Tang, Guanxing Chen, Fan Xu, Yao Hu, Yinglan Feng, Jibin Wu, Yu-An Huang, Zhi-An Huang, Kay Chen Tan","doi":"10.1186/s13059-025-03779-0","DOIUrl":"https://doi.org/10.1186/s13059-025-03779-0","url":null,"abstract":"Analysis of single-cell RNA sequencing (scRNA-seq) data has revolutionized our understanding of cellular heterogeneity, yet current approaches face challenges in efficiency, interpretability, and connecting molecular insights to therapeutic applications. Despite advances in deep learning methods, identifying cell-type-specific functional gene sets remains difficult. In this study, we present scKAN, an interpretable framework for scRNA-seq analysis with two primary goals: accurate cell-type annotation and the discovery of cell-type-specific marker genes and gene sets. The key innovation is using the learnable activation curves of the Kolmogorov-Arnold network to model gene-to-cell relationships. This approach provides a more direct way to visualize and interpret these specific interactions compared to the aggregated weighting schemes typical of attention mechanisms. This architecture achieves superior performance in cell-type annotation, with a 6.63% improvement in macro F1 score over state-of-the-art methods. Additionally, it enables the systematic identification of functionally coherent cell-type-specific gene sets. We demonstrate the framework’s translational potential through a case study on pancreatic ductal adenocarcinoma, where gene signatures identified by scKAN led to a potential drug repurposing candidate, whose binding stability was supported by molecular dynamics simulations. Our work establishes scKAN as an efficient and interpretable framework that effectively bridges single-cell analysis with drug discovery. By combining a lightweight architecture with the ability to uncover nuanced biological patterns, our approach offers an interpretable method for translating large-scale single-cell data into actionable therapeutic strategies. This approach provides a robust foundation for accelerating the identification of cell-type-specific targets in complex diseases.","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"40 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2025-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145127787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome Biology | Pub Date: 2025-09-23 | DOI: 10.1186/s13059-025-03756-7
Ruiying Zhu, Chuanhong Ren, Zehua Bao
{"title":"Fueling chromosomal gene diversification and artificial evolution with CRISPR","authors":"Ruiying Zhu, Chuanhong Ren, Zehua Bao","doi":"10.1186/s13059-025-03756-7","DOIUrl":"https://doi.org/10.1186/s13059-025-03756-7","url":null,"abstract":"Gene diversification is an effective approach to massively dissecting variant functions and evolving sequences when paired with an appropriate assay. In vitro mutagenesis and ectopic gene expression, however, fail to simulate the endogenous regulatory environment of the variants. The development of clustered, regularly interspaced short palindromic repeats (CRISPR) systems has greatly boosted the efficiency of targeted gene diversification in various species. Here, we review recent CRISPR-assisted methods for chromosomal gene diversification and artificial evolution, focusing on the advantages and limitations of each approach, and propose possible strategies to overcome current limitations as well as directions for future technology development.","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"39 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2025-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145116505","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome Biology | Pub Date: 2025-09-23 | DOI: 10.1186/s13059-025-03656-w
Ellyn Rousselot, Zofia Nehr, Jean-Marc Aury, France Denoeud, J. Mark Cock, Leïla Tirichine, Céline Duc
{"title":"Identification of novel H2A histone variants across diverse clades of algae","authors":"Ellyn Rousselot, Zofia Nehr, Jean-Marc Aury, France Denoeud, J. Mark Cock, Leïla Tirichine, Céline Duc","doi":"10.1186/s13059-025-03656-w","DOIUrl":"https://doi.org/10.1186/s13059-025-03656-w","url":null,"abstract":"Histones are among the most conserved proteins in eukaryotes. They not only ensure DNA compaction in the nucleus but also participate in epigenetic regulation of gene expression. These key epigenetic players are divided into replication-coupled histones, expressed during the S-phase, and replication-independent variants, expressed throughout the cell cycle. Compared with other core histones, H2A proteins exhibit a high level of variability, but the characterization of algal H2A variants remains very limited. In this study, we exploit genome and transcriptome data from 22 species to identify H2A variants in brown seaweeds. Combined analyses of phylogenetic data, synteny and protein motifs enable us to reveal the presence of new H2A variants as well as their evolutionary history. We identify three new H2A variants: H2A.N, H2A.O and H2A.E. In brown seaweeds, the H2A.E and H2A.O variants arose from the same monophyletic clade while the H2A.N variant emerged independently. Moreover, the H2A.E variant seems to share ancestry with RC H2A while the H2A.O variant has an H2A.X-characteristic signature without being orthologous to this variant. Based on mass spectrometry, we identify distinct epigenetic marks on these H2A variants. Finally, the H2A.Z, H2A.N and H2A.O variants of brown seaweeds are ubiquitously expressed, while expression of H2A.E shows tissue-specific patterns, especially in reproductive tissues. We thus hypothesize that H2A.O and H2A.X might have convergent functions while H2A.E might fulfil some functions of replication-coupled H2As and/or compensate for the absence of repressive histone marks along with H2A.N.","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"40 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2025-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145116931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome Biology | Pub Date: 2025-09-23 | DOI: 10.1186/s13059-025-03724-1
Ricardo De Paoli-Iseppi, Shweta S. Joshi, Josie Gleeson, Yair D. J. Prawer, Yupei You, Ria Agarwal, Anran Li, Anthea Hull, Eloise M. Whitehead, Yoonji Seo, Rhea Kujawa, Raphael Chang, Mriga Dutt, Catriona McLean, Benjamin L. Parker, Michael B. Clark
{"title":"Long-read sequencing reveals the RNA isoform repertoire of neuropsychiatric risk genes in human brain","authors":"Ricardo De Paoli-Iseppi, Shweta S. Joshi, Josie Gleeson, Yair D. J. Prawer, Yupei You, Ria Agarwal, Anran Li, Anthea Hull, Eloise M. Whitehead, Yoonji Seo, Rhea Kujawa, Raphael Chang, Mriga Dutt, Catriona McLean, Benjamin L. Parker, Michael B. Clark","doi":"10.1186/s13059-025-03724-1","DOIUrl":"https://doi.org/10.1186/s13059-025-03724-1","url":null,"abstract":"Neuropsychiatric disorders are highly complex conditions and the risk of developing a disorder has been tied to hundreds of genomic variants that alter the expression and/or RNA isoforms made by risk genes. However, how these genes contribute to disease risk and onset through altered expression and RNA splicing is not well understood. Combining our new bioinformatic pipeline IsoLamp with nanopore long-read amplicon sequencing, we deeply profile the RNA isoform repertoire of 31 high-confidence neuropsychiatric disorder risk genes in the human brain. We show that most risk genes are more complex than previously reported, identifying 363 novel isoforms and 28 novel exons, including isoforms which alter protein domains, and genes such as ATG13 and GATAD2A where most expression was from previously undiscovered isoforms. The greatest isoform diversity is detected in the schizophrenia risk gene ITIH4. Mass spectrometry of brain protein isolates confirms translation of a novel exon skipping event in ITIH4, suggesting a new regulatory mechanism for this gene in the brain. Our results emphasize the widespread presence of previously undetected RNA and protein isoforms in the human brain and provide an effective approach to address this knowledge gap. Uncovering the isoform repertoire of candidate neuropsychiatric risk genes will underpin future analyses of the functional impact these isoforms have on neuropsychiatric disorders, enabling the translation of genomic findings into a pathophysiological understanding of disease.","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"190 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2025-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145116218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome Biology | Pub Date: 2025-09-22 | DOI: 10.1186/s13059-025-03769-2
Haris Zafeiropoulos, Ermis Ioannis Michail Delopoulos, Andi Erega, Aline Schneider, Annelies Geirnaert, John Morris, Karoline Faust
{"title":"microbetag: simplifying microbial network interpretation through annotation, enrichment tests, and metabolic complementarity analysis","authors":"Haris Zafeiropoulos, Ermis Ioannis Michail Delopoulos, Andi Erega, Aline Schneider, Annelies Geirnaert, John Morris, Karoline Faust","doi":"10.1186/s13059-025-03769-2","DOIUrl":"https://doi.org/10.1186/s13059-025-03769-2","url":null,"abstract":"Microbial co-occurrence network inference is often hindered by low accuracy and tool dependency. We introduce microbetag, a comprehensive software ecosystem designed to annotate microbial networks. Nodes, representing taxa, are enriched with phenotypic traits, while edges are enhanced with metabolic complementarities, highlighting potential cross-feeding relationships. microbetag’s online version relies on microbetagDB, a database of 34,608 annotated representative genomes. microbetag can be applied to custom (metagenome-assembled) genomes via its stand-alone version. MGG, a Cytoscape app designed to support microbetag, offers a streamlined, user-friendly interface for network retrieval and visualization. microbetag effectively identified known metabolic interactions and serves as a robust hypothesis-generating tool.","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"7 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145103683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome Biology | Pub Date: 2025-09-22 | DOI: 10.1186/s13059-025-03798-x
Bruce Budowle, Kristen Mittelman, David Mittelman
{"title":"Genomics will forever reshape forensic science and criminal justice","authors":"Bruce Budowle, Kristen Mittelman, David Mittelman","doi":"10.1186/s13059-025-03798-x","DOIUrl":"https://doi.org/10.1186/s13059-025-03798-x","url":null,"abstract":"Dense single nucleotide polymorphism testing has revolutionized forensic science, helping solve decades-old, current and future cases by overcoming the limitations of traditional short tandem repeat profiling. By embracing innovations from fields such as ancient DNA analysis, forensics can deliver long-awaited answers and justice to victims and their families.","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"65 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145103776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome Biology | Pub Date: 2025-09-22 | DOI: 10.1186/s13059-025-03780-7
Pengxiao Li, Lin Li, Jingminjie Nan, Jiahuan Chen, Jielin Sun, Yanan Cao
{"title":"KEGNI: knowledge graph enhanced framework for gene regulatory network inference","authors":"Pengxiao Li, Lin Li, Jingminjie Nan, Jiahuan Chen, Jielin Sun, Yanan Cao","doi":"10.1186/s13059-025-03780-7","DOIUrl":"https://doi.org/10.1186/s13059-025-03780-7","url":null,"abstract":"Inference of cell type-specific gene regulatory networks (GRNs) is a fundamental step in investigating complex regulatory mechanisms. Here, we present KEGNI (Knowledge graph-Enhanced Gene regulatory Network Inference), a knowledge-guided framework that employs a graph autoencoder to capture gene regulatory relationships and incorporates a knowledge graph to infer GRNs based on scRNA-seq data. KEGNI shows superior performance compared to multiple methods using scRNA-seq data or paired scRNA-seq and scATAC-seq data. KEGNI can identify driver genes and elucidate the regulatory mechanisms underlying distinct cellular contexts. The modular design of KEGNI supports the integration of various knowledge graphs for context-specific tasks.","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"85 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145103777","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deciphering the sequence basis and application of transcriptional initiation regulation in plant genomes through deep learning","authors":"Pengfei Gao, Lijie Lian, Wanjie Feng, Yuxue Ma, Jieni Lin, Liya Qin, Shanmeng Hao, Haonan Zhao, Xuantong Liu, Jing Yuan, Zongcheng Lin, Xia Li, Yuefeng Guan, Xutong Wang","doi":"10.1186/s13059-025-03782-5","DOIUrl":"https://doi.org/10.1186/s13059-025-03782-5","url":null,"abstract":"Transcription initiation is a key checkpoint in plant gene regulation, yet the DNA features that determine where and how frequently genes start transcription remain unclear. We develop GenoRetriever, an interpretable deep learning model trained on base pair resolution STRIPE-seq data from multiple crop genomes, to systematically reveal and quantify the sequence code that governs transcription start sites (TSSs). Using TSS profiles from 16 soybean tissues and six additional crops, GenoRetriever identifies 27 core promoter motifs, including canonical TATA box and initiator elements, that together dictate TSS choice and activity. Model interpretation shows how each motif modulates both initiation frequency and precise start site position; these effects are confirmed by in silico motif edits, saturation mutagenesis, and targeted promoter assays. A new telomere-to-telomere assembly of wild soybean, Glycine soja, reveals that 31.85% of natural promoter variants shift dominant motifs relative to cultivated soybean, explaining domestication-driven changes in transcriptional regulation. Cross-species comparisons further indicate that, although many motif functions are conserved, monocots and dicots display distinct motif frequencies and positional preferences. GenoRetriever provides an interpretable, cross-species framework for decoding transcription initiation in plants. By linking specific sequence motifs to quantitative transcriptional outcomes and validating these links experimentally, our study advances fundamental knowledge of promoter architecture and supplies a practical platform for rational engineering of gene expression in crop improvement and functional genomics.","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"32 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145103796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome Biology | Pub Date: 2025-09-22 | DOI: 10.1186/s13059-025-03763-8
Jonas Scheid, Steffen Lemke, Naomi Hoenisch-Gravel, Anna Dengler, Timo Sachsenberg, Arthur Declerq, Ralf Gabriels, Jens Bauer, Marcel Wacker, Leon Bichmann, Lennart Martens, Marissa L. Dubbelaar, Sven Nahnsen, Juliane S. Walz
{"title":"MHCquant2 refines immunopeptidomics tumor antigen discovery","authors":"Jonas Scheid, Steffen Lemke, Naomi Hoenisch-Gravel, Anna Dengler, Timo Sachsenberg, Arthur Declerq, Ralf Gabriels, Jens Bauer, Marcel Wacker, Leon Bichmann, Lennart Martens, Marissa L. Dubbelaar, Sven Nahnsen, Juliane S. Walz","doi":"10.1186/s13059-025-03763-8","DOIUrl":"https://doi.org/10.1186/s13059-025-03763-8","url":null,"abstract":"Confident identification of human leukocyte antigen (HLA)-presented peptides is crucial for advancing cancer immunotherapy. We present MHCquant2, a scalable and modular Nextflow pipeline integrated into nf-core as a reproducible, portable, and standardized workflow for immunopeptidomics. This integration provides a community-driven, robust solution for high-throughput analyses across operating systems and cloud infrastructures. MHCquant2 integrates open-source tools including OpenMS, DeepLC, and MS2PIP, improving peptide identifications by up to 27% across diverse MS platforms, particularly enriching low-abundance peptides. MHCquant2 demonstrates state-of-the-art performance on our novel benignMHCquant2 dataset (n = 92) and expands the benign human immunopeptidome by over 160,000 unique naturally presented HLA peptides.","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"21 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145103797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}