Zheng Su, Mingyan Fang, Andrei Smolnikov, Marcel E. Dinger, Emily C. Oates, Fatemeh Vafaee
GeneRAIN: multifaceted representation of genes via deep learning of gene expression networks
Genome Biology, published 2025-09-22. DOI: 10.1186/s13059-025-03749-6 (https://doi.org/10.1186/s13059-025-03749-6)
Abstract
We develop GeneRAIN, a suite of Transformer-based models that learn gene expression relationships from 410 K human bulk RNA-seq samples. Featuring a novel Binning-By-Gene normalization technique, our models capture diverse biological information beyond expression. We introduce GeneRAIN-vec, a multifaceted vectorized gene representation that outperforms those from existing models. We demonstrate knowledge transfer from protein-coding genes to make 62.5 million biological attribute predictions for 13,030 long noncoding RNAs. This work advances Transformer and self-supervised deep learning applications to expression data, enhancing biological exploration.
Genome Biology (Biochemistry, Genetics and Molecular Biology: Genetics)
CiteScore: 21.00
Self-citation rate: 3.30%
Articles published: 241
Review time: 2 months
About the journal:
Genome Biology stands as a premier platform for exceptional research across all domains of biology and biomedicine, explored through a genomic and post-genomic lens.
With an impact factor of 12.3 (2022), the journal is the 3rd-ranked research journal in the Genetics and Heredity category and the 2nd-ranked research journal in the Biotechnology and Applied Microbiology category by Thomson Reuters. Notably, Genome Biology is the highest-ranked open-access journal in this category.
Our dedicated team of highly trained in-house Editors collaborates closely with our esteemed Editorial Board of international experts, ensuring the journal remains at the forefront of scientific advances and community standards. Regular engagement with researchers at conferences and institute visits underscores our commitment to staying abreast of the latest developments in the field.