Briefings in bioinformatics: latest articles

MUTATE: a human genetic atlas of multiorgan artificial intelligence endophenotypes using genome-wide association summary statistics.
IF 6.8, Zone 2 (Biology)
Briefings in bioinformatics Pub Date : 2025-03-04 DOI: 10.1093/bib/bbaf125
Aleix Boquet-Pujadas, Jian Zeng, Ye Ella Tian, Zhijian Yang, Li Shen, Andrew Zalesky, Christos Davatzikos, Junhao Wen
Vol. 26(2), Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11938998/pdf/
Abstract: Artificial intelligence (AI) has been increasingly integrated into imaging genetics to provide intermediate phenotypes (i.e. endophenotypes) that bridge the genetics and clinical manifestations of human disease. However, the genetic architecture of these AI endophenotypes remains largely unexplored in the context of human multiorgan system diseases. Using publicly available genome-wide association study summary statistics from the UK Biobank (UKBB), FinnGen, and the Psychiatric Genomics Consortium, we comprehensively depicted the genetic architecture of 2024 multiorgan AI endophenotypes (MAEs). We comparatively assessed the single-nucleotide polymorphism-based heritability, polygenicity, and natural selection signatures of 2024 MAEs using methods commonly used in the field. Genetic correlation and Mendelian randomization analyses reveal both within-organ relationships and cross-organ interconnections. Bi-directional causal relationships were established between chronic human diseases and MAEs across multiple organ systems, including Alzheimer's disease for the brain, diabetes for the metabolic system, asthma for the pulmonary system, and hypertension for the cardiovascular system. Finally, we derived polygenic risk scores for the 2024 MAEs for individuals not used to calculate MAEs and returned these to the UKBB. Our findings underscore the promise of the MAEs as new instruments to ameliorate overall human health. All results are encapsulated into the MUlTiorgan AI endophenoTypE genetic atlas and are publicly available at https://labs-laboratory.com/mutate.
Citations: 0
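The MUTATE abstract mentions deriving polygenic risk scores. As a rough illustration only, a polygenic risk score can be viewed as a weighted sum of an individual's effect-allele dosages. The sketch below assumes the per-variant weights are already given; the function name, the rsIDs, and all numbers are hypothetical and do not come from the paper, which does not state in its abstract how its weights were obtained.

```python
from typing import Mapping


def polygenic_risk_score(dosages: Mapping[str, float],
                         weights: Mapping[str, float]) -> float:
    """Weighted sum of effect-allele dosages over variants present in both inputs."""
    shared = dosages.keys() & weights.keys()
    return sum(weights[snp] * dosages[snp] for snp in shared)


# Hypothetical per-variant weights and one individual's effect-allele
# dosages (0, 1, or 2 copies). The rsIDs are placeholders, not real variants.
weights = {"rs0001": 0.5, "rs0002": -0.25, "rs0003": 1.0}
dosages = {"rs0001": 2, "rs0002": 1, "rs0004": 2}

score = polygenic_risk_score(dosages, weights)
print(score)
```

Variants that appear in only one of the two inputs are ignored here; a real pipeline would also need to handle allele alignment and missing genotypes, which this sketch leaves out.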
Testing and overcoming the limitations of modular response analysis.
IF 6.8, Zone 2 (Biology)
Briefings in bioinformatics Pub Date : 2025-03-04 DOI: 10.1093/bib/bbaf098
Jean-Pierre Borg, Jacques Colinge, Patrice Ravel
Vol. 26(2), Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11891662/pdf/
Abstract: Modular response analysis (MRA) is an effective method to infer biological networks from perturbation data. However, it has several limitations such as strong sensitivity to noise, need of performing independent perturbations that hit a single node at a time, and linear approximation of dependencies within the network. Previously, we addressed the sensitivity of MRA to noise by reinterpreting MRA as a multilinear regression problem. We demonstrated the advantages of this approach over the conventional MRA and other known inference methods, particularly in handling noise measurements and nonlinear networks. Here, we provide new contributions to complement this theory. First, we overcome the need of perturbations to be independent, thereby augmenting MRA applicability. Second, using analysis of variance and lack-of-fit tests, we can now assess MRA compatibility with the data and identify the primary source of errors. In cases where nonlinearity prevails, we propose extending the model to a second-order polynomial. Third, we demonstrate how to effectively use prior knowledge about a network. We validated these results using 4 networks with known dynamics (3, 4, and 6 nodes) and 40 simulated networks, ranging from 10 to 200 nodes. Finally, we incorporated these innovations into our R software package MRARegress to offer a comprehensive, extended theory for MRA and to facilitate its use by the community. Mathematical aspects, tests details, and scripts are provided as Supplementary Information (see 'Data Availability Statement').
Citations: 0
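To make the phrase "MRA as a multilinear regression problem" more concrete, here is a minimal sketch of the classical single-node-perturbation setting, written in Python with NumPy. It is not the MRARegress package and does not reproduce the paper's extensions. It assumes that perturbation k acts directly on node k only and that the diagonal of the local response matrix is fixed at -1, so that for each node i the global responses satisfy R[i, k] = sum over j != i of r[i, j] * R[j, k] for every k != i. The function name and the example network are hypothetical.

```python
import numpy as np


def infer_local_responses(R: np.ndarray) -> np.ndarray:
    """Estimate local response coefficients r from a global response matrix R.

    R[i, k] is the relative response of node i to perturbation k.
    Assumption: perturbation k directly affects node k only.
    """
    n = R.shape[0]
    r = -np.eye(n)  # convention: r[i, i] = -1
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Rows: perturbations k != i; columns: nodes j != i.
        X = R[np.ix_(others, others)].T
        # Responses of node i to the perturbations that do not hit it directly.
        y = R[i, others]
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        r[i, others] = coef
    return r


# Hypothetical 3-node network used to simulate noise-free global responses.
r_true = np.array([[-1.0, 0.5, 0.0],
                   [0.3, -1.0, 0.2],
                   [0.0, -0.4, -1.0]])
R_sim = -np.linalg.inv(r_true)  # unit-strength perturbation on each node

r_est = infer_local_responses(R_sim)
print(np.round(r_est, 3))
```

With exactly one perturbation per node the least-squares problem is square, so the fit is exact on noise-free data; with replicate or additional perturbations the same call would average over noise, which is the motivation for the regression view described in the abstract.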
Deep generative model for protein subcellular localization prediction.
IF 6.8, Zone 2 (Biology)
Briefings in bioinformatics Pub Date : 2025-03-04 DOI: 10.1093/bib/bbaf152
Guo-Hua Yuan, Jinzhe Li, Zejun Yang, Yao-Qi Chen, Zhonghang Yuan, Tao Chen, Wanli Ouyang, Nanqing Dong, Li Yang
Vol. 26(2), Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11986326/pdf/
Abstract: Protein sequence not only determines its structure but also provides important clues of its subcellular localization. Although a series of artificial intelligence models have been reported to predict protein subcellular localization, most of them provide only textual outputs. Here, we present deepGPS, a deep generative model for protein subcellular localization prediction. After training with protein primary sequences and fluorescence images, deepGPS shows the ability to predict cytoplasmic and nuclear localizations by reporting both textual labels and generative images as outputs. In addition, cell-type-specific deepGPS models can be developed by using distinct image datasets from different cell lines for comparative analyses. Moreover, deepGPS shows potential to be further extended for other specific organelles, such as vesicles and endoplasmic reticulum, even with limited volumes of training data. Finally, the openGPS website (https://bits.fudan.edu.cn/opengps) is constructed to provide a publicly accessible and user-friendly platform for studying protein subcellular localization and function.
Citations: 0
MIRACN: a residual convolutional neural network for predicting cell line specific functional regulatory variants.
IF 6.8, Zone 2 (Biology)
Briefings in bioinformatics Pub Date : 2025-03-04 DOI: 10.1093/bib/bbaf196
Zeyin Li, Min Wang, Songge Li, Fangyuan Shi
Vol. 26(2), Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12021264/pdf/
Abstract: In post-genome-wide association study era, interpretation of noncoding variants remains a significant challenge due to their complexity and the limited understanding of their functions. Here, we developed MIRACN, a novel residual convolutional neural network designed to predict cell line-specific functional regulatory variants. By utilizing a substantial dataset from massively parallel reporter assays (MPRAs) and employing a multitask learning strategy, MIRACN was trained across seven distinct cell lines, attaining superior performance compared to existing methods, especially in predicting cell type specificity. Comparative evaluations on an independent MPRA test dataset demonstrated that MIRACN not only outperformed in identifying regulatory variants but also provided valuable insights into their cellular context-specific regulatory mechanisms. MIRACN is capable of not only providing scores for functional variants but also pinpointing the specific cell line in which these variants display their function. This enhancement has improved the resolution of current research on the functionality of noncoding variants and has paved the way for more precise diagnostic and therapeutic strategies.
Citations: 0
PathSynergy: a deep learning model for predicting drug synergy in liver cancer.
IF 6.8, Zone 2 (Biology)
Briefings in bioinformatics Pub Date : 2025-03-04 DOI: 10.1093/bib/bbaf192
Fengyue Zhang, Xuqi Zhao, Jinrui Wei, Lichuan Wu
Vol. 26(2), Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12021016/pdf/
Abstract: Cancer is a major public health problem while liver cancer is the main cause of global cancer-related deaths. The previous study demonstrates that the 5-year survival rate for advanced liver cancer is only 30%. Few of the first-line targeted drugs including sorafenib and lenvatinib are available, which often develop resistance. Drug combination therapy is crucial for improving the efficacy of cancer therapy and overcoming resistance. However, traditional methods for discovering drug synergy are costly and time consuming. In this study, we developed a novel predicting model PathSynergy by integrating drug feature data, cell line data, drug-target interactions, and signaling pathways. PathSynergy combined the advantages of graph neural networks and pathway map mapping. Comparing with other baseline models, PathSynergy showed better performance in model classification, accuracy, and precision. Excitingly, six Food and Drug Administration (FDA)-approved drugs including pimecrolimus, topiramate, nandrolone_decanoate, fluticasone propionate, zanubrutinib, and levonorgestrel were predicted and validated to show synergistic effects with sorafenib or lenvatinib against liver cancer for the first time. In general, the PathSynergy model provides a new perspective to discover synergistic combinations of drugs and has broad application potential in the fields of drug discovery and personalized medicine.
Citations: 0
sTPLS: identifying common and specific correlated patterns under multiple biological conditions.
IF 6.8, Zone 2 (Biology)
Briefings in bioinformatics Pub Date : 2025-03-04 DOI: 10.1093/bib/bbaf195
Jinyu Chen, Wenwen Min
Vol. 26(2), Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12031727/pdf/
Abstract: The rapidly emerging large-scale data in diverse biological research fields present valuable opportunities to explore the underlying mechanisms of tissue development and disease progression. However, few existing methods can simultaneously capture common and condition-specific association between different types of features across different biological conditions, such as cancer types or cell populations. Therefore, we developed the sparse tensor-based partial least squares (sTPLS) method, which integrates multiple pairs of datasets containing two types of features but derived from different biological conditions. We demonstrated the effectiveness and versatility of sTPLS through simulation study and three biological applications. By integrating the pairwise pharmacogenomic data, sTPLS identified 11 gene-drug comodules with high biological functional relevance specific for seven cancer types and two comodules that shared across multi-type cancers, such as breast, ovarian, and colorectal cancers. When applied to single-cell data, it uncovered nine gene-peak comodules representing transcriptional regulatory relationships specific for five cell types and three comodules shared across similar cell types, such as intermediate and naïve B cells. Furthermore, sTPLS can be directly applied to tensor-structured data, successfully revealing shared and distinct cell communication patterns mediated by the MK signaling pathway in coronavirus disease 2019 patients and healthy controls. These results highlight the effectiveness of sTPLS in identifying biologically meaningful relationships across diverse conditions, making it useful for multi-omics integrative analysis.
Citations: 0
FGeneBERT: function-driven pre-trained gene language model for metagenomics.
IF 6.8, Zone 2 (Biology)
Briefings in bioinformatics Pub Date : 2025-03-04 DOI: 10.1093/bib/bbaf149
Chenrui Duan, Zelin Zang, Yongjie Xu, Hang He, Siyuan Li, Zihan Liu, Zhen Lei, Ju-Sheng Zheng, Stan Z Li
Vol. 26(2), Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11986344/pdf/
Abstract: Metagenomic data, comprising mixed multi-species genomes, are prevalent in diverse environments like oceans and soils, significantly impacting human health and ecological functions. However, current research relies on K-mer, which limits the capture of structurally and functionally relevant gene contexts. Moreover, these approaches struggle with encoding biologically meaningful genes and fail to address the one-to-many and many-to-one relationships inherent in metagenomic data. To overcome these challenges, we introduce FGeneBERT, a novel metagenomic pre-trained model that employs a protein-based gene representation as a context-aware and structure-relevant tokenizer. FGeneBERT incorporates masked gene modeling to enhance the understanding of inter-gene contextual relationships and triplet enhanced metagenomic contrastive learning to elucidate gene sequence-function relationships. Pre-trained on over 100 million metagenomic sequences, FGeneBERT demonstrates superior performance on metagenomic datasets at four levels, spanning gene, functional, bacterial, and environmental levels and ranging from 1 to 213 k input sequences. Case studies of ATP synthase and gene operons highlight FGeneBERT's capability for functional recognition and its biological relevance in metagenomic research.
Citations: 0
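The "masked gene modeling" objective named in the FGeneBERT abstract follows the general masked-token idea from BERT-style pre-training: hide some tokens in a sequence and train the model to recover them from context. The sketch below shows only the input-corruption step, under loud assumptions: gene tokens are plain placeholder strings (the paper uses protein-based gene representations), the `[MASK]` symbol and the function name are hypothetical, and the default masking rate of 0.15 is a common convention rather than a value taken from the abstract.

```python
import random
from typing import List, Optional, Tuple

MASK = "[MASK]"  # hypothetical mask symbol


def mask_tokens(tokens: List[str],
                mask_prob: float = 0.15,
                seed: Optional[int] = None
                ) -> Tuple[List[str], List[Optional[str]]]:
    """Replace each token by MASK with probability mask_prob.

    Returns the corrupted sequence and, per position, the original token
    the model should predict (None where the position was left unchanged).
    """
    rng = random.Random(seed)
    corrupted: List[str] = []
    targets: List[Optional[str]] = []
    for tok in tokens:
        if rng.random() < mask_prob:
            corrupted.append(MASK)
            targets.append(tok)
        else:
            corrupted.append(tok)
            targets.append(None)
    return corrupted, targets


# Placeholder gene tokens along one metagenomic contig.
genes = ["geneA", "geneB", "geneC", "geneD", "geneE"]
corrupted, targets = mask_tokens(genes, mask_prob=0.4, seed=0)
print(corrupted)
print(targets)
```

A training loop would compute a prediction loss only at positions where `targets` is not `None`; the contrastive objective mentioned in the abstract is a separate component and is not sketched here.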
MOSim: bulk and single-cell multilayer regulatory network simulator.
IF 6.8, Zone 2 (Biology)
Briefings in bioinformatics Pub Date : 2025-03-04 DOI: 10.1093/bib/bbaf110
Carolina Monzó, Maider Aguerralde-Martin, Carlos Martínez-Mira, Ángeles Arzalluz-Luque, Ana Conesa, Sonia Tarazona
Vol. 26(2), Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11926980/pdf/
Abstract: As multi-omics sequencing technologies advance, the need for simulation tools capable of generating realistic and diverse (bulk and single-cell) multi-omics datasets for method testing and benchmarking becomes increasingly important. We present MOSim, an R package that simulates both bulk (via mosim function) and single-cell (via sc_mosim function) multi-omics data. The mosim function generates bulk transcriptomics data (RNA-seq) and additional regulatory omics layers (ATAC-seq, miRNA-seq, ChIP-seq, Methyl-seq, and transcription factors), while sc_mosim simulates single-cell transcriptomics data (scRNA-seq) with scATAC-seq and transcription factors as regulatory layers. The tool supports various experimental designs, including simulation of gene co-expression patterns, biological replicates, and differential expression between conditions. MOSim enables users to generate quantification matrices for each simulated omics data type, capturing the heterogeneity and complexity of bulk and single-cell multi-omics datasets. Furthermore, MOSim provides differentially abundant features within each omics layer and elucidates the active regulatory relationships between regulatory omics and gene expression data at both bulk and single-cell levels. By leveraging MOSim, researchers will be able to generate realistic and customizable bulk and single-cell multi-omics datasets to benchmark and validate analytical methods specifically designed for the integrative analysis of diverse regulatory omics data.
Citations: 0
Cancer gene identification through integrating causal prompting large language model with omics data-driven causal inference.
IF 6.8, Zone 2 (Biology)
Briefings in bioinformatics Pub Date : 2025-03-04 DOI: 10.1093/bib/bbaf113
Haolong Zeng, Chaoyi Yin, Chunyang Chai, Yuezhu Wang, Qi Dai, Huiyan Sun
Vol. 26(2), Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11899576/pdf/
Abstract: Identifying genes causally linked to cancer from a multi-omics perspective is essential for understanding the mechanisms of cancer and improving therapeutic strategies. Traditional statistical and machine-learning methods that rely on generalized correlation approaches to identify cancer genes often produce redundant, biased predictions with limited interpretability, largely due to overlooking confounding factors, selection biases, and the nonlinear activation function in neural networks. In this study, we introduce a novel framework for identifying cancer genes across multiple omics domains, named ICGI (Integrative Causal Gene Identification), which leverages a large language model (LLM) prompted with causality contextual cues and prompts, in conjunction with data-driven causal feature selection. This approach demonstrates the effectiveness and potential of LLMs in uncovering cancer genes and comprehending disease mechanisms, particularly at the genomic level. However, our findings also highlight that current LLMs may not capture comprehensive information across all omics levels. By applying the proposed causal feature selection module to transcriptomic datasets from six cancer types in The Cancer Genome Atlas and comparing its performance with state-of-the-art methods, it demonstrates superior capability in identifying cancer genes that distinguish between cancerous and normal samples. Additionally, we have developed an online service platform that allows users to input a gene of interest and a specific cancer type. The platform provides automated results indicating whether the gene plays a significant role in cancer, along with clear and accessible explanations. Moreover, the platform summarizes the inference outcomes obtained from data-driven causal learning methods.
Citations: 0
Predicting differentially methylated cytosines in TET and DNMT3 knockout mutants via a large language model.
IF 6.8, Zone 2 (Biology)
Briefings in bioinformatics Pub Date : 2025-03-04 DOI: 10.1093/bib/bbaf092
Saleh Sereshki, Stefano Lonardi
Vol. 26(2), Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11904404/pdf/
Abstract: DNA methylation is an epigenetic marker that directly or indirectly regulates several critical cellular processes. While cytosines in mammalian genomes generally maintain stable methylation patterns over time, other cytosines that belong to specific regulatory regions, such as promoters and enhancers, can exhibit dynamic changes. These changes in methylation are driven by a complex cellular machinery, in which the enzymes DNMT3 and TET play key roles. The objective of this study is to design a machine learning model capable of accurately predicting which cytosines have a fluctuating methylation level [hereafter called differentially methylated cytosines (DMCs)] from the surrounding DNA sequence. Here, we introduce L-MAP, a transformer-based large language model that is trained on DNMT3-knockout and TET-knockout data in human and mouse embryonic stem cells. Our extensive experimental results demonstrate the high accuracy of L-MAP in predicting DMCs. Our experiments also explore whether a classifier trained on human knockout data could predict DMCs in the mouse genome (and vice versa), and whether a classifier trained on DNMT3 knockout data could predict DMCs in TET knockouts (and vice versa). L-MAP enables the identification of sequence motifs associated with the enzymatic activity of DNMT3 and TET, which include known motifs but also novel binding sites that could provide new insights into DNA methylation in stem cells. L-MAP is available at https://github.com/ucrbioinfo/dmc_prediction.
Citations: 0