Bioinformatics advances: latest articles

smartSim: simulation of splice aware single cell smart-seq3 data.
IF 2.8
Bioinformatics advances Pub Date: 2025-07-30 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf183
Marie Van Hecke, Kathleen Marchal
Motivation: Smart-seq3 is a powerful full-length single-cell RNA sequencing protocol that enables transcript-level quantification and splicing analysis by preserving unique molecular identifier (UMI) information. However, benchmarking computational tools for isoform reconstruction and splicing quantification remains challenging due to the lack of ground truth datasets. Herein, we present smartSim, a Smart-seq3 read simulator designed to generate realistic sequencing data that accurately reflects the complexities of single-cell transcriptomics.
Results: smartSim simulates known and novel splicing events, generates both UMI-containing and internal reads, and mimics protocol-specific biases by leveraging empirical data distributions. Our results show that smartSim-generated data closely resembles real Smart-seq3 datasets in terms of fragment length distributions, internal read counts, and read quality scores. It generates raw sequencing reads in FASTQ format, making it compatible with both genome- and transcriptome-based alignment tools. By extending simulation beyond gene-level quantification, smartSim provides a crucial resource for evaluating and improving computational methods for alternative splicing detection and isoform reconstruction in single-cell RNA sequencing.
Availability and implementation: smartSim is available at https://github.com/MarchalLab/smartSim.
Volume 5(1), article vbaf183. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12373632/pdf/
Citations: 0
Estimating protein complex model accuracy using graph transformers and pairwise similarity graphs.
IF 2.8
Bioinformatics advances Pub Date: 2025-07-29 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf180
Jian Liu, Pawan Neupane, Jianlin Cheng
Motivation: Estimation of protein complex structure accuracy is essential for effective structural model selection in structural biology applications such as protein function analysis and drug design. Despite the success of structure prediction methods such as AlphaFold2 and AlphaFold3, selecting top-quality structural models from large model pools remains challenging.
Results: We present GATE, a novel method that uses graph transformers on pairwise model similarity graphs to predict the quality (accuracy) of complex structural models. By integrating single-model and multimodel quality features, GATE captures intrinsic model characteristics and intermodel geometric similarities to make robust predictions. On the dataset of the 15th Critical Assessment of Protein Structure Prediction (CASP15), GATE achieved the highest Pearson's correlation (0.748) and the lowest ranking loss (0.1191) compared with existing methods. In the blind CASP16 experiment, GATE ranked fifth based on the sum of z-scores, with a Pearson's correlation of 0.7076 (first), a Spearman's correlation of 0.4514 (fourth), a ranking loss of 0.1221 (third), and an area under the curve score of 0.6680 (third) on per-target TM-score-based metrics. GATE also performed consistently on large in-house datasets generated by extensive AlphaFold-based sampling with MULTICOM4, confirming its robustness and practical applicability in real-world model selection scenarios.
Availability and implementation: GATE is available at https://github.com/BioinfoMachineLearning/GATE.
Volume 5(1), article vbaf180. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12342149/pdf/
Citations: 0
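The two headline metrics in the GATE abstract can be illustrated with a small sketch. The abstract does not define ranking loss, so the definition used here is an assumption based on common practice in model-accuracy estimation: the true score of the best model in the pool minus the true score of the model ranked first by the predictor. The function names and the example scores are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson's correlation between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def ranking_loss(predicted, true):
    """Assumed definition: best true score minus true score of the top-predicted model."""
    top = max(range(len(predicted)), key=lambda i: predicted[i])
    return max(true) - true[top]

# Hypothetical per-target pool of four models (true TM-scores vs. predicted quality).
true_tm = [0.82, 0.75, 0.60, 0.91]
predicted_quality = [0.88, 0.70, 0.55, 0.85]

print("Pearson:", pearson(predicted_quality, true_tm))
print("Ranking loss:", ranking_loss(predicted_quality, true_tm))
```

A ranking loss of zero means the predictor placed the truly best model first; a high Pearson's correlation alone does not guarantee that, which is why both numbers are reported.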
Diffusion model for imputing time-series gut microbiome profiles using phylogenetic information and metadata integration.
IF 2.8
Bioinformatics advances Pub Date: 2025-07-28 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf181
Misato Seki, Yao-Zhong Zhang, Seiya Imoto
Motivation: The gut microbiota interacts closely with the host, playing crucial roles in maintaining health. Analysing time-series genomic data enables the investigation of dynamic microbiota changes. However, missing values create significant analytical challenges.
Results: We propose a microbiome imputation framework based on a conditional score-based diffusion model, tailored to microbiome data by incorporating phylogenetic convolutional layers. Our method effectively reduces mean absolute error across various missing data ratios for both 16S rRNA and whole-genome shotgun profiles. The imputed datasets enhance downstream predictive tasks, achieving area under the curve scores that exceed or are comparable with those of the existing methods. To further improve the performance, we embedded host metadata into the model using a tabular encoding approach, which yielded additional improvements, particularly under higher missing ratios. Our findings underscore the potential of the diffusion model for processing time-series microbiome data with missing values.
Availability and implementation: The code and dataset are available at https://github.com/misatoseki/metag_time_impute_phylo.git.
Volume 5(1), article vbaf181. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12371328/pdf/
Citations: 0
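The abstract reports mean absolute error (MAE) across missing data ratios. A minimal sketch of how imputation MAE is typically scored follows, assuming that error is measured only on entries that were deliberately hidden from the model; the function name, mask layout, and abundance values are hypothetical and not taken from the paper.

```python
def masked_mae(truth, imputed, mask):
    """Mean absolute error over entries where mask is True (the held-out values)."""
    errors = [
        abs(t - p)
        for row_t, row_p, row_m in zip(truth, imputed, mask)
        for t, p, m in zip(row_t, row_p, row_m)
        if m
    ]
    if not errors:
        raise ValueError("mask selects no entries")
    return sum(errors) / len(errors)

# Hypothetical relative abundances: rows are time points, columns are taxa.
truth = [[0.5, 0.3, 0.2], [0.1, 0.6, 0.3]]
imputed = [[0.5, 0.25, 0.2], [0.2, 0.6, 0.3]]
mask = [[False, True, False], [True, False, False]]  # True = value was hidden

print(masked_mae(truth, imputed, mask))
```

Restricting the average to masked entries keeps observed values, which any method can copy through unchanged, from diluting the error.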
Exonize: a tool for finding and classifying exon duplications in annotated genomes.
IF 2.8
Bioinformatics advances Pub Date: 2025-07-28 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf177
Marina Herrera Sarrias, Christopher W Wheat, Liam M Longo, Lars Arvestad
Summary: The protein-coding regions of eukaryotic genes are fragmented into exons that, like the genes within which they are situated, can be duplicated, deleted, or reorganized. Cataloging and organizing within-gene exon similarities is necessary for a systematic study of exon evolution and its consequences. To facilitate the study of exon duplications, we present Exonize, a computational tool that identifies and classifies coding exon duplications in annotated genomes. Exonize implements a graph-based framework to handle clusters of related exons resulting from repeated rounds of exon duplication. The interdependence between duplicated exons or groups of exons across transcripts is classified. By identifying duplication events between exonic and intronic regions, Exonize can detect unannotated or degenerate exons. To aid in data parsing and downstream analysis, the Python module exonize_analysis is provided. The application of Exonize to 20 eukaryote genomes identifies full-exon duplications in at least 4% of vertebrate genes, with more than 900 human genes having a full-exon duplication event.
Availability and implementation: Exonize is available at https://github.com/msarrias/exonize.
Volume 5(1), article vbaf177. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12343006/pdf/
Citations: 0
In silico design of immunogenic antigen cocktail via affinity maturation-guided optimization.
IF 2.8
Bioinformatics advances Pub Date: 2025-07-28 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf182
A N M Nafiz Abeer, Bong-Seong Koo, Byung-Jun Yoon
Summary: The increasing emergence of new virus strains with increased infectiousness necessitates a more proactive approach for effective vaccine design. To achieve this goal, it is critical to shift the vaccine design paradigm from traditional approaches that rely on expert intuition and experimental methods toward data-driven strategies that leverage in silico design and virtual screening. In this work, we propose a computational pipeline for designing an optimized immunogenic cocktail that can boost the immune response. The proposed pipeline consists of two stages, where potential antigen candidates are identified in the first stage, followed by the optimal selection and combination of the candidates in the second stage to maximize the expected immunogenicity. We leverage predictive models trained using deep mutational scanning data to drive the candidate antigen selection process based on three selection criteria, namely binding affinity between viral protein and receptor, antibody escape probability, and sequence diversity. To identify the optimal cocktail within the pool of selected antigens, we adopt a combinatorial optimization framework, where the cocktail design is iteratively refined based on the expected efficacy predicted by a sequence-based computational model of affinity maturation. Validation of the designed cocktails through structure-based affinity maturation simulation demonstrates the efficacy of the proposed modular framework for designing an optimized immunogenic cocktail.
Availability and implementation: The code for cocktail design is available at https://github.com/nafizabeer/Antigen_Cocktail_Design.
Volume 5(1), article vbaf182. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12360842/pdf/
Citations: 0
SW-actors: accelerating the Smith-Waterman algorithm via actors.
IF 2.8
Bioinformatics advances Pub Date: 2025-07-28 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf173
Reza Rafati Bonab, Ali Akbar Jamali, Kyle Klenk, Mohammad Mahdi Moayeri, Raymond J Spiteri
Motivation: The Smith-Waterman (SW) algorithm is widely regarded as the gold standard for local sequence alignment. However, its time complexity in a serial implementation limits its practicality for large datasets. In this article, we introduce SW-actors, a parallel implementation of the SW algorithm that leverages the actor model of concurrent computation to optimize resource utilization by efficiently scheduling and managing independent alignment tasks across processors at both the interalignment and intraalignment levels.
Results: SW-actors is compared with the state-of-the-art implementations Parasail, SeqAn, and SWIPE using four datasets of varying sequence lengths ranging from 85 to 74778 nucleotides. In terms of wall-clock time, SW-actors is 1.33×, 2.00×, 2.49×, and 1.94× faster than the next best implementation for the different datasets. SW-actors is up to 22× faster than serial on 40 cores. The speedup is consistent for larger datasets and hence offers significant advantages for medium- to large-scale alignments.
Availability and implementation: The SW-actors source code and underlying data are available at https://git.cs.usask.ca/numerical_simulations_lab/actors/papers/sw-actors.
Volume 5(1), article vbaf173. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12449131/pdf/
Citations: 0
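For readers unfamiliar with the serial baseline, the following is a minimal sketch of the Smith-Waterman score recurrence with a linear gap penalty. It is not the SW-actors implementation: the scoring parameters are assumptions chosen for illustration, and no actor-based scheduling is shown. The nested loops make the O(mn) serial cost for sequences of lengths m and n visible.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (linear gap penalty, assumed scores)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                   # start a new local alignment
                prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best

def score_all(pairs):
    """Each pair is independent, so this loop is what interalignment parallelism distributes."""
    return [smith_waterman_score(a, b) for a, b in pairs]

print(score_all([("ACGT", "ACGT"), ("AAAA", "TTTT")]))
```

Because every cell depends only on its left, upper, and upper-left neighbours, cells on the same anti-diagonal can also be computed concurrently, which is the usual basis for intraalignment parallelism.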
Gene-set enrichment analysis and visualization on the web using EnrichmentMap:RNASeq.
IF 2.8
Bioinformatics advances Pub Date: 2025-07-24 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf178
Max Franz, Christian T Lopes, Mike Kucera, Veronique Voisin, Ruth Isserlin, Gary D Bader
Summary: EnrichmentMap: RNASeq (enrichmentmap.org) is an intuitive, web-based app for gene-set enrichment analysis and visualization, specifically supporting two-case RNA-Seq experiments for Homo sapiens. Compared to running similar workflows in the Cytoscape desktop software, the web app offers a simplified user interface and faster processing times and eliminates the need for software installation, catering to biologists with minimal computational experience. EnrichmentMap: RNASeq is a new type of Cytoscape web app that is interoperable with Cytoscape.
Availability and implementation: The app is available to use at enrichmentmap.org and the source code is available at github.com/cytoscape/enrichment-map-webapp.
Volume 5(1), article vbaf178. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12373637/pdf/
Citations: 0
Multi-metric locality sensitive hashing enhances alignment accuracy of bisulfite sequencing reads: BisHash.
IF 2.8
Bioinformatics advances Pub Date: 2025-07-23 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf144
Hassan Nikaein, Ali Sharifi-Zarchi
Motivation: Locality-Sensitive Hashing (LSH) is a widely used algorithm for estimating similarity between large datasets in bioinformatics, with applications in genome assembly, sequence alignment, and metagenomics. However, traditional single-metric LSH approaches often lead to inefficiencies, particularly when handling biological data where regions may have diverse evolutionary histories or structural properties. This limitation can reduce accuracy in sequence alignment, variant calling, and functional analysis.
Results: We propose Multi-Metric Locality-Sensitive Hashing (M2LSH), an extension of LSH that integrates multiple similarity metrics for more accurate analysis of complex biological data. By capturing diverse sequence and structural features, M2LSH improves performance in heterogeneous and evolutionarily diverse regions. Building on this, we introduce Multi-Metric MinHash (M3Hash), enhancing sequence alignment and similarity detection. As a proof of concept, we present BisHash, which applies M2LSH to bisulfite sequencing, a key method in DNA methylation analysis. Although not fully optimized, BisHash demonstrates superior accuracy, particularly in challenging scenarios like cancer studies where traditional approaches often fail. Our results highlight the potential of M2LSH and M3Hash to advance bioinformatics research.
Availability and implementation: The source code for BisHash and the test procedures for benchmarking aligners using simulated data are publicly accessible at https://github.com/hnikaein/bisHash.
Volume 5(1), article vbaf144. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12360834/pdf/
Citations: 0
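As background for the single-metric baseline the abstract contrasts against, here is a minimal sketch of classic MinHash, which estimates the Jaccard similarity of two k-mer sets. It is not M2LSH, M3Hash, or BisHash; the abstract does not describe how those combine metrics. The k-mer size, number of hash functions, and helper names are assumptions for illustration.

```python
import hashlib

def kmers(seq, k=4):
    """Set of all length-k substrings; seq must be at least k characters long."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def _hash(seed, item):
    """Deterministic 64-bit hash; the seed selects one member of the hash family."""
    digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def minhash_signature(items, num_hashes=64):
    """For each seeded hash function, keep the minimum hash value over the set."""
    return [min(_hash(seed, x) for x in items) for seed in range(num_hashes)]

def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree approximates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

read = "ACGTACGTTAGCCGATAGGC"
similar = "ACGTACGTTAGCCGATAGGA"  # differs in the last base only
sig_read = minhash_signature(kmers(read))
sig_similar = minhash_signature(kmers(similar))
print(estimate_jaccard(sig_read, sig_similar))
```

The estimate depends on a single notion of similarity (shared k-mers), which is the limitation that motivates combining several metrics.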
Statistical relationships across epigenomes using large-scale hierarchical clustering.
IF 2.8
Bioinformatics advances Pub Date: 2025-07-23 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf175
Anastasiia Kim, Nicholas Lubbers, Christina R Steadman, Karissa Y Sanbonmatsu
Motivation: Recent advances in genomics and sequencing platforms have revolutionized our ability to create immense data sets, particularly for studying epigenetic regulation of gene expression. However, the avalanche of epigenomic data is difficult to parse for biological interpretation given nonlinear complex patterns and relationships. This attractive challenge in epigenomic data lends itself to machine learning for discerning infectivity and susceptibility. In this study, we explore over 3000 epigenomes of uninfected individuals and provide a framework to characterize the relationships among epigenetic modifiers, their modifiers, genetic loci, and specific immune cell types across all chromosomes using hierarchical clustering.
Results: Hierarchical clustering of epigenomic data revealed consistent epigenetic patterns across chromosomes, demonstrating that variation due to epigenetic modifiers is greater than variation between cell types. Gene Ontology and KEGG pathway analyses indicated significant enrichment of genes involved in chromatin remodeling, mRNA splicing, immune responses, and the regulation of microRNAs and snoRNAs. Epigenetic modifiers frequently formed biologically relevant clusters, including the cohesin complex, RNA Polymerase II transcription factors, and PRC2 complex members. These clustering behaviors remained consistent across all chromosomes, supported by entropy analysis and high Adjusted Rand Index scores, indicating robust cross-chromosomal similarity. Co-occurrence analysis further revealed specific sets of modifiers that consistently appeared together within clusters, reflecting shared biological functions and interactions. Validation using another dataset confirmed the reproducibility of these clustering patterns and modifier co-occurrence relationships, underscoring the reliability and generalizability of the methodology.
Availability and implementation: The analysis pipeline for this study is freely available online at the GitHub repository: https://github.com/lanl/epigen.
Volume 5(1), article vbaf175. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12373635/pdf/
Citations: 0
Structure-based metabolite function prediction using graph neural networks.
IF 2.8
Bioinformatics advances Pub Date: 2025-07-21 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf174
Tancredi Cogne, Mariam Ait Oumelloul, Ali Saadat, Janna Hastings, Jacques Fellay
Motivation: Being able to broadly predict the function of novel metabolites based on their structures has applications in systems biology, environmental monitoring, and drug discovery. To date, machine learning models aiming to predict functional characteristics of metabolites have largely been limited in scope to predicting single functions, or only a small number of functions simultaneously.
Results: Using the Human Metabolome Database as a source for a wider range of functional annotations, we assess the feasibility of predicting metabolite functions more broadly, as defined by four elements, namely location, role, the process it is involved in, and its physiological effect. We evaluated three graph neural network architectures to predict available functional ontology terms. We compared the graph models with two multilayer perceptron architectures using circular fingerprints and Chemical BiDirectional Encoder Representations from Transformers (ChemBERTa) embeddings. Among the models tested, the graph attention network, incorporating embeddings from the pretrained ChemBERTa model to predict the process metabolites are involved in, achieved the highest performance with a macro F1-score of 0.903 and an area under the precision-recall curve of 0.926.
Availability and implementation: The model identified function-associated structural patterns within metabolite families, demonstrating the potential for interpretable prediction of metabolite functions from structural information.
Volume 5(1), article vbaf174. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12343106/pdf/
Citations: 0
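The macro F1-score quoted in the abstract averages per-term F1 values with equal weight, so rare ontology terms count as much as common ones. A minimal sketch for the multi-label case follows, assuming each metabolite carries a set of terms; the function name and the toy labels "a" and "b" are hypothetical and unrelated to the paper's ontology.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-label F1 over multi-label predictions (sets of labels)."""
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if label in t and label in p)
        fp = sum(1 for t, p in zip(y_true, y_pred) if label not in t and label in p)
        fn = sum(1 for t, p in zip(y_true, y_pred) if label in t and label not in p)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical annotations for three metabolites.
y_true = [{"a"}, {"a", "b"}, {"b"}]
y_pred = [{"a"}, {"a"}, {"a", "b"}]

print(macro_f1(y_true, y_pred, labels=["a", "b"]))
```

In this toy case label "a" scores 0.8 and label "b" scores 2/3, and the macro average weights them equally regardless of how often each occurs.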