BMC Bioinformatics: Latest Articles (Page 9)

Adaptive gradient scaling: integrating Adam and landscape modification for protein structure prediction.

IF 2.9, Zone 3 (Biology)

BMC Bioinformatics Pub Date : 2025-07-01 DOI: 10.1186/s12859-025-06185-2

Vitalii Kapitan, Michael Choi

Background: Protein structure prediction is one of the most important scientific problems: on the one hand, it is NP-hard; on the other, it has a wide range of applications, including drug discovery and biotechnology development. Since experimental methods for structure determination remain expensive and time-consuming, computational structure prediction offers a scalable and cost-effective alternative, and the application of machine learning in structural biology has revolutionized protein structure prediction. Despite their success, machine learning methods face fundamental limitations in optimizing complex, high-dimensional energy landscapes, which motivates research into new methods that improve the robustness and performance of optimization algorithms.

Results: This study presents a novel approach to protein structure prediction that integrates the Landscape Modification (LM) method with the Adam optimizer for OpenFold. The main idea is to change the optimization dynamics by introducing a gradient scaling mechanism based on energy landscape transformations. LM dynamically adjusts gradients using a threshold parameter and a transformation function, thereby improving the optimizer's ability to avoid local minima and to traverse flat or rough regions of the landscape more efficiently, and potentially allowing faster convergence to global or high-quality local optima. By integrating simulated annealing into the LM approach, we propose LM SA, a variant designed to improve convergence stability while enabling more efficient exploration of complex landscapes.

Conclusion: We compare the performance of standard Adam, LM, and LM SA on different datasets and under different computational conditions. Performance was evaluated using loss function values, predicted Local Distance Difference Test (pLDDT), distance-based Root Mean Square Deviation (dRMSD), and Template Modeling (TM) scores. Our results show that LM and LM SA outperform standard Adam across all metrics, with faster convergence and better generalization, particularly on proteins not included in the training set. These results demonstrate that integrating landscape-aware gradient scaling into first-order optimizers advances research in computational optimization and improves prediction performance for complex problems such as protein folding.

BMC Bioinformatics 26(1): 161. PMC full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12210780/pdf/

Citations: 0
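The LM gradient scaling described above can be illustrated with a small, self-contained Python sketch. The abstract names only "a threshold parameter and a transformation function", so the exact form here is an assumption: the raw gradient is multiplied by 1 / (eps * f(max(L - c, 0)) + T), with f the identity and T = 1, a form similar to earlier landscape-modification work but not taken from this paper. The names `lm_scale`, `adam_lm`, `threshold`, and `lm_eps` are hypothetical, and this is not the authors' OpenFold code.

```python
import math


def lm_scale(loss, threshold, eps, temp=1.0, f=lambda u: u):
    """Hypothetical LM factor: 1 / (eps * f(max(loss - threshold, 0)) + temp).
    Below the threshold the factor is 1 / temp; above it, gradients shrink."""
    return 1.0 / (eps * f(max(loss - threshold, 0.0)) + temp)


def adam_lm(loss_fn, grad_fn, x0, lr=0.1, beta1=0.9, beta2=0.999,
            adam_eps=1e-8, threshold=0.0, lm_eps=1.0, steps=200):
    """Plain Adam, except that each raw gradient is first scaled by lm_scale."""
    x = list(x0)
    m = [0.0] * len(x)  # first-moment (mean) estimate
    v = [0.0] * len(x)  # second-moment (uncentered variance) estimate
    for t in range(1, steps + 1):
        scale = lm_scale(loss_fn(x), threshold, lm_eps)
        g = [scale * gi for gi in grad_fn(x)]
        for i in range(len(x)):
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] ** 2
            m_hat = m[i] / (1 - beta1 ** t)  # bias correction
            v_hat = v[i] / (1 - beta2 ** t)
            x[i] -= lr * m_hat / (math.sqrt(v_hat) + adam_eps)
    return x


# Usage: minimise the toy loss (x - 3)^2 starting from x = 0.
def toy_loss(x):
    return (x[0] - 3.0) ** 2


def toy_grad(x):
    return [2.0 * (x[0] - 3.0)]


x_opt = adam_lm(toy_loss, toy_grad, [0.0])
```

Because Adam divides by the square root of its second-moment estimate, a constant rescaling of the gradient largely cancels out; in this sketch any effect of the LM factor comes from its variation with the current loss. Annealing `temp` during training would be one way to approximate the LM SA variant, but the abstract does not specify that schedule.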

Differential expression analysis with inmoose, the integrated multi-omic open-source environment in Python.

IF 2.9, Zone 3 (Biology)

BMC Bioinformatics Pub Date : 2025-06-23 DOI: 10.1186/s12859-025-06180-7

Maximilien Colange, Guillaume Appé, Léa Meunier, Solène Weill, Akpéli Nordor, Abdelkader Behdenna

Background: Differential gene expression analysis is a prominent technique for analyzing biomolecular data to identify genetic features associated with phenotypes. The most widely used tools for differential gene expression analysis of bulk transcriptomic data are limma (for microarray data) and edgeR and DESeq2 (for RNA-Seq data).

Results: We present the differential expression features of InMoose, a Python implementation of the R tools limma, edgeR, and DESeq2. We show experimentally that InMoose serves as a drop-in replacement for these tools, with nearly identical results. This ensures reproducibility when interfacing both languages in bioinformatics pipelines. InMoose is open-source software released under the GPL3 license and is available at www.github.com/epigenelabs/inmoose and https://inmoose.readthedocs.io.

Conclusions: We present a new Python implementation of the state-of-the-art tools limma, edgeR, and DESeq2 for differential gene expression analysis of bulk transcriptomic data. This new implementation produces results nearly identical to those of the original tools, improving interoperability and reproducibility between Python and R bioinformatics pipelines.

BMC Bioinformatics 26(1): 160. PMC full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12183803/pdf/

Citations: 0

PRCFX-DT: a new graph-based approach for feature selection and classification of genomic sequences.

IF 2.9, Zone 3 (Biology)

BMC Bioinformatics Pub Date : 2025-06-17 DOI: 10.1186/s12859-025-06183-4

Amin Khodaei, Sania Eskandari, Hadi Sharifi, Behzad Mozaffari-Tazehkand

Background: In recent years, viral diseases have caused a significant number of infections and deaths. Analyzing viral genomic sequences can be effective for assessing the current and potential future state of viruses. Given the importance of the internal structure of the cell and the nucleotide sequences within it, analyzing nucleotide sequences can provide a range of informative features. Moreover, the use of graph algorithms and machine learning to analyze virus samples, and even viral variants, has been shown to yield useful results.

Results: This study proposes a novel approach that uses complex networks and probabilistic graph modeling to extract features from viral genomic sequences. The proposed approach, which relies on the PageRank centrality algorithm, operates on the codons of the nucleotide sequences. Experiments with machine learning algorithms were conducted on multiple virus datasets and on various variants of coronavirus and influenza viruses. A decision tree classifier applied to the extracted features distinguished coronavirus samples from other samples. The high discriminative power of the graph node centrality feature played a significant role in these experiments and also established a meaningful connection with genetic concepts. Applied to 173,228 genomic sequence samples from 30 distinct virus types, the decision tree classifier achieved an accuracy of 99.73%.

Conclusion: The proposed algorithm was successfully tested on several types of viruses, and the interpretability of the extracted features also enabled its structural analysis. Applying a graph-based approach to genetic features that carry information about the internal structure of nucleotide sequences yielded results that could be significant for identifying any type of virus or a specific viral variant.

BMC Bioinformatics 26(1): 159. PMC full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12172359/pdf/

Citations: 0
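As a hedged illustration of codon-level PageRank features, the sketch below builds a weighted directed graph whose nodes are the 64 codons and whose edges count how often one codon follows another in reading frame 0, then runs PageRank by power iteration. The graph construction, the reading frame, and the damping factor of 0.85 are assumptions; the abstract does not specify how PRCFX-DT defines edges or weights, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import product

CODONS = ["".join(p) for p in product("ACGT", repeat=3)]  # all 64 codons
VALID = set(CODONS)


def codon_edges(seq):
    """Count transitions between consecutive codons (frame 0, assumed)."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    weights = defaultdict(float)
    for a, b in zip(codons, codons[1:]):
        if a in VALID and b in VALID:  # skip codons containing N etc.
            weights[(a, b)] += 1.0
    return weights


def pagerank(weights, nodes, damping=0.85, iters=100, tol=1e-12):
    """Weighted PageRank by power iteration; dangling mass is spread evenly."""
    n = len(nodes)
    out_sum = defaultdict(float)
    for (a, _), w in weights.items():
        out_sum[a] += w
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        dangling = sum(rank[v] for v in nodes if out_sum[v] == 0.0)
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for (a, b), w in weights.items():
            new[b] += damping * rank[a] * w / out_sum[a]
        diff = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if diff < tol:
            break
    return rank


def codon_pagerank_features(seq):
    """Return a 64-dimensional feature vector ordered like CODONS."""
    rank = pagerank(codon_edges(seq), CODONS)
    return [rank[c] for c in CODONS]


features = codon_pagerank_features("ATGAAAATGAAA")
```

The resulting 64-dimensional vectors could then be passed to any decision tree implementation (for example scikit-learn's `DecisionTreeClassifier`) for the classification step the abstract describes.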

Start & Stop: a PhysiCell and PhysiBoSS 2.0 add-on for interactive simulation control.

IF 2.9, Zone 3 (Biology)

BMC Bioinformatics Pub Date : 2025-06-11 DOI: 10.1186/s12859-025-06144-x

Riccardo Smeriglio, Roberta Bardini, Alessandro Savino, Stefano Di Carlo

In computational biology, in silico simulators are vital for exploring and understanding the behavior of complex biological systems. Hybrid multi-level simulators such as PhysiCell and PhysiBoSS 2.0 integrate multiple layers of biological complexity, providing deeper insight into emergent patterns. However, a key limitation of these simulators is that simulation parameters cannot be adjusted once a simulation has started, which hinders the interactive exploration and adaptation of dynamic protocols ranging from biofabrication to in vitro pharmacological testing. To address this challenge, we introduce the Start & Stop add-on for PhysiCell and PhysiBoSS 2.0. The add-on offers multi-level state preservation and multi-modal stop control, triggered by simulation time or by cell conditions, enabling users to pause a simulation, adjust parameters, and then resume from the exact halted state. We validate Start & Stop on two well-established PhysiBoSS 2.0 use cases: a tumor spheroid of 3T3 mouse fibroblasts under tumor necrosis factor (TNF) stimulation, and a lung cancer cell line invasion simulation. These show that the add-on preserves the simulator's original behavior while enabling interactive configuration changes that facilitate the exploration of diverse and adaptive treatment strategies. By enhancing flexibility and user interaction, Start & Stop brings PhysiCell and PhysiBoSS 2.0 closer to real in vitro scenarios, expanding the range of possible simulations and supporting more effective protocol development across a variety of applications.

BMC Bioinformatics 26(1): 158. PMC full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12160357/pdf/

Citations: 0
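The pause, adjust, and resume cycle that Start & Stop provides can be sketched with a toy stochastic simulation. This is not the PhysiCell or PhysiBoSS 2.0 API: `SimState`, `step`, and `run` are hypothetical names, and the birth process is a stand-in model. The sketch shows that preserving the complete state, including the random number generator, lets a paused run resume exactly where it stopped, with stop conditions triggered either by simulation time or by a cell-level condition (here, a cell-count threshold).

```python
import copy
import random
from dataclasses import dataclass, field


@dataclass
class SimState:
    """Hypothetical toy simulation state (not PhysiCell's data model)."""
    time: float = 0.0
    cells: int = 10
    rng: random.Random = field(default_factory=lambda: random.Random(42))


def step(state, params, dt=1.0):
    """Toy stochastic birth process: each cell divides with probability p."""
    births = sum(1 for _ in range(state.cells)
                 if state.rng.random() < params["division_prob"])
    state.cells += births
    state.time += dt


def run(state, params, t_end, stop_time=None, max_cells=None):
    """Advance until t_end, or pause on a time- or cell-based condition."""
    while state.time < t_end:
        if stop_time is not None and state.time >= stop_time:
            return "paused_time"
        if max_cells is not None and state.cells >= max_cells:
            return "paused_cells"
        step(state, params)
    return "finished"


# Reference run without interruption.
params = {"division_prob": 0.1}
reference = SimState()
run(reference, params, t_end=20.0)

# Pause at t = 10, snapshot the full state (including the RNG), then resume.
paused = SimState()
status = run(paused, params, t_end=20.0, stop_time=10.0)
snapshot = copy.deepcopy(paused)
run(snapshot, params, t_end=20.0)
```

Resuming the snapshot with unchanged parameters reproduces the uninterrupted reference run, the toy analogue of the abstract's claim that the add-on preserves the simulator's original behavior. Resuming a separate copy with a different `division_prob` corresponds to adjusting parameters mid-simulation.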

SCATrans: semantic cross-attention transformer for drug-drug interaction predication through multimodal biomedical data.

IF 2.9, Zone 3 (Biology)

BMC Bioinformatics Pub Date : 2025-06-10 DOI: 10.1186/s12859-025-06165-6

Shanwen Zhang, Changqing Yu, Chuanlei Zhang

Predicting potential drug-drug interactions (DDIs) from biomedical data plays a critical role in drug therapy, drug development, drug regulation, and public health. However, DDI prediction remains challenging because of the large number of possible drug combinations and because multimodal biomedical data are disordered, imbalanced, prone to linguistic errors, and difficult to label. To address these challenges, we construct a Semantic Cross-Attention Transformer (SCAT) model. In the model, BioBERT, Doc2Vec, and a graph convolutional network embed the multimodal biomedical data into vector representations; a BiGRU captures contextual dependencies in both the forward and backward directions; cross-attention integrates the extracted features and explicitly models the dependencies between them; and a feature-joint classifier performs DDI prediction (DDIP). Experimental results on the DDIExtraction-2013 dataset show that SCAT outperforms state-of-the-art DDIP approaches. SCAT extends the application of multimodal deep learning to DDIP and can be applied in drug regulation systems to predict novel DDIs and DDI-related events.

BMC Bioinformatics 26(1): 157. PMC full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12153160/pdf/

Citations: 0
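Cross-attention, the fusion step in SCAT, lets representations from one modality query those of another. The sketch below implements plain scaled dot-product cross-attention in pure Python. It omits the learned query, key, and value projections and the multiple heads a real transformer would use, and the idea of using text-derived vectors as queries and graph-derived vectors as keys and values is an assumption, not a detail taken from the paper.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)  # subtract the max to avoid overflow in exp
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention where queries come from one modality and
    keys/values from another. Projections are omitted (identity)."""
    d = len(queries[0])
    weights = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights.append(softmax(scores))
    # Each output row is the attention-weighted average of the value vectors.
    outputs = [
        [sum(w * v[j] for w, v in zip(row, values))
         for j in range(len(values[0]))]
        for row in weights
    ]
    return outputs, weights


# Usage with toy vectors: two "text" queries attend over three "graph" nodes.
text_tokens = [[1.0, 0.0], [0.0, 1.0]]
graph_nodes = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
graph_values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
fused, attn = cross_attention(text_tokens, graph_nodes, graph_values)
```

With identical keys every query attends uniformly, so each fused row is the mean of the value vectors; in a trained model, the projections would let different queries pick out different graph nodes.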

SEQSIM: A novel bioinformatics tool for comparisons of promoter regions-a case study of calcium binding protein spermatid associated 1 (CABS1).

IF 2.9, Zone 3 (Biology)

BMC Bioinformatics Pub Date : 2025-06-09 DOI: 10.1186/s12859-025-06160-x

Joy Ramielle L Santos, Weijie Sun, A Dean Befus, Marcelo Marcet-Palacios

Background: Understanding transcriptional regulation requires in-depth analysis of promoter regions, which house vital cis-regulatory elements such as core promoters, enhancers, and silencers. Despite the significance of these regions, genome-wide characterization remains challenging because of data complexity and computational constraints. Traditional bioinformatics tools such as Clustal Omega have difficulty handling large datasets, which impedes comprehensive analysis. To bridge this gap, we developed SEQSIM, a sequence comparison tool that uses an optimized Needleman-Wunsch algorithm for high-speed comparisons. SEQSIM can analyze complete human promoter datasets in under an hour, overcoming previous computational barriers.

Results: Using SEQSIM, we conducted a case study on CABS1, a gene associated with spermatogenesis and stress response but lacking well-defined functions. Our genome-wide promoter analysis revealed 41 distinct homology clusters, with CABS1 residing in a cluster that includes the promoters of genes such as VWCE, SPOCK1, and TMX2. These associations suggest potential co-regulatory networks. Our findings also revealed conserved promoter motifs and long-range regulatory sequences, including LINE-1 transposable element fragments shared by CABS1 and nearby genes, implying evolutionary conservation and regulatory significance.

Conclusions: These results provide insight into potential gene regulation mechanisms, enhance our understanding of transcriptional control, and suggest new avenues for functional exploration. Future studies incorporating SEQSIM could elucidate co-regulatory networks and chromatin interactions that affect gene expression.

BMC Bioinformatics 26(1): 156. PMC full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12150522/pdf/

Citations: 0
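SEQSIM is described as using an optimized Needleman-Wunsch algorithm. The abstract does not detail those optimizations, so the sketch below shows only the textbook global-alignment score with a linear gap penalty, computed with two rolling rows so that memory grows with the length of one sequence rather than with the product of both lengths. The scoring values (match +1, mismatch -1, gap -2) and the function name `nw_score` are illustrative assumptions.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global (Needleman-Wunsch) alignment score with a linear gap penalty,
    keeping only two rows of the dynamic-programming matrix in memory."""
    prev = [j * gap for j in range(len(b) + 1)]  # row 0: b[:j] against gaps
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)  # column 0: a[:i] against gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag,             # align a[i-1] with b[j-1]
                          prev[j] + gap,    # gap in b
                          curr[j - 1] + gap)  # gap in a
        prev = curr
    return prev[-1]


print(nw_score("ACGT", "AGT"))  # 1 (A-GT: three matches, one gap)
```

Only the score is returned; recovering the alignment itself would require keeping the full matrix or using a divide-and-conquer method such as Hirschberg's algorithm.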

Bigpicc: a graph-based approach to identifying carcinogenic gene combinations from mutation data.

IF 2.9, Zone 3 (Biology)

BMC Bioinformatics Pub Date : 2025-06-07 DOI: 10.1186/s12859-025-06043-1

Vladyslav Oles, Sajal Dash, Ramu Anandakrishnan

Genome data from cancer patients capture the relationship between the presence of gene mutations and the occurrence of cancer in a patient. Different types of cancer in humans are thought to be caused by combinations of two to nine gene mutations. Identifying these combinations through traditional exhaustive search requires an amount of computation that scales exponentially with the combination size and, in most cases, is intractable even for cutting-edge supercomputers. We propose a parameter-free heuristic approach that leverages the intrinsic topology of gene-patient mutation data to identify carcinogenic combinations. The biological relevance of the identified combinations is measured by using them to predict the presence of a tumor in previously unseen samples. The resulting classifiers for 16 cancer types perform on par with exhaustive search and achieve an average sensitivity of 80.1% and an average specificity of 91.6% for the best choice of hit range per cancer type. Our approach can find higher-hit carcinogenic combinations whose identification by exhaustive search would take years of computation.

BMC Bioinformatics 26(1): 155. PMC full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12144835/pdf/

Citations: 0
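The evaluation step in the Bigpicc abstract, using identified gene combinations to predict tumor presence in unseen samples, can be sketched as follows. Here a sample is called a tumor if it carries mutations in every gene of at least one combination. This decision rule, the function names, and the placeholder gene names (`GENE_A` and so on) are assumptions for illustration; the sketch does not model the paper's hit-range selection or its graph-based search.

```python
def predict_tumor(sample_genes, combinations):
    """Predict tumor if the sample carries every gene of at least one combination."""
    return any(combo <= sample_genes for combo in combinations)


def sensitivity_specificity(samples, labels, combinations):
    """Return (sensitivity, specificity) of the combination-based classifier."""
    tp = fn = tn = fp = 0
    for genes, is_tumor in zip(samples, labels):
        predicted = predict_tumor(genes, combinations)
        if is_tumor and predicted:
            tp += 1
        elif is_tumor:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Usage with hypothetical data: one two-gene combination, four samples.
combos = [frozenset({"GENE_A", "GENE_B"})]
samples = [{"GENE_A", "GENE_B", "GENE_C"}, {"GENE_A"}, {"GENE_B"}, {"GENE_A", "GENE_B"}]
labels = [True, True, False, False]  # True = tumor sample
sens, spec = sensitivity_specificity(samples, labels, combos)
```

In the toy data, one tumor sample is missed and one normal sample carries the full combination, so both sensitivity and specificity come out at 0.5.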

'gitana' (phyloGenetic Imaging Tool for Adjusting Nodes and other Arrangements), a tool for plotting phylogenetic trees into ready-to-publish figures.

IF 2.9, Zone 3 (Biology)

BMC Bioinformatics Pub Date : 2025-06-05 DOI: 10.1186/s12859-025-06178-1

Cristina Galisteo, Rafael R de la Haba

Background: Phylogenetic trees are essential diagrams used in various sciences, such as evolutionary biology and taxonomy, and they depict the relationships among a given set of taxa sharing a common ancestor. Many tools have been developed to infer phylogenies, and even more to visualize the resulting trees. However, editing the generated plots into ready-to-publish figures is still a major issue. Most available tools do not, at least not automatically, account for important aspects of nomenclature, such as the use of italics for taxon names or the superscript T that must follow the strain/specimen designation to denote the type strain/specimen. There is also no easy way to highlight tree branches conserved across different phylogenies containing the same taxa. The lack of tools for these tasks is a challenge for scientists, since formatting phylogenetic trees manually is very time-consuming.

Results: Here we present 'gitana', a tool that runs on Linux, Windows, and macOS systems with R installed. It creates ready-to-publish trees with formatted taxon nomenclature and offers editing options such as rerooting and clade highlighting or collapsing, among other features. Moreover, 'gitana' performs node comparisons among phylogenies comprising the same taxa to identify conserved branches.

Conclusions: 'gitana' is a user-friendly tool that outputs high-quality, ready-to-publish phylogenetic trees for users without R coding skills. It combines dedicated functions from popular R packages for phylogenetics and graphical visualization into a single, easy one-line command. The user manual and source code are freely available at https://github.com/cristinagalisteo/gitana.

BMC Bioinformatics 26(1): 154. PMC full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12142833/pdf/

Citations: 0

HLN-DDI: hierarchical molecular representation learning with co-attention mechanism for drug-drug interaction prediction.

IF 2.9, Zone 3 (Biology)

BMC Bioinformatics Pub Date : 2025-06-04 DOI: 10.1186/s12859-025-06157-6

Yue Luo, Lei Deng, Zhijian Huang

Background: Accurate identification of drug-drug interactions (DDIs) is critical in pharmacology, as DDIs can either enhance therapeutic efficacy or trigger adverse reactions when multiple medications are administered concurrently. Traditional methods for identifying DDIs are labor-intensive and time-consuming, prompting the development of computational alternatives. However, existing computational approaches frequently encounter challenges related to interpretability and struggle to effectively capture the complex, multi-level structures inherent in drug molecules. Specifically, they often fail to adequately analyze substructural components and neglect interactions across hierarchical structural levels, resulting in incomplete molecular representations.

Results: In this study, we propose a Hierarchical Learning Network with a co-attention mechanism tailored to molecular structure representation for predicting DDIs, named HLN-DDI. The proposed method advances existing approaches by explicitly encoding motif-level structures and capturing hierarchical molecular representations at atom-level, motif-level, and whole-molecule scales. These hierarchical representations are integrated using a co-attention mechanism and combined with interaction-type information to enhance predictive performance. Comprehensive evaluations demonstrate that HLN-DDI significantly outperforms state-of-the-art methods across multiple benchmark datasets, achieving over 98% accuracy under transductive scenarios and surpassing 99% on various evaluation metrics. Moreover, HLN-DDI achieves a notable accuracy improvement of 2.75% in predicting DDIs involving unseen drugs. Practical assessments with real-world DDI scenarios further validate the efficacy and utility of our proposed model.

Conclusion: By leveraging hierarchical molecular structures and employing a co-attention mechanism to effectively integrate multi-level representations, HLN-DDI generates comprehensive and precise drug representations, leading to substantially improved predictions of potential drug-drug interactions.

BMC Bioinformatics 26(1): 152. PMC full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12135231/pdf/

Citations: 0

GNNMutation: a heterogeneous graph-based framework for cancer detection.

IF 2.9, Zone 3 (Biology)

BMC Bioinformatics Pub Date : 2025-06-04 DOI: 10.1186/s12859-025-06133-0

Nuriye Özlem Özcan Şimşek, Arzucan Özgür, Fikret Gürgen

Background: When genes are translated into proteins, mutations in the gene sequence can lead to changes in protein structure and function as well as in the interactions between proteins. These changes can disrupt cell function and contribute to the development of tumors. In this study, we introduce a novel approach based on graph neural networks that jointly considers genetic mutations and protein interactions for cancer prediction. We use DNA mutations in whole exome sequencing data and construct a heterogeneous graph in which patients and proteins are represented as nodes and protein-protein interactions as edges. Furthermore, patient nodes are connected to protein nodes based on mutations in the patient's DNA. Each patient node is represented by a feature vector derived from the mutations in specific genes. The feature values are calculated using a weighting scheme inspired by information retrieval, in which whole genomes are treated as documents and mutations as words within these documents. The weight of each gene, determined by its mutations, reflects its contribution to disease development. Within our novel heterogeneous graph structure, patient nodes are updated through both mutations and protein interactions. Since mutations differ in their effects on disease development, we processed the input graph with attention-based graph neural networks.

Results: We compiled a dataset from the UK Biobank consisting of patients with a cancer diagnosis as the case group and patients without a cancer diagnosis as the control group. We evaluated our approach on the four most common cancer types (breast, prostate, lung, and colon cancer) and showed that the proposed framework effectively discriminates between case and control groups.

Conclusions: The results indicate that our proposed graph structure and node updating strategy improve cancer classification performance. Additionally, we extended our system with an explainer that identifies a list of causal genes that are influential in the model's cancer diagnosis predictions. Notably, some of these genes have already been studied in cancer research, demonstrating the system's ability to recognize causal genes for the selected cancer types and to make predictions based on them.

BMC Bioinformatics 26(1): 153. PMC full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12139269/pdf/

Citations: 0
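The GNNMutation abstract weights each patient's genes with a scheme inspired by information retrieval, treating genomes as documents and mutations as words. The exact formula is not given, so the sketch below assumes standard TF-IDF: term frequency is the fraction of a patient's mutations that fall in a gene, and inverse document frequency is log(N / df), where N is the number of patients and df is the number of patients with at least one mutation in that gene. The function name and the toy data are hypothetical.

```python
import math
from collections import Counter


def tfidf_gene_features(patient_mutations):
    """patient_mutations maps a patient ID to a list containing, for each of
    that patient's mutations, the gene it falls in. Returns per-patient
    {gene: weight} dictionaries under an assumed TF-IDF scheme."""
    n = len(patient_mutations)
    df = Counter()  # number of patients with >= 1 mutation in each gene
    for genes in patient_mutations.values():
        df.update(set(genes))
    features = {}
    for patient, genes in patient_mutations.items():
        counts = Counter(genes)
        total = sum(counts.values())
        features[patient] = {
            gene: (c / total) * math.log(n / df[gene])  # tf * idf
            for gene, c in counts.items()
        }
    return features


# Usage with hypothetical patients and gene-level mutation lists.
toy = {
    "p1": ["BRCA1", "BRCA1", "TTN"],
    "p2": ["TTN"],
    "p3": ["TP53", "TTN"],
}
weights = tfidf_gene_features(toy)
```

Under this formula a gene mutated in every patient (here `TTN`) receives zero weight, while genes mutated in few patients are up-weighted, which is the usual rationale for IDF weighting. In the paper, such per-patient vectors serve as patient-node features in the heterogeneous graph.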