{"title":"An End-to-End Knowledge Graph Fused Graph Neural Network for Accurate Protein-Protein Interactions Prediction","authors":"Jie Yang;Yapeng Li;Guoyin Wang;Zhong Chen;Di Wu","doi":"10.1109/TCBB.2024.3486216","DOIUrl":"10.1109/TCBB.2024.3486216","url":null,"abstract":"Protein-protein interactions (PPIs) are essential to understanding cellular mechanisms, signaling networks, disease processes, and drug development, as they represent the physical contacts and functional associations between proteins. Recent advances have witnessed the achievements of artificial intelligence (AI) methods aimed at predicting PPIs. However, these approaches often handle the intricate web of relationships and mechanisms among proteins, drugs, diseases, ribonucleic acid (RNA), and protein structures in a fragmented or superficial manner. This is typically due to the limitations of non-end-to-end learning frameworks, which can lead to sub-optimal feature extraction and fusion, thereby compromising the prediction accuracy. To address these deficiencies, this paper introduces a novel end-to-end learning model, the Knowledge Graph Fused Graph Neural Network (KGF-GNN). This model comprises three integral components: (1) Protein Associated Network (PAN) Construction: We begin by constructing a PAN that extensively captures the diverse relationships and mechanisms linking proteins with drugs, diseases, RNA, and protein structures. (2) Graph Neural Network for Feature Extraction: A Graph Neural Network (GNN) is then employed to distill both topological and semantic features from the PAN, alongside another GNN designed to extract topological features directly from observed PPI networks. 
(3) Multi-layer Perceptron for Feature Fusion: Finally, a multi-layer perceptron integrates these varied features through end-to-end learning, ensuring that the feature extraction and fusion processes are both comprehensive and optimized for PPI prediction. Extensive experiments conducted on real-world PPI datasets validate the effectiveness of our proposed KGF-GNN approach, which not only achieves high accuracy in predicting PPIs but also significantly surpasses existing state-of-the-art models. This work not only enhances our ability to predict PPIs with a higher precision but also contributes to the broader application of AI in Bioinformatics, offering profound implications for biological research and therapeutic development.","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"21 6","pages":"2518-2530"},"PeriodicalIF":3.6,"publicationDate":"2024-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142499542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Comprehensive Evaluation Framework for Benchmarking Multi-Objective Feature Selection in Omics-Based Biomarker Discovery","authors":"Luca Cattelani;Arindam Ghosh;Teemu J. Rintala;Vittorio Fortino","doi":"10.1109/TCBB.2024.3480150","DOIUrl":"10.1109/TCBB.2024.3480150","url":null,"abstract":"Machine learning algorithms have been extensively used for accurate classification of cancer subtypes driven by gene expression-based biomarkers. However, biomarker models combining multiple gene expression signatures are often not reproducible in external validation datasets and their feature set size is often not optimized, jeopardizing their translatability into cost-effective clinical tools. We investigated how to solve the multi-objective problem of finding the best trade-offs between classification performance and set size applying seven algorithms for machine learning-driven feature subset selection and analyse how they perform in a benchmark with eight large-scale transcriptome datasets of cancer, covering both training and external validation sets. The benchmark includes evaluation metrics assessing the performance of the individual biomarkers and the solution sets, according to their accuracy, diversity, and stability of the composing genes. Moreover, a new evaluation metric for cross-validation studies is proposed that generalizes the hypervolume, which is commonly used to assess the performance of multi-objective optimization algorithms. Biomarkers exhibiting 0.8 of balanced accuracy on the external dataset for breast, kidney and ovarian cancer using respectively 4, 2 and 7 features, were obtained. 
Genetic algorithms often provided better performance than other considered algorithms, and the recently proposed NSGA2-CH and NSGA2-CHS were the best performing methods in most cases.","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"21 6","pages":"2432-2446"},"PeriodicalIF":3.6,"publicationDate":"2024-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10716353","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142464221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generative Biomedical Event Extraction With Constrained Decoding Strategy","authors":"Fangfang Su;Chong Teng;Fei Li;Bobo Li;Jun Zhou;Donghong Ji","doi":"10.1109/TCBB.2024.3480088","DOIUrl":"10.1109/TCBB.2024.3480088","url":null,"abstract":"Currently, biomedical event extraction has received considerable attention in various fields, including natural language processing, bioinformatics, and computational biomedicine. This has led to the emergence of numerous machine learning and deep learning models that have been proposed and applied to tackle this complex task. While existing models typically adopt an extraction-based approach, which requires breaking down the extraction of biomedical events into multiple subtasks for sequential processing, making it prone to cascading errors. This paper presents a novel approach by constructing a biomedical event generation model based on the framework of the pre-trained language model T5. We employ a sequence-to-sequence generation paradigm to obtain events, the model utilizes constrained decoding algorithm to guide sequence generation, and a curriculum learning algorithm for efficient model learning. To demonstrate the effectiveness of our model, we evaluate it on two public benchmark datasets, Genia 2011 and Genia 2013. 
Our model achieves superior performance, illustrating the effectiveness of generative modeling of biomedical events.","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"21 6","pages":"2471-2484"},"PeriodicalIF":3.6,"publicationDate":"2024-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142464222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Guest Editors' Introduction to the Special Section on Bioinformatics Research and Applications","authors":"Zhipeng Cai;Alexander Zelikovsky","doi":"10.1109/TCBB.2024.3390374","DOIUrl":"https://doi.org/10.1109/TCBB.2024.3390374","url":null,"abstract":"","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"21 5","pages":"1141-1142"},"PeriodicalIF":3.6,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10712175","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142397043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"De Novo Drug Design by Multi-Objective Path Consistency Learning With Beam A* Search","authors":"Dengwei Zhao;Jingyuan Zhou;Shikui Tu;Lei Xu","doi":"10.1109/TCBB.2024.3477592","DOIUrl":"10.1109/TCBB.2024.3477592","url":null,"abstract":"Generating high-quality and drug-like molecules from scratch within the expansive chemical space presents a significant challenge in the field of drug discovery. In prior research, value-based reinforcement learning algorithms have been employed to generate molecules with multiple desired properties iteratively. The immediate reward was defined as the evaluation of intermediate-state molecules at each step, and the learning objective would be maximizing the expected cumulative evaluation scores for all molecules along the generative path. However, this definition of the reward was misleading, as in reality, the optimization target should be the evaluation score of only the final generated molecule. Furthermore, in previous works, randomness was introduced into the decision-making process, enabling the generation of diverse molecules but no longer pursuing the maximum future rewards. In this paper, immediate reward is defined as the improvement achieved through the modification of the molecule to maximize the evaluation score of the final generated molecule exclusively. Originating from the A$^*$ search, path consistency (PC), i.e., $f$ values on one optimal path should be identical, is employed as the objective function in the update of the $f$ value estimator to train a multi-objective de novo drug designer. 
By incorporating the $f$ value into the decision-making process of beam search, the DrugBA$^*$ algorithm is proposed to enable the large-scale generation of molecules that exhibit both high quality and diversity. Experimental results demonstrate a substantial enhancement over the state-of-the-art algorithm QADD in multiple molecular properties of the generated molecules.","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"21 6","pages":"2459-2470"},"PeriodicalIF":3.6,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142390227","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Orientation Determination of Cryo-EM Projection Images Using Reliable Common Lines and Spherical Embeddings","authors":"Xiangwen Wang;Qiaoying Jin;Li Zou;Xianghong Lin;Yonggang Lu","doi":"10.1109/TCBB.2024.3476619","DOIUrl":"10.1109/TCBB.2024.3476619","url":null,"abstract":"Three-dimensional (3D) reconstruction in single-particle cryo-electron microscopy (cryo-EM) is a critical technique for recovering and studying the fine 3D structure of proteins and other biological macromolecules, where the primary issue is to determine the orientations of projection images with high levels of noise. This paper proposes a method to determine the orientations of cryo-EM projection images using reliable common lines and spherical embeddings. First, the reliability of common lines between projection images is evaluated using a weighted voting algorithm based on an iterative improvement technique and binarized weighting. Then, the reliable common lines are used to calculate the normal vectors and local $X$-axis vectors of projection images after two spherical embeddings. Finally, the orientations of projection images are determined by aligning the results of the two spherical embeddings using an orthogonal constraint. 
Experimental results on both synthetic and real cryo-EM projection image datasets demonstrate that the proposed method can achieve higher accuracy in estimating the orientations of projection images and higher resolution in reconstructing preliminary 3D structures than some common line-based methods, indicating that the proposed method is effective in single-particle cryo-EM 3D reconstruction.","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"21 6","pages":"2496-2509"},"PeriodicalIF":3.6,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142390230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Guest Editorial Selected Papers From BIOKDD 2022","authors":"Da Yan;Catia Pesquita;Carsten Görg;Jake Y. Chen","doi":"10.1109/TCBB.2024.3429784","DOIUrl":"https://doi.org/10.1109/TCBB.2024.3429784","url":null,"abstract":"","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"21 5","pages":"1165-1167"},"PeriodicalIF":3.6,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10712183","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142397044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Knowledge Graph-Based Method for Drug-Drug Interaction Prediction With Contrastive Learning","authors":"Jian Zhong;Haochen Zhao;Qichang Zhao;Jianxin Wang","doi":"10.1109/TCBB.2024.3477410","DOIUrl":"10.1109/TCBB.2024.3477410","url":null,"abstract":"Precisely predicting Drug-Drug Interactions (DDIs) carries the potential to elevate the quality and safety of drug therapies, protecting the well-being of patients, and providing essential guidance and decision support at every stage of the drug development process. In recent years, leveraging large-scale biomedical knowledge graphs has improved DDI prediction performance. However, the feature extraction procedures in these methods are still rough. More refined features may further improve the quality of predictions. To overcome these limitations, we develop a knowledge graph-based method for multi-typed DDI prediction with contrastive learning (KG-CLDDI). In KG-CLDDI, we combine drug knowledge aggregation features from the knowledge graph with drug topological aggregation features from the DDI graph. Additionally, we build a contrastive learning module that uses horizontal reversal and dropout operations to produce high-quality embeddings for drug-drug pairs. The comparison results indicate that KG-CLDDI is superior to state-of-the-art models in both the transductive and inductive settings. Notably, for the inductive setting, KG-CLDDI outperforms the previous best method by 17.49% and 24.97% in terms of AUC and AUPR, respectively. Furthermore, we conduct the ablation analysis and case study to show the effectiveness of KG-CLDDI. 
These findings illustrate the potential significance of KG-CLDDI in advancing DDI research and its clinical applications.","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"21 6","pages":"2485-2495"},"PeriodicalIF":3.6,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142390226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RFLP-Inator: Interactive Web Platform for In Silico Simulation and Complementary Tools of the PCR-RFLP Technique","authors":"Kiefer Andre Bedoya Benites;Wilser Andrés García-Quispes","doi":"10.1109/TCBB.2024.3476453","DOIUrl":"10.1109/TCBB.2024.3476453","url":null,"abstract":"Polymerase chain reaction - Restriction Fragment Length Polymorphism (PCR-RFLP) is an established molecular biology technique leveraging DNA sequence variability for organism identification, genetic disease detection, biodiversity analysis, etc. Traditional PCR-RFLP requires wet-laboratory procedures that can result in technical errors, procedural challenges, and financial costs. With the aim of providing an accessible and efficient PCR-RFLP technique complement, we introduce RFLP-inator. This is a comprehensive web-based platform developed in R using the package Shiny, which simulates the PCR-RFLP technique, integrates analysis capabilities, and offers complementary tools for both pre- and post-evaluation of in vitro results. We developed the RFLP-inator's algorithm independently and our platform offers seven dynamic tools: RFLP simulator, Pattern identifier, Enzyme selector, RFLP analyzer, Multiplex PCR, Restriction map maker, and Gel plotter. Moreover, the software includes a restriction pattern database of more than 250,000 sequences of the bacterial 16S rRNA gene. We successfully validated the core tools against published research findings. This new platform is open access and user-friendly, offering a valuable resource for researchers, educators, and students specializing in molecular genetics. 
RFLP-inator not only streamlines RFLP technique application but also supports pedagogical efforts in genetics, illustrating its utility and reliability.","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"21 6","pages":"2510-2517"},"PeriodicalIF":3.6,"publicationDate":"2024-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10709661","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142390231","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LINgroups as a Robust Principled Approach to Compare and Integrate Multiple Bacterial Taxonomies","authors":"Reza Mazloom;N. Tessa Pierce-Ward;Parul Sharma;Leighton Pritchard;C. Titus Brown;Boris A. Vinatzer;Lenwood S. Heath","doi":"10.1109/TCBB.2024.3475917","DOIUrl":"10.1109/TCBB.2024.3475917","url":null,"abstract":"As a central organizing principle of biology, bacteria and archaea are classified into a hierarchical structure across taxonomic ranks from kingdom to subspecies. Traditionally, this organization was based on observable characteristics of form and chemistry but recently, bacterial taxonomy has been robustly quantified using comparisons of sequenced genomes, as exemplified in the Genome Taxonomy Database (GTDB). Such genome-based taxonomies resolve genomes down to genera and species and are useful in many contexts yet lack the flexibility and resolution of a fine-grained approach. The Life Identification Number (LIN) approach is a common, quantitative framework to tie existing (and future) bacterial taxonomies together, increase the resolution of genome-based discrimination of taxa, and extend taxonomic identification below the species level in a principled way. Utilizing LINgroup as an organizational concept helps resolve some of the confusion and unforeseen negative effects resulting from nomenclature changes of microorganisms that are closely related by overall genomic similarity (often due to genome-based reclassification). Our experimental results demonstrate the value of LINs and LINgroups in mapping between taxonomies, translating between different nomenclatures, and integrating them into a single taxonomic framework. 
They also reveal the robustness of LIN assignment to hyper-parameter changes when considering within-species taxonomic groups.","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"21 6","pages":"2304-2314"},"PeriodicalIF":3.6,"publicationDate":"2024-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142390229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}