Zhen Zhang;Zhaohong Deng;Ruibo Li;Wei Zhang;Qiongdan Lou;Kup-Sze Choi;Shitong Wang
{"title":"HGLA: Biomolecular Interaction Prediction Based on Mixed High-Order Graph Convolution With Filter Network via LSTM and Channel Attention","authors":"Zhen Zhang;Zhaohong Deng;Ruibo Li;Wei Zhang;Qiongdan Lou;Kup-Sze Choi;Shitong Wang","doi":"10.1109/TCBB.2024.3434399","DOIUrl":"10.1109/TCBB.2024.3434399","url":null,"abstract":"Predicting biomolecular interactions is significant for understanding biological systems. Most existing methods for link prediction are based on graph convolution. Although graph convolution methods are advantageous in extracting structure information of biomolecular interactions, two key challenges still remain. One is how to consider both the immediate and high-order neighbors. Another is how to reduce noise when aggregating high-order neighbors. To address these challenges, we propose a novel method, called mixed high-order graph convolution with filter network via LSTM and channel attention (HGLA), to predict biomolecular interactions. Firstly, the basic and high-order features are extracted respectively through the traditional graph convolutional network (GCN) and the two-layer Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing (MixHop). Secondly, these features are mixed and input into the filter network composed of LayerNorm, SENet and LSTM to generate filtered features, which are concatenated and used for link prediction. The advantages of HGLA are: 1) HGLA processes high-order features separately, rather than simply concatenating them; 2) HGLA better balances the basic features and high-order features; 3) HGLA effectively filters the noise from high-order neighbors. 
It outperforms state-of-the-art networks on four benchmark datasets.","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"21 6","pages":"2011-2024"},"PeriodicalIF":3.6,"publicationDate":"2024-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141765973","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fadi Shehadeh;LewisOscar Felix;Markos Kalligeros;Adnan Shehadeh;Beth Burgwyn Fuchs;Frederick M. Ausubel;Paul P. Sotiriadis;Eleftherios Mylonakis
{"title":"Machine Learning-Assisted High-Throughput Screening for Anti-MRSA Compounds","authors":"Fadi Shehadeh;LewisOscar Felix;Markos Kalligeros;Adnan Shehadeh;Beth Burgwyn Fuchs;Frederick M. Ausubel;Paul P. Sotiriadis;Eleftherios Mylonakis","doi":"10.1109/TCBB.2024.3434340","DOIUrl":"10.1109/TCBB.2024.3434340","url":null,"abstract":"Background: Antimicrobial resistance is a major public health threat, and new agents are needed. Computational approaches have been proposed to reduce the cost and time needed for compound screening. Aims: A machine learning (ML) model was developed for the in silico screening of low molecular weight molecules. Methods: We used the results of a high-throughput Caenorhabditis elegans methicillin-resistant Staphylococcus aureus (MRSA) liquid infection assay to develop ML models for compound prioritization and quality control. Results: The compound prioritization model achieved an AUC of 0.795 with a sensitivity of 81% and a specificity of 70%. When applied to a validation set of 22,768 compounds, the model identified 81% of the active compounds identified by high-throughput screening (HTS) among only 30.6% of the total 22,768 compounds, resulting in a 2.67-fold increase in hit rate. When we retrained the model on all the compounds of the HTS dataset, it further identified 45 discordant molecules classified as non-hits by the HTS, with 42/45 (93%) having known antimicrobial activity. 
Conclusion: Our ML approach can be used to increase HTS efficiency by reducing the number of compounds that need to be physically screened and identifying potential missed hits, making HTS more accessible and reducing barriers to entry.","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"21 6","pages":"1911-1921"},"PeriodicalIF":3.6,"publicationDate":"2024-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141765974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Diffusing on Two Levels and Optimizing for Multiple Properties: A Novel Approach to Generating Molecules With Desirable Properties","authors":"Siyuan Guo;Jihong Guan;Shuigeng Zhou","doi":"10.1109/TCBB.2024.3434461","DOIUrl":"10.1109/TCBB.2024.3434461","url":null,"abstract":"In the past decade, Artificial Intelligence (AI) driven drug design and discovery has been a hot research topic in the AI area, where an important branch is molecule generation by generative models, from GAN-based models and VAE-based models to the latest diffusion-based models. However, most existing models pursue mainly the basic properties like validity and uniqueness of the generated molecules, and only a few go further to explicitly optimize one single important molecular property (e.g. QED or PlogP), which leaves most generated molecules of little practical use. In this paper, we present a novel approach to generating molecules with desirable properties, which expands the diffusion model framework with multiple innovative designs. The novelty is two-fold. On the one hand, considering that the structures of molecules are complex and diverse, and molecular properties are usually determined by some substructures (e.g. pharmacophores), we propose to perform diffusion on two structural levels: molecules and molecular fragments respectively, with which a mixed Gaussian distribution is obtained for the reverse diffusion process. To get desirable molecular fragments, we develop a novel electronic effect based fragmentation method. On the other hand, we introduce two ways to explicitly optimize multiple molecular properties under the diffusion model framework. First, as potential drug molecules must be chemically valid, we optimize molecular validity by an energy-guidance function. 
Second, since potential drug molecules should be desirable in various properties, we employ a multi-objective mechanism to optimize multiple molecular properties simultaneously. Extensive experiments with two benchmark datasets QM9 and ZINC250k show that the molecules generated by our proposed method have better validity, uniqueness, novelty, Fréchet ChemNet Distance (FCD), QED, and PlogP than those generated by current SOTA models.","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"21 6","pages":"2050-2063"},"PeriodicalIF":3.6,"publicationDate":"2024-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141765972","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gao-Fei Wang, Juan Wang, Shasha Yuan, Chun-Hou Zheng, Jin-Xing Liu
{"title":"MLRR-ATV: A Robust Manifold Nonnegative LowRank Representation with Adaptive Total-Variation Regularization for scRNA-seq Data Clustering.","authors":"Gao-Fei Wang, Juan Wang, Shasha Yuan, Chun-Hou Zheng, Jin-Xing Liu","doi":"10.1109/TCBB.2024.3432740","DOIUrl":"10.1109/TCBB.2024.3432740","url":null,"abstract":"Since genomics was proposed, the exploration of genes has been the focus of research. The emergence of single-cell RNA sequencing (scRNA-seq) technology makes it possible to explore gene expression at the single-cell level. Due to the limitations of sequencing technology, the data contains a lot of noise. At the same time, it is also high-dimensional and sparse. Clustering is a common method of analyzing scRNA-seq data. This paper proposes a novel single-cell clustering method called Robust Manifold Nonnegative Low-Rank Representation with Adaptive Total-Variation Regularization (MLRR-ATV). The Adaptive Total-Variation (ATV) regularization is introduced into the Low-Rank Representation (LRR) model to reduce the influence of noise through gradient learning. Then, the linear and nonlinear manifold structures in the data are learned through Euclidean distance and cosine similarity, and more valuable information is retained. Because the model is non-convex, we use the Alternating Direction Method of Multipliers (ADMM) to optimize the model. We tested the performance of the MLRR-ATV model on eight real scRNA-seq datasets and selected nine state-of-the-art methods as comparison methods. 
The experimental results show that the performance of the MLRR-ATV model is better than the other nine methods.","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"PP ","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141758476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Changyong Yu;Dekuan Gao;Xu Guo;Haitao Ma;Yuhai Zhao;Guoren Wang
{"title":"dwMLCS: An Efficient MLCS Algorithm Based on Dynamic and Weighted Directed Acyclic Graph","authors":"Changyong Yu;Dekuan Gao;Xu Guo;Haitao Ma;Yuhai Zhao;Guoren Wang","doi":"10.1109/TCBB.2024.3431558","DOIUrl":"10.1109/TCBB.2024.3431558","url":null,"abstract":"The problem of finding the longest common subsequence (MLCS) for multiple sequences is a computationally intensive and challenging problem that has significant applications in various fields such as text comparison, pattern recognition, and gene diagnosis. Currently, the dominant point-based MLCS algorithms have become popular and extensively studied. Generally, they construct the directed acyclic graph (DAG) of matching points and convert the MLCS problem into a search for the longest paths in the DAG. Several improvements have been made, focusing on decreasing model size and reducing redundant computations. These include 1) hash methods for eliminating duplicated nodes, 2) dynamic structures for supporting a smaller DAG, and 3) path pruning strategies. However, the algorithms are still too limited for large-scale MLCS problems because 1) the dynamic structures are too time-consuming to maintain and 2) path pruning relies heavily on the tightness of the lower and upper bounds of the MLCS. These factors contribute to the large-scale MLCS problem remaining a challenge. We propose a novel algorithm for the large-scale MLCS problem, named dwMLCS. It is based on two models. One is a dynamic DAG model that is both space and time efficient and can decrease the size of the DAG significantly. The other is a weighted DAG model with new successor strategies. With this model, we design an algorithm for finding a tighter lower bound of the MLCS. Then, path pruning is conducted to further reduce the size of the DAG and eliminate redundant computation. Additionally, we propose an upper bound method for improving the efficiency of the path pruning strategy. 
The experimental results demonstrate that the effectiveness and efficiency of the models and algorithms proposed are better than state-of-the-art algorithms.","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"21 6","pages":"1987-1999"},"PeriodicalIF":3.6,"publicationDate":"2024-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141748107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generative Adversarial Network-Based Augmentation With Noval 2-Step Authentication for Anti-Coronavirus Peptide Prediction","authors":"Aditya Kumar;Deepak Singh","doi":"10.1109/TCBB.2024.3431688","DOIUrl":"10.1109/TCBB.2024.3431688","url":null,"abstract":"The virus poses a longstanding and enduring danger to various forms of life. Despite the ongoing endeavors to combat viral diseases, there exists a necessity to explore and develop novel therapeutic options. Antiviral peptides are bioactive molecules with a favorable toxicity profile, making them promising alternatives for viral infection treatment. Therefore, this article employed a generative adversarial network for antiviral peptide augmentation and a novel two-step authentication process for augmented synthetic peptides to enhance antiviral activity prediction. Additionally, five widely utilized deep learning models were employed for classification purposes. Initially, a GAN was used to augment the antiviral peptide. In a two-step authentication process, the NCBI-BLAST was utilized to identify the antiviral activity resemblance between the synthetic and real peptide. Subsequently, the hydrophobicity, hydrophilicity, hydroxylic nature, positive charge, and negative charge of synthetic and authentic antiviral peptides were compared before their utilization. Later, to examine the impact of authenticated peptide augmentation in the prediction of antiviral peptides, a comparison is conducted with the outcomes of non-peptide augmented prediction. The study demonstrates that the 1-D convolution neural network with augmented peptide exhibits superior performance compared to other employed classifiers and state-of-the-art models. The network attains a mean classification accuracy of 95.41%, an AUC value of 0.95, and an MCC value of 0.90 on the benchmark antiviral and anti-corona peptides dataset. 
Thus, the performance of the proposed model indicates its efficacy in predicting the antiviral activity of peptides.","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"21 6","pages":"1942-1954"},"PeriodicalIF":3.6,"publicationDate":"2024-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141748108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wenkang Wang;Xiangmao Meng;Ju Xiang;Hayat Dino Bedru;Min Li
{"title":"Dopcc: Detecting Overlapping Protein Complexes via Multi-Metrics and Co-Core Attachment Method","authors":"Wenkang Wang;Xiangmao Meng;Ju Xiang;Hayat Dino Bedru;Min Li","doi":"10.1109/TCBB.2024.3429546","DOIUrl":"10.1109/TCBB.2024.3429546","url":null,"abstract":"Identification of protein complex is an important issue in the field of system biology, which is crucial to understanding the cellular organization and inferring protein functions. Recently, many computational methods have been proposed to detect protein complexes from protein-protein interaction (PPI) networks. However, most of these methods only focus on local information of proteins in the PPI network, which are easily affected by the noise in the PPI network. Meanwhile, it's still challenging to detect protein complexes, especially for overlapping cases. To address these issues, we propose a new method, named Dopcc, to detect overlapping protein complexes by constructing a multi-metrics network according to different strategies. First, we adopt the Jaccard coefficient to measure the neighbor similarity between proteins and denoise the PPI network. Then, we propose a new strategy, integrating hierarchical compressing with network embedding, to capture the high-order structural similarity between proteins. Further, a new co-core attachment strategy is proposed to detect overlapping protein complexes from multi-metrics. 
The experimental results show that our proposed method, Dopcc, outperforms the other eight state-of-the-art methods in terms of F-measure, MMR, and Composite Score on two yeast datasets.","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"21 6","pages":"2000-2010"},"PeriodicalIF":3.6,"publicationDate":"2024-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141633413","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing Generalizability in Biomedical Entity Recognition: Self-Attention PCA-CLS Model","authors":"Rajesh Kumar Mundotiya;Juhi Priya;Divya Kuwarbi;Teekam Singh","doi":"10.1109/TCBB.2024.3429234","DOIUrl":"10.1109/TCBB.2024.3429234","url":null,"abstract":"One of the primary tasks in the early stages of data mining involves the identification of entities from biomedical corpora. Traditional approaches relying on robust feature engineering face challenges when learning from available (un-)annotated data using data-driven models like deep learning-based architectures. Despite leveraging large corpora and advanced deep learning models, domain generalization remains an issue. Attention mechanisms are effective in capturing longer sentence dependencies and extracting semantic and syntactic information from limited annotated datasets. To address out-of-vocabulary challenges in biomedical text, the PCA-CLS (Position and Contextual Attention with CNN-LSTM-Softmax) model combines global self-attention and character-level convolutional neural network techniques. The model's performance is evaluated on eight distinct biomedical domain datasets encompassing entities such as genes, drugs, diseases, and species. 
The PCA-CLS model outperforms several state-of-the-art models, achieving notable F$_{1}$-scores, including 88.19% on BC2GM, 85.44% on JNLPBA, 90.80% on BC5CDR-chemical, 87.07% on BC5CDR-disease, 89.18% on BC4CHEMD, 88.81% on NCBI, and 91.59% on the s800 dataset.","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"21 6","pages":"1934-1941"},"PeriodicalIF":3.6,"publicationDate":"2024-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141626660","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Employing Machine Learning Techniques to Detect Protein Function: A Survey, Experimental, and Empirical Evaluations","authors":"Kamal Taha","doi":"10.1109/TCBB.2024.3427381","DOIUrl":"10.1109/TCBB.2024.3427381","url":null,"abstract":"This review article delves deeply into the various machine learning (ML) methods and algorithms employed in discerning protein functions. Each method discussed is assessed for its efficacy, limitations, potential improvements, and future prospects. We present an innovative hierarchical classification system that arranges algorithms into intricate categories and unique techniques. This taxonomy is based on a tri-level hierarchy, starting with the methodology category and narrowing down to specific techniques. Such a framework allows for a structured and comprehensive classification of algorithms, assisting researchers in understanding the interrelationships among diverse algorithms and techniques. The study incorporates both empirical and experimental evaluations to differentiate between the techniques. The empirical evaluation ranks the techniques based on four criteria. The experimental assessments rank: (1) individual techniques under the same methodology sub-category, (2) different sub-categories within the same category, and (3) the broad categories themselves. Integrating the innovative methodological classification, empirical findings, and experimental assessments, the article offers a well-rounded understanding of ML strategies in protein function identification. The paper also explores techniques for multi-task and multi-label detection of protein functions, in addition to focusing on single-task methods. 
Moreover, the paper sheds light on the future avenues of ML in protein function determination.","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"21 6","pages":"1965-1986"},"PeriodicalIF":3.6,"publicationDate":"2024-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141619844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Suzanne W. Dietrich;Wenli Ma;Yian Ding;Karen H. Watanabe;Mary B. Zelinski;James P. Sluka
{"title":"MOTHER-DB: A Database for Sharing Nonhuman Ovarian Histology Images","authors":"Suzanne W. Dietrich;Wenli Ma;Yian Ding;Karen H. Watanabe;Mary B. Zelinski;James P. Sluka","doi":"10.1109/TCBB.2024.3426999","DOIUrl":"10.1109/TCBB.2024.3426999","url":null,"abstract":"The goal of the Multispecies Ovary Tissue Histology Electronic Repository (MOTHER) project is to establish a collection of nonhuman ovary histology images for multiple species as a resource for researchers and educators. An important component of sharing scientific data is the inclusion of the contextual metadata that describes the data. MOTHER extends the Ecological Metadata Language (EML) for documenting research data, leveraging its data provenance and usage license with the inclusion of metadata for ovary histology images. The design of the MOTHER metadata includes information on the donor animal, including reproductive cycle status, the slide and its preparation. MOTHER also extends the ezEML tool, called ezEML+MOTHER, for the specification of the metadata. The design of the MOTHER database (MOTHER-DB) captures the metadata about the histology images, providing a searchable resource for discovering relevant images. MOTHER also defines a curation process for the ingestion of a collection of images and its metadata, verifying the validity of the metadata before its inclusion in the MOTHER collection. 
A Web search provides the ability to identify relevant images based on various characteristics in the metadata itself, such as genus and species, using filters.","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"21 6","pages":"2598-2603"},"PeriodicalIF":3.6,"publicationDate":"2024-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141599244","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}