IEEE/ACM Transactions on Computational Biology and Bioinformatics: Latest Articles (Page 8)

KGRLFF: Detecting Drug-Drug Interactions Based on Knowledge Graph Representation Learning and Feature Fusion

IF 3.6, Zone 3 (Biology)

IEEE/ACM Transactions on Computational Biology and Bioinformatics Pub Date : 2024-07-29 DOI: 10.1109/TCBB.2024.3434992

Xiaoli Lin;Zhuang Yin;Xiaolong Zhang;Jing Hu

Abstract: Accurate prediction of drug-drug interactions (DDIs) plays an important role in improving the efficiency of drug development and ensuring the safety of combination therapy. Most existing models rely on a single source of information to predict DDIs, and few models can perform tasks on biomedical knowledge graphs. This paper proposes a new hybrid method, namely Knowledge Graph Representation Learning and Feature Fusion (KGRLFF), to fully exploit the information from the biomedical knowledge graph and molecular structure of drugs to better predict DDIs. KGRLFF first uses a Bidirectional Random Walk sampling method based on the PageRank algorithm (BRWP) to obtain higher-order neighborhood information of drugs in the knowledge graph, including neighboring nodes, semantic relations, and higher-order information associated with triple facts. Then, an embedded representation learning model named Knowledge Graph-based Cyclic Recursive Aggregation (KGCRA) is used to learn the embedded representations of drugs by recursively propagating and aggregating messages with drugs as both the source and destination. In addition, the model learns the molecular structures of the drugs to obtain the structured features. Finally, a Feature Representation Fusion Strategy (FRFS) was developed to integrate embedded representations and structured feature representations. Experimental results showed that KGRLFF is feasible for predicting potential DDIs.

Vol. 21, No. 6, pp. 2035-2049. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10613488

Citations: 0
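As a rough illustration of the sampling idea in the KGRLFF abstract above, the sketch below computes PageRank scores by power iteration and then draws a random walk in which each next step is weighted by the PageRank of the candidate neighbour. The abstract does not say how BRWP combines PageRank with the walk or what makes it bidirectional, so the weighting rule, the function names `pagerank` and `pagerank_biased_walk`, and the toy graph are all assumptions made for illustration, not the authors' method.

```python
import random


def pagerank(adj, damping=0.85, iters=50):
    """Power-iteration PageRank on an adjacency-list graph {node: [neighbours]}."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Every node receives the teleportation share first.
        new = {v: (1.0 - damping) / n for v in nodes}
        for u in nodes:
            if adj[u]:
                share = damping * rank[u] / len(adj[u])
                for v in adj[u]:
                    new[v] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for v in nodes:
                    new[v] += damping * rank[u] / n
        rank = new
    return rank


def pagerank_biased_walk(adj, rank, start, length, rng):
    """Random walk whose next hop is drawn with probability proportional to PageRank."""
    walk = [start]
    for _ in range(length):
        neighbours = adj[walk[-1]]
        if not neighbours:
            break
        weights = [rank[v] for v in neighbours]
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk


# Hypothetical toy knowledge graph (undirected, relation types omitted).
toy_graph = {
    "drugA": ["target1", "enzyme1"],
    "drugB": ["target1"],
    "target1": ["drugA", "drugB", "enzyme1"],
    "enzyme1": ["drugA", "target1"],
}
scores = pagerank(toy_graph)
walk = pagerank_biased_walk(toy_graph, scores, "drugA", 4, random.Random(0))
print(walk)
```

The walk returned for a drug node gives one sampled neighbourhood; repeating it with different seeds would yield the kind of higher-order context that a downstream embedding model could aggregate.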

HGLA: Biomolecular Interaction Prediction Based on Mixed High-Order Graph Convolution With Filter Network via LSTM and Channel Attention

IF 3.6, Zone 3 (Biology)

IEEE/ACM Transactions on Computational Biology and Bioinformatics Pub Date : 2024-07-26 DOI: 10.1109/TCBB.2024.3434399

Zhen Zhang;Zhaohong Deng;Ruibo Li;Wei Zhang;Qiongdan Lou;Kup-Sze Choi;Shitong Wang

Abstract: Predicting biomolecular interactions is significant for understanding biological systems. Most existing methods for link prediction are based on graph convolution. Although graph convolution methods are advantageous in extracting structure information of biomolecular interactions, two key challenges still remain. One is how to consider both the immediate and high-order neighbors. Another is how to reduce noise when aggregating high-order neighbors. To address these challenges, we propose a novel method, called mixed high-order graph convolution with filter network via LSTM and channel attention (HGLA), to predict biomolecular interactions. Firstly, the basic and high-order features are extracted respectively through the traditional graph convolutional network (GCN) and the two-layer Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing (MixHop). Secondly, these features are mixed and input into the filter network composed of LayerNorm, SENet and LSTM to generate filtered features, which are concatenated and used for link prediction. The advantages of HGLA are: 1) HGLA processes high-order features separately, rather than simply concatenating them; 2) HGLA better balances the basic features and high-order features; 3) HGLA effectively filters the noise from high-order neighbors. It outperforms state-of-the-art networks on four benchmark datasets.

Vol. 21, No. 6, pp. 2011-2024.

Citations: 0

Machine Learning-Assisted High-Throughput Screening for Anti-MRSA Compounds

IF 3.6, Zone 3 (Biology)

IEEE/ACM Transactions on Computational Biology and Bioinformatics Pub Date : 2024-07-26 DOI: 10.1109/TCBB.2024.3434340

Fadi Shehadeh;LewisOscar Felix;Markos Kalligeros;Adnan Shehadeh;Beth Burgwyn Fuchs;Frederick M. Ausubel;Paul P. Sotiriadis;Eleftherios Mylonakis

Abstract: Background: Antimicrobial resistance is a major public health threat, and new agents are needed. Computational approaches have been proposed to reduce the cost and time needed for compound screening. Aims: A machine learning (ML) model was developed for the in silico screening of low molecular weight molecules. Methods: We used the results of a high-throughput Caenorhabditis elegans methicillin-resistant Staphylococcus aureus (MRSA) liquid infection assay to develop ML models for compound prioritization and quality control. Results: The compound prioritization model achieved an AUC of 0.795 with a sensitivity of 81% and a specificity of 70%. When applied to a validation set of 22,768 compounds, the model identified 81% of the active compounds identified by high-throughput screening (HTS) among only 30.6% of the total 22,768 compounds, resulting in a 2.67-fold increase in hit rate. When we retrained the model on all the compounds of the HTS dataset, it further identified 45 discordant molecules classified as non-hits by the HTS, with 42/45 (93%) having known antimicrobial activity. Conclusion: Our ML approach can be used to increase HTS efficiency by reducing the number of compounds that need to be physically screened and identifying potential missed hits, making HTS more accessible and reducing barriers to entry.

Vol. 21, No. 6, pp. 1911-1921.

Citations: 0
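The fold increase in hit rate quoted in the anti-MRSA screening abstract above follows from two of its reported figures. If the model recovers a fraction s of all actives while selecting only a fraction f of the library, the hit rate in the selected subset is s/f times the baseline hit rate. The short sketch below works this out from the rounded percentages in the abstract; the function name `enrichment_factor` is hypothetical, and the result from rounded inputs (about 2.65) is close to, but not exactly, the reported 2.67, presumably because the authors used unrounded counts.

```python
def enrichment_factor(recovered_fraction: float, screened_fraction: float) -> float:
    """Fold increase in hit rate when `screened_fraction` of the library
    contains `recovered_fraction` of all active compounds."""
    if not 0.0 < screened_fraction <= 1.0:
        raise ValueError("screened_fraction must be in (0, 1]")
    return recovered_fraction / screened_fraction


# Rounded values taken from the abstract: 81% of actives found in 30.6% of compounds.
fold = enrichment_factor(0.81, 0.306)
print(round(fold, 2))
```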

Diffusing on Two Levels and Optimizing for Multiple Properties: A Novel Approach to Generating Molecules With Desirable Properties

IF 3.6, Zone 3 (Biology)

IEEE/ACM Transactions on Computational Biology and Bioinformatics Pub Date : 2024-07-26 DOI: 10.1109/TCBB.2024.3434461

Siyuan Guo;Jihong Guan;Shuigeng Zhou

Abstract: In the past decade, Artificial Intelligence (AI) driven drug design and discovery has been a hot research topic in the AI area, where an important branch is molecule generation by generative models, from GAN-based models and VAE-based models to the latest diffusion-based models. However, most existing models pursue mainly the basic properties like validity and uniqueness of the generated molecules, and only a few go further to explicitly optimize one single important molecular property (e.g. QED or PlogP), which leaves most generated molecules of little use in practice. In this paper, we present a novel approach to generating molecules with desirable properties, which expands the diffusion model framework with multiple innovative designs. The novelty is two-fold. On the one hand, considering that the structures of molecules are complex and diverse, and molecular properties are usually determined by some substructures (e.g. pharmacophores), we propose to perform diffusion on two structural levels: molecules and molecular fragments respectively, with which a mixed Gaussian distribution is obtained for the reverse diffusion process. To get desirable molecular fragments, we develop a novel electronic effect based fragmentation method. On the other hand, we introduce two ways to explicitly optimize multiple molecular properties under the diffusion model framework. First, as potential drug molecules must be chemically valid, we optimize molecular validity by an energy-guidance function. Second, since potential drug molecules should be desirable in various properties, we employ a multi-objective mechanism to optimize multiple molecular properties simultaneously. Extensive experiments with two benchmark datasets, QM9 and ZINC250k, show that the molecules generated by our proposed method have better validity, uniqueness, novelty, Fréchet ChemNet Distance (FCD), QED, and PlogP than those generated by current SOTA models.

Vol. 21, No. 6, pp. 2050-2063.

Citations: 0

MLRR-ATV: A Robust Manifold Nonnegative LowRank Representation with Adaptive Total-Variation Regularization for scRNA-seq Data Clustering

IF 3.6, Zone 3 (Biology)

IEEE/ACM Transactions on Computational Biology and Bioinformatics Pub Date : 2024-07-24 DOI: 10.1109/TCBB.2024.3432740

Gao-Fei Wang, Juan Wang, Shasha Yuan, Chun-Hou Zheng, Jin-Xing Liu

Abstract: Since genomics was proposed, the exploration of genes has been the focus of research. The emergence of single-cell RNA sequencing (scRNA-seq) technology makes it possible to explore gene expression at the single-cell level. Due to the limitations of sequencing technology, the data contain a lot of noise. At the same time, the data are high-dimensional and sparse. Clustering is a common method of analyzing scRNA-seq data. This paper proposes a novel single-cell clustering method called Robust Manifold Nonnegative Low-Rank Representation with Adaptive Total-Variation Regularization (MLRR-ATV). The Adaptive Total-Variation (ATV) regularization is introduced into the Low-Rank Representation (LRR) model to reduce the influence of noise through gradient learning. Then, the linear and nonlinear manifold structures in the data are learned through Euclidean distance and cosine similarity, and more valuable information is retained. Because the model is non-convex, we use the Alternating Direction Method of Multipliers (ADMM) to optimize the model. We tested the performance of the MLRR-ATV model on eight real scRNA-seq datasets and selected nine state-of-the-art methods as comparison methods. The experimental results show that the performance of the MLRR-ATV model is better than the other nine methods.

Vol. PP; pages not listed.

Citations: 0

dwMLCS: An Efficient MLCS Algorithm Based on Dynamic and Weighted Directed Acyclic Graph

IF 3.6, Zone 3 (Biology)

IEEE/ACM Transactions on Computational Biology and Bioinformatics Pub Date : 2024-07-22 DOI: 10.1109/TCBB.2024.3431558

Changyong Yu;Dekuan Gao;Xu Guo;Haitao Ma;Yuhai Zhao;Guoren Wang

Abstract: The problem of finding the longest common subsequence for multiple sequences (MLCS) is a computationally intensive and challenging problem that has significant applications in various fields such as text comparison, pattern recognition, and gene diagnosis. Currently, the dominant point-based MLCS algorithms have become popular and extensively studied. Generally, they construct the directed acyclic graph (DAG) of matching points and convert the MLCS problem into a search for the longest paths in the DAG. Several improvements have been made, focusing on decreasing model size and reducing redundant computations. These include 1) hash methods for eliminating duplicated nodes, 2) dynamic structures for supporting a smaller DAG, and 3) path pruning strategies. However, the algorithms are still too limited when facing large-scale MLCS problems because 1) the dynamic structures are too time-consuming to maintain and 2) the path pruning relies heavily on the tightness of the lower and upper bounds of the MLCS. These factors contribute to the large-scale MLCS problem remaining a challenge. We propose a novel algorithm for the large-scale MLCS problem, named dwMLCS. It is based on two models. One is a dynamic DAG model which is both space and time efficient and can decrease the size of the DAG significantly. The other is a weighted DAG model with new successor strategies. With this model, we design the algorithm for finding a tighter lower bound of the MLCS. Then, the path pruning is conducted to further reduce the size of the DAG and eliminate redundant computation. Additionally, we propose an upper bound method for improving the efficiency of the path pruning strategy. The experimental results demonstrate that the effectiveness and efficiency of the models and algorithms proposed are better than state-of-the-art algorithms.

Vol. 21, No. 6, pp. 1987-1999.

Citations: 0
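The general framing in the dwMLCS abstract above, where matching points form a DAG and the MLCS length is the longest path through it, can be sketched in a few lines. The version below is a deliberately naive baseline: it explores every successor with memoisation and uses none of the dominance filtering, dynamic structures, weighting, or bound-based pruning that the paper describes. The function name `mlcs_length` and the example strings are illustrative assumptions.

```python
from functools import lru_cache


def mlcs_length(seqs):
    """Length of the longest common subsequence of several strings,
    computed as the longest path in the implicit DAG of matching points."""
    # Only characters present in every sequence can appear in a common subsequence.
    alphabet = set(seqs[0]).intersection(*(set(s) for s in seqs[1:]))

    @lru_cache(maxsize=None)
    def longest_from(point):
        # `point` holds, per sequence, the index of the last matched character.
        best = 0
        for ch in alphabet:
            # Successor: next occurrence of `ch` after `point` in every sequence.
            nxt = tuple(s.find(ch, p + 1) for s, p in zip(seqs, point))
            if -1 not in nxt:
                best = max(best, 1 + longest_from(nxt))
        return best

    # Virtual source node placed before the start of every sequence.
    return longest_from(tuple(-1 for _ in seqs))


print(mlcs_length(["ABCBDAB", "BDCABA", "BCBA"]))
```

Because the number of matching points grows with the product of the sequence lengths, this exhaustive search is only practical for short inputs, which is the scaling problem the pruning strategies in the abstract are aimed at.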

Generative Adversarial Network-Based Augmentation With Noval 2-Step Authentication for Anti-Coronavirus Peptide Prediction

IF 3.6, Zone 3 (Biology)

IEEE/ACM Transactions on Computational Biology and Bioinformatics Pub Date : 2024-07-22 DOI: 10.1109/TCBB.2024.3431688

Aditya Kumar;Deepak Singh

Abstract: The virus poses a longstanding and enduring danger to various forms of life. Despite the ongoing endeavors to combat viral diseases, there exists a necessity to explore and develop novel therapeutic options. Antiviral peptides are bioactive molecules with a favorable toxicity profile, making them promising alternatives for viral infection treatment. Therefore, this article employed a generative adversarial network for antiviral peptide augmentation and a novel two-step authentication process for augmented synthetic peptides to enhance antiviral activity prediction. Additionally, five widely utilized deep learning models were employed for classification purposes. Initially, a GAN was used to augment the antiviral peptide. In a two-step authentication process, the NCBI-BLAST was utilized to identify the antiviral activity resemblance between the synthetic and real peptide. Subsequently, the hydrophobicity, hydrophilicity, hydroxylic nature, positive charge, and negative charge of synthetic and authentic antiviral peptides were compared before their utilization. Later, to examine the impact of authenticated peptide augmentation in the prediction of antiviral peptides, a comparison is conducted with the outcomes of non-peptide augmented prediction. The study demonstrates that the 1-D convolution neural network with augmented peptide exhibits superior performance compared to other employed classifiers and state-of-the-art models. The network attains a mean classification accuracy of 95.41%, an AUC value of 0.95, and an MCC value of 0.90 on the benchmark antiviral and anti-corona peptides dataset. Thus, the performance of the proposed model indicates its efficacy in predicting the antiviral activity of peptides.

Vol. 21, No. 6, pp. 1942-1954.

Citations: 0

Dopcc: Detecting Overlapping Protein Complexes via Multi-Metrics and Co-Core Attachment Method

IF 3.6, Zone 3 (Biology)

IEEE/ACM Transactions on Computational Biology and Bioinformatics Pub Date : 2024-07-17 DOI: 10.1109/TCBB.2024.3429546

Wenkang Wang;Xiangmao Meng;Ju Xiang;Hayat Dino Bedru;Min Li

Abstract: Identification of protein complexes is an important issue in the field of systems biology, which is crucial to understanding the cellular organization and inferring protein functions. Recently, many computational methods have been proposed to detect protein complexes from protein-protein interaction (PPI) networks. However, most of these methods only focus on local information of proteins in the PPI network, which makes them easily affected by the noise in the PPI network. Meanwhile, it is still challenging to detect protein complexes, especially for overlapping cases. To address these issues, we propose a new method, named Dopcc, to detect overlapping protein complexes by constructing a multi-metrics network according to different strategies. First, we adopt the Jaccard coefficient to measure the neighbor similarity between proteins and denoise the PPI network. Then, we propose a new strategy, integrating hierarchical compressing with network embedding, to capture the high-order structural similarity between proteins. Further, a new co-core attachment strategy is proposed to detect overlapping protein complexes from multi-metrics. The experimental results show that our proposed method, Dopcc, outperforms the other eight state-of-the-art methods in terms of F-measure, MMR, and Composite Score on two yeast datasets.

Vol. 21, No. 6, pp. 2000-2010.

Citations: 0
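The first step named in the Dopcc abstract above, scoring each interaction by the Jaccard coefficient of the two proteins' neighbour sets and using that score to denoise the network, can be illustrated as follows. This is a minimal sketch under stated assumptions: neighbour sets are open (a protein is not its own neighbour), edges scoring below a fixed cut-off are dropped, and the names `jaccard`, `denoise`, and the 0.2 threshold are hypothetical choices rather than details from the paper.

```python
def jaccard(adj, u, v):
    """Jaccard coefficient of the neighbour sets of u and v."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def denoise(edges, threshold):
    """Keep only the edges whose endpoints have neighbour similarity >= threshold."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return [(u, v) for u, v in edges if jaccard(adj, u, v) >= threshold]


# Hypothetical toy PPI network: a triangle A-B-C plus a pendant edge C-D.
ppi_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
kept = denoise(ppi_edges, threshold=0.2)
print(kept)
```

In this toy network the pendant edge C-D shares no common neighbours and is removed, while the triangle edges survive, which matches the intuition that interactions inside dense complexes are better supported than isolated ones.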

Enhancing Generalizability in Biomedical Entity Recognition: Self-Attention PCA-CLS Model

IF 3.6, Zone 3 (Biology)

IEEE/ACM Transactions on Computational Biology and Bioinformatics Pub Date : 2024-07-16 DOI: 10.1109/TCBB.2024.3429234

Rajesh Kumar Mundotiya;Juhi Priya;Divya Kuwarbi;Teekam Singh

Abstract: One of the primary tasks in the early stages of data mining involves the identification of entities from biomedical corpora. Traditional approaches relying on robust feature engineering face challenges when learning from available (un-)annotated data using data-driven models like deep learning-based architectures. Despite leveraging large corpora and advanced deep learning models, domain generalization remains an issue. Attention mechanisms are effective in capturing longer sentence dependencies and extracting semantic and syntactic information from limited annotated datasets. To address out-of-vocabulary challenges in biomedical text, the PCA-CLS (Position and Contextual Attention with CNN-LSTM-Softmax) model combines global self-attention and character-level convolutional neural network techniques. The model's performance is evaluated on eight distinct biomedical domain datasets encompassing entities such as genes, drugs, diseases, and species. The PCA-CLS model outperforms several state-of-the-art models, achieving notable F1-scores, including 88.19% on BC2GM, 85.44% on JNLPBA, 90.80% on BC5CDR-chemical, 87.07% on BC5CDR-disease, 89.18% on BC4CHEMD, 88.81% on NCBI, and 91.59% on the s800 dataset.

Vol. 21, No. 6, pp. 1934-1941.

Citations: 0

Employing Machine Learning Techniques to Detect Protein Function: A Survey, Experimental, and Empirical Evaluations

IF 3.6, Zone 3 (Biology)

IEEE/ACM Transactions on Computational Biology and Bioinformatics Pub Date : 2024-07-15 DOI: 10.1109/TCBB.2024.3427381

Kamal Taha

Abstract: This review article delves deeply into the various machine learning (ML) methods and algorithms employed in discerning protein functions. Each method discussed is assessed for its efficacy, limitations, potential improvements, and future prospects. We present an innovative hierarchical classification system that arranges algorithms into intricate categories and unique techniques. This taxonomy is based on a tri-level hierarchy, starting with the methodology category and narrowing down to specific techniques. Such a framework allows for a structured and comprehensive classification of algorithms, assisting researchers in understanding the interrelationships among diverse algorithms and techniques. The study incorporates both empirical and experimental evaluations to differentiate between the techniques. The empirical evaluation ranks the techniques based on four criteria. The experimental assessments rank: (1) individual techniques under the same methodology sub-category, (2) different sub-categories within the same category, and (3) the broad categories themselves. Integrating the innovative methodological classification, empirical findings, and experimental assessments, the article offers a well-rounded understanding of ML strategies in protein function identification. The paper also explores techniques for multi-task and multi-label detection of protein functions, in addition to focusing on single-task methods. Moreover, the paper sheds light on the future avenues of ML in protein function determination.

Vol. 21, No. 6, pp. 1965-1986.

Citations: 0