Qingzhou Shi;Kai Zheng;Haoyuan Li;Bo Wang;Xiao Liang;Xinyu Li;Jianxin Wang
{"title":"LKLPDA: A Low-Rank Fast Kernel Learning Approach for Predicting piRNA-Disease Associations","authors":"Qingzhou Shi;Kai Zheng;Haoyuan Li;Bo Wang;Xiao Liang;Xinyu Li;Jianxin Wang","doi":"10.1109/TCBB.2024.3452055","DOIUrl":"10.1109/TCBB.2024.3452055","url":null,"abstract":"Piwi-interacting RNAs (piRNAs) are increasingly recognized as potential biomarkers for various diseases. Investigating the complex relationship between piRNAs and diseases through computational methods can reduce the costs and risks associated with biological experiments. Fast kernel learning (FKL) is a classical method for multi-source data fusion that is widely employed in association prediction research. However, biological networks are noisy due to the limitations of measurement technology and inherent natural variation, which can hamper the effectiveness of the network-based ideal kernel. The conventional FKL method does not address this issue. In this study, we propose a low-rank fast kernel learning (LRFKL) algorithm, which consists of low-rank representation (LRR) and the FKL algorithm. The LRFKL algorithm is designed to mitigate the effects of noise on the network-based ideal kernel. Using LRFKL, we propose a novel approach for predicting piRNA-disease associations called LKLPDA. Specifically, we first compute the similarity matrices for piRNAs and diseases. Then we use the LRFKL to fuse the similarity matrices for piRNAs and diseases separately. Finally, the LKLPDA employs AutoGluon-Tabular for predictive analysis. Computational results show that LKLPDA effectively predicts piRNA-disease associations with higher accuracy compared to previous methods. 
In addition, case studies confirm the reliability of the model in predicting piRNA-disease associations.","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"21 6","pages":"2179-2187"},"PeriodicalIF":3.6,"publicationDate":"2024-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142106990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MMD-DTA: A Multi-Modal Deep Learning Framework for Drug-Target Binding Affinity and Binding Region Prediction","authors":"Qi Zhang;Yuxiao Wei;Bo Liao;Liwei Liu;Shengli Zhang","doi":"10.1109/TCBB.2024.3451985","DOIUrl":"10.1109/TCBB.2024.3451985","url":null,"abstract":"The prediction of drug-target affinity (DTA) plays a crucial role in drug development and the identification of potential drug targets. In recent years, computer-assisted DTA prediction has emerged as a significant approach in this field. In this study, we propose a multi-modal deep learning framework called MMD-DTA for predicting drug-target binding affinity and binding regions. The model can predict DTA while simultaneously learning the binding regions of drug-target interactions through unsupervised learning. To achieve this, MMD-DTA first uses graph neural networks and a target structural feature extraction network to extract multi-modal information from the sequences and structures of drugs and targets. It then utilizes the feature interaction and fusion modules to generate interaction descriptors for predicting DTA and interaction strength for binding region prediction. Our experimental results demonstrate that MMD-DTA outperforms existing models based on key evaluation metrics. Furthermore, external validation results indicate that MMD-DTA enhances the generalization capability of the model by integrating sequence and structural information of drugs and targets. The model trained on the benchmark dataset can effectively generalize to independent virtual screening tasks. 
The visualization of drug-target binding region prediction showcases the interpretability of MMD-DTA, providing valuable insights into the functional regions of drug molecules that interact with proteins.","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"21 6","pages":"2200-2211"},"PeriodicalIF":3.6,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142106991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chu-Ting Yu, Bo Tian, Qian-Qian Meng, Zhe-Ran Chen, Ya-Nan Pang, Xun Zhang, Yan Bian, Si-Wei Zhou, Mei-Juan Hao, Ye Gao, Lei Xin, Han Lin, Wei Wang, Luo-Wei Wang
{"title":"Development and Validation of a Comprehensive Analysis of the Competing Endogenous circRNA/miRNA/mRNA Network for the Identification of Immune-Related Targets in Esophageal Squamous Cell Carcinoma.","authors":"Chu-Ting Yu, Bo Tian, Qian-Qian Meng, Zhe-Ran Chen, Ya-Nan Pang, Xun Zhang, Yan Bian, Si-Wei Zhou, Mei-Juan Hao, Ye Gao, Lei Xin, Han Lin, Wei Wang, Luo-Wei Wang","doi":"10.1109/TCBB.2024.3443854","DOIUrl":"https://doi.org/10.1109/TCBB.2024.3443854","url":null,"abstract":"Immunotherapy for esophageal squamous cell carcinoma (ESCC) exhibits notable variability in efficacy. Concurrently, recent research emphasizes circRNAs' impact on the ESCC tumor microenvironment. To further explore the relationship, we leveraged circRNA, microRNA, and mRNA sequence datasets to construct a comprehensive immune-related circRNA-microRNA-mRNA network, revealing competing endogenous RNA (ceRNA) roles in ESCC. The network comprises 16 circular RNAs, 13 microRNAs, and 1,560 mRNAs. Weighted gene co-expression analysis identified immune-related modules, notably cancer-associated fibroblast (CAF) and myeloid-derived suppressor cell modules, correlating significantly with immune and stemness scores. Among them, the CAF module plays a crucial role in extracellular matrix function and effectively discriminates ESCC patients. Four hub collagen family genes within CAF correlated robustly with CAF, macrophage infiltration, and T-cell exclusion. In-house sequencing and RT-qPCR validated their elevated expression. We also identified CAF module-targeting drugs as potential ESCC treatments. In summary, we established an immune-related circRNA-miRNA-mRNA network that not only illuminates ceRNA functionality but also highlights circRNAs' involvement in the CAF through collagen gene targeting. 
These findings hold promise to predict ESCC immune landscapes and therapy responses, ultimately aiding in more personalized and effective clinical decision-making.","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"PP ","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142106989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Huiwei Zhou;Wenchu Li;Weihong Yao;Yingyu Lin;Lei Du
{"title":"Contrasting Multi-Source Temporal Knowledge Graphs for Biomedical Hypothesis Generation","authors":"Huiwei Zhou;Wenchu Li;Weihong Yao;Yingyu Lin;Lei Du","doi":"10.1109/TCBB.2024.3451051","DOIUrl":"10.1109/TCBB.2024.3451051","url":null,"abstract":"Hypothesis Generation (HG) aims to expedite biomedical research by generating novel hypotheses from existing scientific literature. Most existing studies focused on modeling static snapshots of the corpus, neglecting the temporal evolution of scientific terms. Despite recent efforts to learn term evolution from Knowledge Bases (KBs) for HG, the temporal information from multi-source KBs is still overlooked, which contains important, up-to-date knowledge. In this paper, an innovative Temporal Contrastive Learning (TCL) framework is introduced to uncover latent associations between entities by jointly modeling their co-evolution across multi-source temporal KBs. Specifically, we first construct a temporal relation graph based on PubMed papers and a biomedical relation database (such as Comparative Toxicogenomics Database (CTD)). Then the constructed temporal relation graph and a temporal concept graph (such as Medical Subject Headings (MeSH)) are used to train two GCN-based recurrent networks for learning the entity temporal evolutional embeddings, respectively. Finally, a cross-view temporal prediction task is designed for learning knowledge enriched temporal embeddings by contrasting the temporal embeddings learned from the two Temporal Knowledge Graphs (TKGs). 
Findings from experiments conducted on three real-world biomedical term relationship datasets demonstrate that the proposed approach is clearly superior to approaches based on single TKG, achieving the state-of-the-art performance.","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"21 6","pages":"2102-2112"},"PeriodicalIF":3.6,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142086095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Compact Class-Conditional Attribute Category Clustering: Amino Acid Grouping for Enhanced HIV-1 Protease Cleavage Classification","authors":"José A. Sáez;J. Fernando Vera","doi":"10.1109/TCBB.2024.3448617","DOIUrl":"10.1109/TCBB.2024.3448617","url":null,"abstract":"Categorical attributes are common in many classification tasks, presenting certain challenges as the number of categories grows. This situation can affect data handling, negatively impacting the building time of models, their complexity and, ultimately, their classification performance. In order to mitigate these issues, this research proposes a novel preprocessing technique for grouping attribute categories in classification datasets. This approach combines the exact representation of the association between categorical values in a Euclidean space, clustering methods and attribute quality metrics to group similar attribute categories based on their contribution to the classification task. To estimate its effectiveness, the proposal is evaluated within the context of HIV-1 protease cleavage site prediction, where each attribute represents an amino acid that can take multiple possible values. The results obtained on HIV-1 real-world datasets show a significant reduction in the number of categories per attribute, with an average reduction percentage ranging from 74% to 81%. This reduction leads to simplified data representations and improved classification performances compared to not preprocessing. Specifically, improvements of up to 0.07 in accuracy and 0.19 in geometric mean are observed across different datasets and classification algorithms. Additionally, extensive simulations on synthetic datasets with varied characteristics are carried out, providing consistent and reliable results that validate the robustness of the proposal. 
These findings highlight the capability of the developed method to enhance cleavage prediction, which could potentially contribute to understanding viral processes and developing targeted therapeutic strategies.","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"21 6","pages":"2167-2178"},"PeriodicalIF":3.6,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10645313","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142043936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NeoMS: Mass Spectrometry-based Method for Uncovering Mutated MHC-I Neoantigens.","authors":"Shaokai Wang, Ming Zhu, Bin Ma","doi":"10.1109/TCBB.2024.3447746","DOIUrl":"https://doi.org/10.1109/TCBB.2024.3447746","url":null,"abstract":"Major Histocompatibility Complex (MHC) molecules play a critical role in the immune system by presenting peptides on the cell surface for recognition by T-cells. Tumor cells often produce MHC peptides with amino acid mutations, known as neoantigens, which evade T-cell recognition, leading to rapid tumor growth. In immunotherapies such as TCR-T and CAR-T, identifying these mutated MHC peptide sequences is crucial. Current mass spectrometry-based peptide identification methods primarily rely on database searching, which fails to detect mutated peptides not present in human databases. In this paper, we propose a novel workflow called NeoMS, designed to efficiently identify both non-mutated and mutated MHC-I peptides from mass spectrometry data. NeoMS utilizes a tagging algorithm to generate an expanded sequence database that includes potential mutated proteins for each sample. Furthermore, it employs a machine learning-based scoring function for each peptide-spectrum match (PSM) to maximize search sensitivity. Finally, a rigorous target-decoy approach is implemented to control the false discovery rates (FDR) of the peptides with and without mutations separately. Experimental results for regular peptides demonstrate that NeoMS outperforms four benchmark methods. 
For mutated peptides, NeoMS successfully identifies hundreds of high-quality mutated peptides in a melanoma-associated sample, with their validity confirmed by further studies.","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"PP ","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142035750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ryota Ido;Shengjuan Cao;Jianshen Zhu;Naveed Ahmed Azam;Kazuya Haraguchi;Liang Zhao;Hiroshi Nagamochi;Tatsuya Akutsu
{"title":"A Method for Inferring Polymers Based on Linear Regression and Integer Programming","authors":"Ryota Ido;Shengjuan Cao;Jianshen Zhu;Naveed Ahmed Azam;Kazuya Haraguchi;Liang Zhao;Hiroshi Nagamochi;Tatsuya Akutsu","doi":"10.1109/TCBB.2024.3447780","DOIUrl":"10.1109/TCBB.2024.3447780","url":null,"abstract":"A novel framework has recently been proposed for designing the molecular structure of chemical compounds with a desired chemical property using both artificial neural networks and mixed integer linear programming. In this paper, we design a new method for inferring a polymer based on the framework. For this, we introduce a new way of representing a polymer as a form of monomer and define new descriptors that feature the structure of polymers. We also use linear regression as a building block of constructing a prediction function in the framework. The results of our computational experiments reveal a set of chemical properties on polymers to which a prediction function constructed with linear regression performs well. We also observe that the proposed method can infer polymers with up to 50 non-hydrogen atoms in a monomer form.","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"21 6","pages":"1623-1632"},"PeriodicalIF":3.6,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142035787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ying Wang;Maoyuan Ma;Yanxin Xie;Qinke Peng;Hongqiang Lyu;Hequan Sun;Laiyi Fu
{"title":"KGRACDA: A Model Based on Knowledge Graph from Recursion and Attention Aggregation for CircRNA-Disease Association Prediction","authors":"Ying Wang;Maoyuan Ma;Yanxin Xie;Qinke Peng;Hongqiang Lyu;Hequan Sun;Laiyi Fu","doi":"10.1109/TCBB.2024.3447110","DOIUrl":"10.1109/TCBB.2024.3447110","url":null,"abstract":"CircRNA is closely related to human disease, so it is important to predict circRNA-disease association (CDA). However, the traditional biological detection methods have high difficulty and low accuracy, and computational methods represented by deep learning ignore the ability of the model to explicitly extract local depth information of the CDA. We propose a model based on knowledge graph from recursion and attention aggregation for circRNA-disease association prediction (KGRACDA). This model combines explicit structural features and implicit embedding information of graphs, optimizing graph embedding vectors. First, we build large-scale, multi-source heterogeneous datasets and construct a knowledge graph of multiple RNAs and diseases. After that, we use a recursive method to build multi-hop subgraphs and optimize the graph attention mechanism with a gating mechanism, mining local depth information. At the same time, the model uses a multi-head attention mechanism to balance global and local depth features of graphs, and generates CDA prediction scores. KGRACDA surpasses other methods by capturing local and global depth information related to CDA. 
We update an interactive web platform, HNRBase v2.0, which visualizes circRNA data and allows users to download data and predict CDA using the model.","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"21 6","pages":"2133-2144"},"PeriodicalIF":3.6,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142017376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xindi Yu;Shusen Zhou;Mujun Zang;Qingjun Wang;Chanjuan Liu;Tong Liu
{"title":"Parallel Convolutional Contrastive Learning Method for Enzyme Function Prediction","authors":"Xindi Yu;Shusen Zhou;Mujun Zang;Qingjun Wang;Chanjuan Liu;Tong Liu","doi":"10.1109/TCBB.2024.3447037","DOIUrl":"10.1109/TCBB.2024.3447037","url":null,"abstract":"The function labeling of enzymes has a wide range of application value in the medical field, industrial biology and other fields. Scientists define enzyme categories by enzyme commission (EC) numbers. At present, although there are some tools for enzyme function prediction, their effects have not reached the application level. To improve the precision of enzyme function prediction, we propose a parallel convolutional contrastive learning (PCCL) method to predict enzyme functions. First, we use the advanced protein language model ESM-2 to preprocess the protein sequences. Second, PCCL combines convolutional neural networks (CNNs) and contrastive learning to improve the prediction precision of multifunctional enzymes. Contrastive learning can make the model better deal with the problem of class imbalance. Finally, the deep learning framework is mainly composed of three parallel CNNs for fully extracting sample features. We compare PCCL with state-of-the-art enzyme function prediction methods based on three evaluation metrics. The performance of our model improves on both test sets. 
Especially on the smaller test set, PCCL improves the AUC by 2.57%.","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"21 6","pages":"2604-2609"},"PeriodicalIF":3.6,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142017377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yufei Li;Xiaoyong Ma;Xiangyu Zhou;Penghzhen Cheng;Kai He;Tieliang Gong;Chen Li
{"title":"Integrating K+ Entities Into Coreference Resolution on Biomedical Texts","authors":"Yufei Li;Xiaoyong Ma;Xiangyu Zhou;Penghzhen Cheng;Kai He;Tieliang Gong;Chen Li","doi":"10.1109/TCBB.2024.3447273","DOIUrl":"10.1109/TCBB.2024.3447273","url":null,"abstract":"Biomedical Coreference Resolution focuses on identifying the coreferences in biomedical texts, which normally consists of two parts: (i) mention detection to identify textual representation of biological entities and (ii) finding their coreference links. Recently, a popular approach to enhance the task is to embed a knowledge base into deep neural networks. However, the way in which these methods integrate knowledge leads to the shortcoming that such knowledge may play a larger role in mention detection than coreference resolution. Specifically, they tend to integrate knowledge prior to mention detection, as part of the embeddings. Besides, they primarily focus on mention-dependent knowledge (KBase), i.e., knowledge entities directly related to mentions, while ignoring the correlated knowledge (K+) between mentions in the mention-pair. For mentions with significant differences in word form, this may limit their ability to extract potential correlations between those mentions. Thus, this paper develops a novel model to integrate both KBase and K+ entities and achieves the state-of-the-art performance on BioNLP and CRAFT-CR datasets. Empirical studies on mention detection of different lengths reveal the effectiveness of the KBase entities. 
The evaluation on cross-sentence and match/mismatch coreference further demonstrates the superiority of the K+ entities in extracting background potential correlation between mentions.","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"21 6","pages":"2145-2155"},"PeriodicalIF":3.6,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142017375","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}