IEEE/ACM Transactions on Computational Biology and Bioinformatics: Latest Articles, Page 5

Improving Antifreeze Proteins Prediction With Protein Language Models and Hybrid Feature Extraction Networks

IF 3.6, Zone 3 (Biology)

IEEE/ACM Transactions on Computational Biology and Bioinformatics Pub Date : 2024-09-24 DOI: 10.1109/TCBB.2024.3467261

Jiashun Wu;Yan Liu;Yiheng Zhu;Dong-Jun Yu

{"title":"Improving Antifreeze Proteins Prediction With Protein Language Models and Hybrid Feature Extraction Networks","authors":"Jiashun Wu;Yan Liu;Yiheng Zhu;Dong-Jun Yu","doi":"10.1109/TCBB.2024.3467261","DOIUrl":"10.1109/TCBB.2024.3467261","url":null,"abstract":"Accurate identification of antifreeze proteins (AFPs) is crucial in developing biomimetic synthetic anti-icing materials and low-temperature organ preservation materials. Although numerous machine learning-based methods have been proposed for AFPs prediction, the complex and diverse nature of AFPs limits the prediction performance of existing methods. In this study, we propose AFP-Deep, a new deep learning method to predict antifreeze proteins by integrating embedding from protein sequences with pre-trained protein language models and evolutionary contexts with hybrid feature extraction networks. The experimental results demonstrated that the main advantage of AFP-Deep is its utilization of pre-trained protein language models, which can extract discriminative global contextual features from protein sequences. Additionally, the hybrid deep neural networks designed for protein language models and evolutionary context feature extraction enhance the correlation between embeddings and antifreeze pattern. 
The performance evaluation results show that AFP-Deep achieves superior performance compared to state-of-the-art models on benchmark datasets, achieving an AUPRC of 0.724 and 0.924, respectively.","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"21 6","pages":"2349-2358"},"PeriodicalIF":3.6,"publicationDate":"2024-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142345926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0

GenoM7GNet: An Efficient N7-Methylguanosine Site Prediction Approach Based on a Nucleotide Language Model

IF 3.6, Zone 3 (Biology)

IEEE/ACM Transactions on Computational Biology and Bioinformatics Pub Date : 2024-09-20 DOI: 10.1109/TCBB.2024.3459870

Chuang Li;Heshi Wang;Yanhua Wen;Rui Yin;Xiangxiang Zeng;Keqin Li

{"title":"GenoM7GNet: An Efficient N7-Methylguanosine Site Prediction Approach Based on a Nucleotide Language Model","authors":"Chuang Li;Heshi Wang;Yanhua Wen;Rui Yin;Xiangxiang Zeng;Keqin Li","doi":"10.1109/TCBB.2024.3459870","DOIUrl":"10.1109/TCBB.2024.3459870","url":null,"abstract":"N$^{7}$-methylguanosine (m7G), one of the mainstream post-transcriptional RNA modifications, occupies an exceedingly significant place in medical treatments. However, classic approaches for identifying m7G sites are costly both in time and equipment. Meanwhile, the existing machine learning methods extract limited hidden information from RNA sequences, thus making it difficult to improve the accuracy. Therefore, we put forward a deep learning network, called “GenoM7GNet,” for m7G site identification. This model utilizes a Bidirectional Encoder Representation from Transformers (BERT) and is pretrained on nucleotide sequence data to capture hidden patterns from RNA sequences for m7G site prediction. Moreover, through detailed comparative experiments with various deep learning models, we discovered that the one-dimensional convolutional neural network (CNN) exhibits outstanding performance in sequence feature learning and classification. The proposed GenoM7GNet model achieved 0.953 in accuracy, 0.932 in sensitivity, 0.976 in specificity, 0.907 in Matthews Correlation Coefficient and 0.984 in Area Under the receiver operating characteristic Curve on performance evaluation. 
Extensive experimental results further prove that our GenoM7GNet model markedly surpasses other state-of-the-art models in predicting m7G sites, exhibiting high computing performance.","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"21 6","pages":"2258-2268"},"PeriodicalIF":3.6,"publicationDate":"2024-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142286167","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0

IF 4.5, Zone 3 (Biology)

IEEE/ACM Transactions on Computational Biology and Bioinformatics Pub Date : 2024-09-17 DOI: 10.1109/tcbb.2024.3462730

Mengzhen Li, Mustafa Coşkun, Mehmet Koyutürk

Citations: 0

Accurate Flow Decomposition via Robust Integer Linear Programming

IF 3.6, Zone 3 (Biology)

IEEE/ACM Transactions on Computational Biology and Bioinformatics Pub Date : 2024-09-13 DOI: 10.1109/TCBB.2024.3433523

Fernando H. C. Dias;Alexandru I. Tomescu

{"title":"Accurate Flow Decomposition via Robust Integer Linear Programming","authors":"Fernando H. C. Dias;Alexandru I. Tomescu","doi":"10.1109/TCBB.2024.3433523","DOIUrl":"10.1109/TCBB.2024.3433523","url":null,"abstract":"Minimum flow decomposition (MFD) is a common problem across various fields of Computer Science, where a flow is decomposed into a minimum set of weighted paths. However, in Bioinformatics applications, such as RNA transcript or quasi-species assembly, the flow is erroneous since it is obtained from noisy read coverages. Typical generalizations of the MFD problem to handle errors are based on least-squares formulations or modelling the erroneous flow values as ranges. All of these are thus focused on error handling at the level of individual edges. In this paper, we interpret the flow decomposition problem as a robust optimization problem and lift error-handling from individual edges to solution paths. As such, we introduce a new minimum path-error flow decomposition problem, for which we give an Integer Linear Programming formulation. Our experimental results reveal that our formulation can account for errors significantly better, by lowering the inaccuracy rate by 30–50% compared to previous error-handling formulations, with computational requirements that remain practical.","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"21 6","pages":"1955-1964"},"PeriodicalIF":3.6,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142251664","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0
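The flow decomposition abstract above describes splitting a flow into weighted source-to-sink paths. The sketch below illustrates the basic idea with a simple greedy heuristic on an error-free flow over a DAG. It is not the paper's Integer Linear Programming formulation, it does not guarantee a minimum number of paths, and it does not handle erroneous flows. The function name and the toy network are hypothetical.

```python
from collections import defaultdict


def greedy_flow_decomposition(flow, source, sink):
    """Decompose a conserved, non-negative flow on a DAG into weighted paths.

    `flow` maps each edge (u, v) to its flow value. Greedy heuristic:
    repeatedly follow the heaviest remaining edge from the source to the
    sink and peel off the bottleneck weight of that path.
    """
    residual = dict(flow)
    paths = []
    while True:
        # Adjacency restricted to edges that still carry flow.
        adj = defaultdict(list)
        for (u, v), f in residual.items():
            if f > 0:
                adj[u].append(v)
        if not adj[source]:
            break  # all flow leaving the source has been explained
        path = [source]
        while path[-1] != sink:
            u = path[-1]
            # Flow conservation guarantees an outgoing edge with flow here.
            v = max(adj[u], key=lambda w: residual[(u, w)])
            path.append(v)
        edges = list(zip(path, path[1:]))
        weight = min(residual[e] for e in edges)
        for e in edges:
            residual[e] -= weight
        paths.append((path, weight))
    return paths


# Toy network: 5 units leave s; 3 go directly a -> t, 2 go via b.
toy_flow = {("s", "a"): 5, ("a", "t"): 3, ("a", "b"): 2, ("b", "t"): 2}
decomposition = greedy_flow_decomposition(toy_flow, "s", "t")
print(decomposition)
```

Each iteration removes the flow from at least one edge, so the loop terminates after at most as many paths as there are edges. With noisy flow values, conservation no longer holds and this walk can get stuck, which is the situation the error-handling formulations in the abstract are designed for.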

A New Graph Autoencoder-Based Multi-Level Kernel Subspace Fusion Framework for Single-Cell Type Identification

IF 3.6, Zone 3 (Biology)

IEEE/ACM Transactions on Computational Biology and Bioinformatics Pub Date : 2024-09-12 DOI: 10.1109/TCBB.2024.3459960

Juan Wang;Tian-Jing Qiao;Chun-Hou Zheng;Jin-Xing Liu;Jun-Liang Shang

{"title":"A New Graph Autoencoder-Based Multi-Level Kernel Subspace Fusion Framework for Single-Cell Type Identification","authors":"Juan Wang;Tian-Jing Qiao;Chun-Hou Zheng;Jin-Xing Liu;Jun-Liang Shang","doi":"10.1109/TCBB.2024.3459960","DOIUrl":"10.1109/TCBB.2024.3459960","url":null,"abstract":"The advent of single-cell RNA sequencing (scRNA-seq) technology offers the opportunity to conduct biological research at the cellular level. Single-cell type identification based on unsupervised clustering is one of the fundamental tasks of scRNA-seq data analysis. Although many single-cell clustering methods have been developed recently, few can fully exploit the deep potential relationships between cells, resulting in suboptimal clustering. In this paper, we propose scGAMF, a graph autoencoder-based multi-level kernel subspace fusion framework for scRNA-seq data analysis. Based on multiple top feature sets, scGAMF unifies deep feature embedding and kernel space analysis into a single framework to learn an accurate clustering affinity matrix. First, we construct multiple top feature sets to avoid the high variability caused by single feature set learning. Second, scGAMF uses a graph autoencoder (GAEs) to extract deep information embedded in the data, and learn embeddings including gene expression patterns and cell-cell relationships. Third, to fully explore the deep potential relationships between cells, we design a multi-level kernel space fusion strategy. This strategy uses a kernel expression model with adaptive similarity preservation to learn a self-expression matrix shared by all embedding spaces of a given feature set, and a consensus affinity matrix across multiple top feature sets. Finally, the consensus affinity matrix is used for spectral clustering, visualization, and identification of gene markers. 
Extensive validation on real datasets shows that scGAMF achieves higher clustering accuracy than many popular single-cell analysis methods.","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"21 6","pages":"2292-2303"},"PeriodicalIF":3.6,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142182850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0

Using Multi-Encoder Semi-Implicit Graph Variational Autoencoder to Analyze Single-Cell RNA Sequencing Data

IF 3.6, Zone 3 (Biology)

IEEE/ACM Transactions on Computational Biology and Bioinformatics Pub Date : 2024-09-10 DOI: 10.1109/TCBB.2024.3458170

Shengwen Tian;Cunmei Ji;Jiancheng Ni;Yutian Wang;Chunhou Zheng

{"title":"Using Multi-Encoder Semi-Implicit Graph Variational Autoencoder to Analyze Single-Cell RNA Sequencing Data","authors":"Shengwen Tian;Cunmei Ji;Jiancheng Ni;Yutian Wang;Chunhou Zheng","doi":"10.1109/TCBB.2024.3458170","DOIUrl":"10.1109/TCBB.2024.3458170","url":null,"abstract":"Rapid advances in single-cell RNA sequencing (scRNA-seq) have made it possible to characterize cell states at a high resolution view for large scale library. scRNA-seq data contains a great deal of biological information, which can be mainly used to discover cell subtypes and track cell development. However, traditional methods face many challenges in addressing scRNA-seq data with high dimensions and high sparsity. For better analysis of scRNA-seq data, we propose a new framework called MSVGAE based on variational graph auto-encoder and graph attention networks. Specifically, we introduce multiple encoders to learn features at different scales and control for uninformative features. Moreover, different noises are added to encoders to promote the propagation of graph structural information and distribution uncertainty. Therefore, some complex posterior distributions can be captured by our model. MSVGAE maps scRNA-seq data with high dimensions and high noise into the low-dimensional latent space, which is beneficial for downstream tasks. In particular, MSVGAE can handle extremely sparse data. Before the experiment, we create 24 simulated datasets to simulate various biological scenarios and collect 8 real-world datasets. 
The experimental results of clustering, visualization and marker genes analysis indicate that MSVGAE model has excellent accuracy and robustness in analyzing scRNA-seq data.","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"21 6","pages":"2280-2291"},"PeriodicalIF":3.6,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142182856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0

APMG: 3D Molecule Generation Driven by Atomic Chemical Properties

IF 3.6, Zone 3 (Biology)

IEEE/ACM Transactions on Computational Biology and Bioinformatics Pub Date : 2024-09-10 DOI: 10.1109/TCBB.2024.3457807

Yang Hua;Zhenhua Feng;Xiaoning Song;Hui Li;Tianyang Xu;Xiao-Jun Wu;Dong-Jun Yu

{"title":"APMG: 3D Molecule Generation Driven by Atomic Chemical Properties","authors":"Yang Hua;Zhenhua Feng;Xiaoning Song;Hui Li;Tianyang Xu;Xiao-Jun Wu;Dong-Jun Yu","doi":"10.1109/TCBB.2024.3457807","DOIUrl":"10.1109/TCBB.2024.3457807","url":null,"abstract":"Recently, mask-fill-based 3D Molecular Generation (MG) methods have become very popular in virtual drug design. However, the existing MG methods ignore the chemical properties of atoms and contain inappropriate atomic position training data, which limits their generation capability. To mitigate the above issues, this paper presents a novel mask-fill-based 3D molecule generation model driven by atomic chemical properties (APMG). Specifically, we construct a new attention-MPNN-based encoder and introduce the electronic information into atom representations to enrich chemical properties. Also, a multi-functional classifier is designed to predict the electronic information of each generated atom, guiding the type prediction of elements and bonds. By design, the proposed method uses the chemical properties of atoms and their correlations for high-quality molecule generation. Second, to optimize the atomic position training data, we propose a novel atomic training position generation approach using the Chi-Square distribution. We evaluate our APMG method on the CrossDocked dataset and visualize the docking states of the pockets and generated molecules. 
The obtained results demonstrate the superiority and merits of APMG over the state-of-the-art approaches.","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"21 6","pages":"2269-2279"},"PeriodicalIF":3.6,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142182853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0

Combining Zhegalkin Polynomials and SAT Solving for Context-Specific Boolean Modeling of Biological Systems

IF 3.6, Zone 3 (Biology)

IEEE/ACM Transactions on Computational Biology and Bioinformatics Pub Date : 2024-09-10 DOI: 10.1109/TCBB.2024.3456302

Vincent Deman;Marine Ciantar;Laurent Naudin;Philippe Castera;Anne-Sophie Beignon

{"title":"Combining Zhegalkin Polynomials and SAT Solving for Context-Specific Boolean Modeling of Biological Systems","authors":"Vincent Deman;Marine Ciantar;Laurent Naudin;Philippe Castera;Anne-Sophie Beignon","doi":"10.1109/TCBB.2024.3456302","DOIUrl":"10.1109/TCBB.2024.3456302","url":null,"abstract":"Large amounts of knowledge regarding biological processes are readily available in the literature and aggregated in diverse databases. Boolean networks are powerful tools to render that knowledge into models that can mimic and simulate biological phenomena at multiple scales. Yet, when a model is required to understand or predict the behavior of a biological system in given conditions, existing information often does not completely match this context. Networks built from only prior knowledge can overlook mechanisms, lack specificity, and just partially recapitulate experimental observations. To address this limitation, context-specific data needs to be integrated. However, the brute-force identification of qualitative rules matching these data becomes infeasible as the number of candidates explodes for increasingly complex systems. Here, we used Zhegalkin polynomials to transform this identification into a binary value assignment for exponentially fewer variables, which we addressed with a state-of-the-art SAT solver. We evaluated our implemented method alongside two widely recognized tools, CellNetOptimizer and Caspo-ts, on both artificial toy models and large-scale models based on experimental data from the HPN-DREAM challenge. Our approach demonstrated benchmark-leading capabilities on networks of significant size and intricate complexity. 
It thus appears promising for the in silico modeling of ever more comprehensive biological systems.","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"21 6","pages":"2188-2199"},"PeriodicalIF":3.6,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10671585","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142182852","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0
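The abstract above relies on Zhegalkin polynomials, which write a Boolean function as an XOR of AND-monomials (the algebraic normal form). The sketch below shows the standard way to obtain the coefficients from a truth table with a binary Möbius transform. It only illustrates the representation and is not the authors' SAT encoding; the function names and the majority-function example are assumptions made for illustration.

```python
def zhegalkin_coefficients(truth_table):
    """Return Zhegalkin (algebraic normal form) coefficients of a Boolean function.

    truth_table[i] is f(x), where bit k of index i is the value of x_k.
    In the result, coeffs[m] == 1 means the monomial formed by the
    variables whose bits are set in m appears in the polynomial
    (m == 0 is the constant term 1).
    """
    n = len(truth_table)
    if n == 0 or n & (n - 1):
        raise ValueError("truth table length must be a power of two")
    coeffs = list(truth_table)
    step = 1
    while step < n:
        # Binary Moebius transform: XOR-accumulate over one variable at a time.
        for i in range(n):
            if i & step:
                coeffs[i] ^= coeffs[i ^ step]
        step <<= 1
    return coeffs


def evaluate_zhegalkin(coeffs, x):
    """Evaluate the polynomial at assignment x (bit k of x is the value of x_k)."""
    value = 0
    for monomial, present in enumerate(coeffs):
        # A monomial evaluates to 1 only if all of its variables are set in x.
        if present and (monomial & x) == monomial:
            value ^= 1
    return value


# Majority of three inputs: 1 when at least two of x0, x1, x2 are 1.
majority = [0, 0, 0, 1, 0, 1, 1, 1]
majority_coeffs = zhegalkin_coefficients(majority)
print(majority_coeffs)  # monomials x0*x1, x0*x2, x1*x2
```

Every Boolean function has exactly one such coefficient vector, so choosing a rule amounts to choosing binary coefficient values, which is the kind of binary assignment problem a SAT solver can search.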

An Automated Convergence Diagnostic for Phylogenetic MCMC Analyses

IF 3.6, Zone 3 (Biology)

IEEE/ACM Transactions on Computational Biology and Bioinformatics Pub Date : 2024-09-10 DOI: 10.1109/TCBB.2024.3457875

Lars Berling;Remco Bouckaert;Alex Gavryushkin

{"title":"An Automated Convergence Diagnostic for Phylogenetic MCMC Analyses","authors":"Lars Berling;Remco Bouckaert;Alex Gavryushkin","doi":"10.1109/TCBB.2024.3457875","DOIUrl":"10.1109/TCBB.2024.3457875","url":null,"abstract":"Assessing convergence of Markov chain Monte Carlo (MCMC) based analyses is crucial but challenging, especially so in high dimensional and complex spaces such as the space of phylogenetic trees (treespace). In practice, it is assumed that the target distribution is the unique stationary distribution of the MCMC and convergence is achieved when samples appear to be stationary. Here we leverage recent advances in computational geometry of the treespace and introduce a method that combines classical statistical techniques and algorithms with geometric properties of the treespace to automatically evaluate and assess practical convergence of phylogenetic MCMC analyses. Our method monitors convergence across multiple MCMC chains and achieves high accuracy in detecting both practical convergence and convergence issues within treespace. Furthermore, our approach is developed to allow for real-time evaluation during the MCMC algorithm run, eliminating any of the chain post-processing steps that are currently required. Our tool therefore improves reliability and efficiency of MCMC based phylogenetic inference methods and makes analyses easier to reproduce and compare. We demonstrate the efficacy of our diagnostic via a well-calibrated simulation study and provide examples of its performance on real data sets. 
Although our method performs well in practice, a significant part of the underlying treespace probability theory is still missing, which creates an excellent opportunity for future mathematical research in this area.","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"21 6","pages":"2246-2257"},"PeriodicalIF":3.6,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10675342","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142182854","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0
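The abstract above concerns monitoring convergence across multiple MCMC chains. For readers unfamiliar with the general idea, the sketch below computes the classical Gelman-Rubin potential scale reduction factor for a scalar quantity sampled by several chains. This is an assumption chosen for illustration: the abstract does not say which classical techniques the authors use, and their diagnostic operates on treespace geometry rather than on scalar traces. The function name and the sample values are hypothetical.

```python
from statistics import mean, variance


def potential_scale_reduction(chains):
    """Classical Gelman-Rubin statistic for equal-length scalar chains.

    Values near 1 suggest the chains sample similar distributions;
    values well above 1 suggest they have not mixed.
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])   # W: mean within-chain variance
    if within == 0:
        raise ValueError("within-chain variance is zero")
    between = n * variance(chain_means)            # B: between-chain variance
    pooled = (n - 1) / n * within + between / n    # pooled variance estimate
    return (pooled / within) ** 0.5


# Two chains sampling the same region versus two chains stuck apart.
agreeing = potential_scale_reduction([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]])
disagreeing = potential_scale_reduction([[0.0, 1.0, 2.0], [10.0, 11.0, 12.0]])
print(agreeing, disagreeing)
```

A scalar summary like this cannot be applied directly to tree topologies, which is part of why convergence assessment in treespace is described in the abstract as challenging.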

Bridging Between Deviation Indices for Non-Tree-Based Phylogenetic Networks

IF 3.6, Zone 3 (Biology)

IEEE/ACM Transactions on Computational Biology and Bioinformatics Pub Date : 2024-09-09 DOI: 10.1109/TCBB.2024.3456575

Takatora Suzuki;Han Guo;Momoko Hayamizu

{"title":"Bridging Between Deviation Indices for Non-Tree-Based Phylogenetic Networks","authors":"Takatora Suzuki;Han Guo;Momoko Hayamizu","doi":"10.1109/TCBB.2024.3456575","DOIUrl":"10.1109/TCBB.2024.3456575","url":null,"abstract":"Phylogenetic networks are a useful model that can represent reticulate evolution and complex biological data. In recent years, mathematical and computational aspects of tree-based networks have been well studied. However, not all phylogenetic networks are tree-based, so it is meaningful to consider how close a given network is to being tree-based; Francis–Steel–Semple (2018) proposed several different indices to measure the degree of deviation of a phylogenetic network from being tree-based. One is the minimum number of leaves that need to be added to convert a given network to tree-based, and another is the number of vertices that are not included in the largest subtree covering its leaf-set. Both values are zero if and only if the network is tree-based. Both deviation indices can be computed efficiently, but the relationship between the above two is unknown, as each has been studied using different approaches. In this study, we derive a tight inequality for the values of the two measures and also give a characterisation of phylogenetic networks such that they coincide. 
This characterisation yields a new efficient algorithm for the Maximum Covering Subtree Problem based on the maximal zig-zag trail decomposition.","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"21 6","pages":"2226-2234"},"PeriodicalIF":3.6,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10670207","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142182857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0