Journal of Computational Biology: Latest Articles (Page 8)

HTFSMMA: Higher-Order Topological Guided Small Molecule-MicroRNA Associations Prediction.

IF 1.4, Zone 4 (Biology)

Journal of Computational Biology Pub Date : 2024-09-01 Epub Date: 2024-08-07 DOI: 10.1089/cmb.2024.0587

Xiao-Yan Sun, Zhen-Jie Hou, Wen-Guang Zhang, Yan Chen, Hai-Bin Yao

Abstract: Small molecules (SMs) play a pivotal role in regulating microRNAs (miRNAs). Existing prediction methods for associations between SM-miRNA have overlooked crucial aspects: the incorporation of local topological features between nodes, which represent either SMs or miRNAs, and the effective fusion of node features with topological features. This study introduces a novel approach, termed high-order topological features for SM-miRNA association prediction (HTFSMMA), which specifically addresses these limitations. Initially, an association graph is formed by integrating SM-miRNA association data, SM similarity, and miRNA similarity. Subsequently, we focus on the local information of links and propose target neighborhood graph convolutional network for extracting local topological features. Then, HTFSMMA employs graph attention networks to amalgamate these local features, thereby establishing a platform for the acquisition of high-order features through random walks. Finally, the extracted features are integrated into the multilayer perceptron to derive the association prediction scores. To demonstrate the performance of HTFSMMA, we conducted comprehensive evaluations including five-fold cross-validation, leave-one-out cross-validation (LOOCV), SM-fixed local LOOCV, and miRNA-fixed local LOOCV. The area under receiver operating characteristic curve values were 0.9958 ± 0.0024 (0.8722 ± 0.0021), 0.9986 (0.9504), 0.9974 (0.9111), and 0.9977 (0.9074), respectively. Our findings demonstrate the superior performance of HTFSMMA over existing approaches. In addition, three case studies and the DeLong test have confirmed the effectiveness of the proposed method. These results collectively underscore the significance of HTFSMMA in facilitating the inference of associations between SMs and miRNAs.

Pages: 886-906

Citations: 0

Nearly Instantaneous Time-Varying Reproduction Number for Contagious Diseases - a Direct Approach Based on Nonlinear Regression.

IF 1.4, Zone 4 (Biology)

Journal of Computational Biology Pub Date : 2024-08-01 Epub Date: 2024-06-26 DOI: 10.1089/cmb.2023.0414

Jūratė Šaltytė Benth, Fred Espen Benth, Espen Rostrup Nakstad

Abstract: While the world recovers from the COVID-19 pandemic, another outbreak of contagious disease remains the most likely future risk to public safety. Now is therefore the time to equip health authorities with effective tools to ensure they are operationally prepared for future events. We propose a direct approach to obtain reliable nearly instantaneous time-varying reproduction numbers for contagious diseases, using only the number of infected individuals as input and utilising the dynamics of the susceptible-infected-recovered (SIR) model. Our approach is based on a multivariate nonlinear regression model simultaneously assessing parameters describing the transmission and recovery rate as a function of the SIR model. Shortly after start of a pandemic, our approach enables estimation of daily reproduction numbers. It avoids numerous sources of additional variation and provides a generic tool for monitoring the instantaneous reproduction numbers. We use Norwegian COVID-19 data as case study and demonstrate that our results are well aligned with changes in the number of infected individuals and the change points following policy interventions. Our estimated reproduction numbers are notably less volatile, provide more credible short-time predictions for the number of infected individuals, and are thus clearly favorable compared with the results obtained by two other popular approaches used for monitoring a pandemic. The proposed approach contributes to increased preparedness to future pandemics of contagious diseases, as it can be used as a simple yet powerful tool to monitor the pandemics, provide short-term predictions, and thus support decision making regarding timely and targeted control measures.

Pages: 727-741
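For readers unfamiliar with the SIR dynamics the abstract refers to, the following is a minimal sketch of the textbook model and of the standard relationship R_t = (beta / gamma) * S / N between the reproduction number and the transmission and recovery rates. It is not the authors' nonlinear regression estimator, which the abstract does not spell out. The function name, the forward-Euler integration, and all parameter values are assumptions chosen for illustration only.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Integrate the textbook SIR model with forward Euler steps.

    beta  : transmission rate per day (hypothetical value supplied by caller)
    gamma : recovery rate per day (hypothetical value supplied by caller)
    Returns one tuple (S, I, R, R_t) per day, where
    R_t = (beta / gamma) * S / N is the effective reproduction number.
    """
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    steps_per_day = round(1 / dt)
    trajectory = []
    for _ in range(days):
        # Record the state at the start of each day.
        trajectory.append((s, i, r, beta / gamma * s / n))
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
    return trajectory


# Illustrative run with made-up parameters (not fitted to any data).
traj = simulate_sir(beta=0.3, gamma=0.1, s0=9990, i0=10, r0=0, days=120)
for day in (0, 30, 60, 90):
    s, i, r, rt = traj[day]
    print(f"day {day:3d}: infected={i:9.1f}  R_t={rt:.3f}")
```

In this simplified setting R_t falls as the susceptible pool shrinks; an estimator such as the one described above would instead infer time-varying rates from observed case counts.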

Citations: 0

Combining Complementarity and Binding Energetics in the Assessment of Protein Interactions: EnCPdock - A Practical Manual.

IF 1.4, Zone 4 (Biology)

Journal of Computational Biology Pub Date : 2024-08-01 Epub Date: 2024-06-17 DOI: 10.1089/cmb.2024.0554

Gargi Biswas, Debasish Mukherjee, Sankar Basu

Citations: 0

NPI-DCGNN: An Accurate Tool for Identifying ncRNA-Protein Interactions Using a Dual-Channel Graph Neural Network.

IF 1.4, Zone 4 (Biology)

Journal of Computational Biology Pub Date : 2024-08-01 Epub Date: 2024-06-26 DOI: 10.1089/cmb.2023.0449

Xin Zhang, Liangwei Zhao, Ziyi Chai, Hao Wu, Wei Yang, Chen Li, Yu Jiang, Quanzhong Liu

Abstract: Noncoding RNA (NcRNA)-protein interactions (NPIs) play fundamentally important roles in carrying out cellular activities. Although various predictors based on molecular features and graphs have been published to boost the identification of NPIs, most of them often ignore the information between known NPIs or exhibit insufficient learning ability from graphs, posing a significant challenge in effectively identifying NPIs. To develop a more reliable and accurate predictor for NPIs, in this article, we propose NPI-DCGNN, an end-to-end NPI predictor based on a dual-channel graph neural network (DCGNN). NPI-DCGNN initially treats the known NPIs as an ncRNA-protein bipartite graph. Subsequently, for each ncRNA-protein pair, NPI-DCGNN extracts two local subgraphs centered around the ncRNA and protein, respectively, from the bipartite graph. After that, it utilizes a dual-channel graph representation learning layer based on GNN to generate high-level feature representations for the ncRNA-protein pair. Finally, it employs a fully connected network and output layer to predict whether an interaction exists between the pair of ncRNA and protein. Experimental results on four experimentally validated datasets demonstrate that NPI-DCGNN outperforms several state-of-the-art NPI predictors. Our case studies on the NPInter database further demonstrate the prediction power of NPI-DCGNN in predicting NPIs. With the availability of the source codes (https://github.com/zhangxin11111/NPI-DCGNN), we anticipate that NPI-DCGNN could facilitate the studies of ncRNA interactome by providing highly reliable NPI candidates for further experimental validation.

Pages: 742-756

Citations: 0

QMix: An Efficient Program to Automatically Estimate Multi-Matrix Mixture Models for Amino Acid Substitution Process.

IF 1.4, Zone 4 (Biology)

Journal of Computational Biology Pub Date : 2024-08-01 Epub Date: 2024-06-11 DOI: 10.1089/cmb.2023.0403

Nguyen Huy Tinh, Cuong Cao Dang, Le Sy Vinh

Abstract: The single-matrix amino acid (AA) substitution models are widely used in phylogenetic analyses; however, they are unable to properly model the heterogeneity of AA substitution rates among sites. The multi-matrix mixture models can handle the site rate heterogeneity and outperform the single-matrix models. Estimating multi-matrix mixture models is a complex process and no computer program is available for this task. In this study, we implemented a computer program of the so-called QMix based on the algorithm of LG4X and LG4M with several enhancements to automatically estimate multi-matrix mixture models from large datasets. QMix employs QMaker algorithm instead of XRATE algorithm to accurately and rapidly estimate the parameters of models. It is able to estimate mixture models with different number of matrices and supports multi-threading computing to efficiently estimate models from thousands of genes. We re-estimate mixture models LG4X and LG4M from 1471 HSSP alignments. The re-estimated models (HP4X and HP4M) are slightly better than LG4X and LG4M in building maximum likelihood trees from HSSP and TreeBASE datasets. QMix program required about 10 hours on a computer with 18 cores to estimate a mixture model with four matrices from 200 HSSP alignments. It is easy to use and freely available for researchers.

Pages: 703-707

Citations: 0

Estimating Haplotype Structure and Frequencies: A Bayesian Approach to Unknown Design in Pooled Genomic Data.

IF 1.4, Zone 4 (Biology)

Journal of Computational Biology Pub Date : 2024-08-01 Epub Date: 2024-07-03 DOI: 10.1089/cmb.2023.0211

Yuexuan Wang, Ritabrata Dutta, Andreas Futschik

Abstract: The estimation of haplotype structure and frequencies provides crucial information about the composition of genomes. Techniques, such as single-individual haplotyping, aim to reconstruct individual haplotypes from diploid genome sequencing data. However, our focus is distinct. We address the challenge of reconstructing haplotype structure and frequencies from pooled sequencing samples where multiple individuals are sequenced simultaneously. A frequentist method to address this issue has recently been proposed. In contrast to this and other methods that compute point estimates, our proposed Bayesian hierarchical model delivers a posterior that permits us to also quantify uncertainty. Since matching permutations in both haplotype structure and corresponding frequency matrix lead to the same reconstruction of their product, we introduce an order-preserving shrinkage prior that ensures identifiability with respect to permutations. For inference, we introduce a blocked Gibbs sampler that enforces the required constraints. In a simulation study, we assessed the performance of our method. Furthermore, by using our approach on two distinct sets of real data, we demonstrate that our Bayesian approach can reconstruct the dominant haplotypes in a challenging, high-dimensional set-up.

Pages: 708-726
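The identifiability issue mentioned in the abstract (matching permutations of the haplotype structure and the frequency matrix give the same product) can be checked on a toy example. The sketch below is an illustration only: the matrix names H and F, their orientation (sites by haplotypes, haplotypes by pools), and every number in them are assumptions, not values or notation taken from the paper.

```python
from fractions import Fraction


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def permute_columns(m, perm):
    """Reorder the columns of m so that new column j is old column perm[j]."""
    return [[row[p] for p in perm] for row in m]


def permute_rows(m, perm):
    """Reorder the rows of m so that new row j is old row perm[j]."""
    return [m[p] for p in perm]


# Hypothetical haplotype structure: 4 sites (rows) x 3 haplotypes (columns).
H = [
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 0],
    [0, 0, 1],
]
# Hypothetical haplotype frequencies: 3 haplotypes (rows) x 2 pools (columns).
# Exact fractions avoid floating-point noise in the comparison.
F = [
    [Fraction(1, 2), Fraction(1, 5)],
    [Fraction(3, 10), Fraction(3, 10)],
    [Fraction(1, 5), Fraction(1, 2)],
]

perm = [2, 0, 1]  # relabel the haplotypes
original = matmul(H, F)
relabelled = matmul(permute_columns(H, perm), permute_rows(F, perm))
print(original == relabelled)  # the observed allele frequencies are unchanged
```

Because the data only constrain the product, some ordering constraint on the haplotypes is needed before their labels mean anything, which is the role the abstract assigns to the order-preserving prior.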

Citations: 0

Sketching Methods with Small Window Guarantee Using Minimum Decycling Sets.

IF 1.4, Zone 4 (Biology)

Journal of Computational Biology Pub Date : 2024-07-01 Epub Date: 2024-07-09 DOI: 10.1089/cmb.2024.0544

Guillaume Marçais, Dan DeBlasio, Carl Kingsford

Abstract: Most sequence sketching methods work by selecting specific k-mers from sequences so that the similarity between two sequences can be estimated using only the sketches. Because estimating sequence similarity is much faster using sketches than using sequence alignment, sketching methods are used to reduce the computational requirements of computational biology software. Applications using sketches often rely on properties of the k-mer selection procedure to ensure that using a sketch does not degrade the quality of the results compared with using sequence alignment. Two important examples of such properties are locality and window guarantees, the latter of which ensures that no long region of the sequence goes unrepresented in the sketch. A sketching method with a window guarantee, implicitly or explicitly, corresponds to a decycling set of the de Bruijn graph, which is a set of unavoidable k-mers. Any long enough sequence, by definition, must contain a k-mer from any decycling set (hence, the unavoidable property). Conversely, a decycling set also defines a sketching method by choosing the k-mers from the set as representatives. Although current methods use one of a small number of sketching method families, the space of decycling sets is much larger and largely unexplored. Finding decycling sets with desirable characteristics (e.g., small remaining path length) is a promising approach to discovering new sketching methods with improved performance (e.g., with small window guarantee). The Minimum Decycling Sets (MDSs) are of particular interest because of their minimum size. Only two algorithms, by Mykkeltveit and Champarnaud, are previously known to generate two particular MDSs, although there are typically a vast number of alternative MDSs. We provide a simple method to enumerate MDSs. This method allows one to explore the space of MDSs and to find MDSs optimized for desirable properties. We give evidence that the Mykkeltveit sets are close to optimal regarding one particular property, the remaining path length. A number of conjectures and computational and theoretical evidence to support them are presented. Code available at https://github.com/Kingsford-Group/mdsscope.

Pages: 597-615

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11304339/pdf/
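The notion of a decycling set can be made concrete with a small check: a set of k-mers is decycling if deleting those nodes from the de Bruijn graph leaves no directed cycle. The sketch below is a brute-force illustration for small k and is unrelated to the authors' mdsscope code or to the Mykkeltveit and Champarnaud constructions. The function name and the binary-alphabet example are assumptions introduced here.

```python
from itertools import product


def is_decycling_set(kmers, k, alphabet="ACGT"):
    """Return True if removing `kmers` leaves the order-k de Bruijn graph acyclic.

    Nodes are all k-mers over `alphabet`; there is an edge u -> v when the
    last k-1 letters of u equal the first k-1 letters of v.
    Brute force: only practical for small k.
    """
    removed = set(kmers)
    remaining = [
        "".join(p) for p in product(alphabet, repeat=k) if "".join(p) not in removed
    ]
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current DFS path, finished
    color = {node: WHITE for node in remaining}

    for start in remaining:
        if color[start] != WHITE:
            continue
        color[start] = GRAY
        stack = [(start, iter(alphabet))]
        while stack:
            node, next_letters = stack[-1]
            for letter in next_letters:
                successor = node[1:] + letter
                if successor not in color:
                    continue  # successor was removed from the graph
                if color[successor] == GRAY:
                    return False  # back edge: a cycle survives
                if color[successor] == WHITE:
                    color[successor] = GRAY
                    stack.append((successor, iter(alphabet)))
                    break
            else:
                # All successors explored without finding a cycle.
                color[node] = BLACK
                stack.pop()
    return True


# Binary alphabet, k = 2: the cycles are 00 -> 00, 11 -> 11 and 01 -> 10 -> 01.
print(is_decycling_set({"00", "01", "11"}, 2, alphabet="01"))  # hits every cycle
print(is_decycling_set({"00", "11"}, 2, alphabet="01"))        # misses 01 <-> 10
```

In the binary k = 2 case each of the three cycles needs its own representative, so three k-mers is the smallest possible decycling set; this is the kind of minimum-size set the abstract calls an MDS.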

Citations: 0

Using Attention-UNet Models to Predict Protein Contact Maps.

IF 1.4, Zone 4 (Biology)

Journal of Computational Biology Pub Date : 2024-07-01 Epub Date: 2024-07-09 DOI: 10.1089/cmb.2023.0102

V A Jisna, Abhaysing Pawar Ajay, P B Jayaraj

Citations: 0

GraphSlimmer: Preserving Read Mappability with the Minimum Number of Variants.

IF 1.4, Zone 4 (Biology)

Journal of Computational Biology Pub Date : 2024-07-01 Epub Date: 2024-07-11 DOI: 10.1089/cmb.2024.0601

Neda Tavakoli, Daniel Gibney, Srinivas Aluru

Abstract: Modern genomic datasets, like those generated under the 1000 Genome Project, contain millions of variants belonging to known haplotypes. Although these datasets are more representative than a single reference sequence and can alleviate issues like reference bias, they are significantly more computationally burdensome to work with, often involving large-indexed genome graph data structures for tasks such as read mapping. The construction, preprocessing, and mapping algorithms can require substantial computational resources depending on the size of these variant sets. Moreover, the accuracy of mapping algorithms has been shown to decrease when working with complete variant sets. Therefore, a drastically reduced set of variants that preserves important properties of the original set is desirable. This work provides a technique for finding a minimal subset of variants S such that for given parameters α and δ, all substrings up to length α in the haplotypes are guaranteed to be still alignable to the appropriate locations with either Hamming or edit distance at most δ, using only S. Our contributions include showing the NP-hardness and inapproximability of these optimization problems and providing Integer Linear Programming (ILP) formulations. Our edit distance ILP formulation carefully decomposes the problem according to variant locations, which allows it to scale to support all of chromosome 22's variants from the 1000 Genome Project. Our experiments also demonstrate a significant reduction in the number of variants. For example, for moderately long reads, e.g., α = 1000, over 75% of the variants can be removed while preserving read mappability with edit distance at most one.

Pages: 616-637

Citations: 0

Pairwise Distances and the Problem of Multiple Optima.

IF 1.4, Zone 4 (Biology)

Journal of Computational Biology Pub Date : 2024-07-01 Epub Date: 2024-07-10 DOI: 10.1089/cmb.2023.0382

Ran Libeskind-Hadas

Citations: 0