Journal of Computational Biology: Latest Articles

MMG4: Recognition of G4-Forming Sequences Based on Markov Model.
IF 1.4, Zone 4, Biology
Journal of Computational Biology Pub Date : 2024-12-01 Epub Date: 2024-10-17 DOI: 10.1089/cmb.2024.0523
Boyuan Yu, Hao Zhang, Cong Pian, Yuanyuan Chen
{"title":"MMG4: Recognition of G4-Forming Sequences Based on Markov Model.","authors":"Boyuan Yu, Hao Zhang, Cong Pian, Yuanyuan Chen","doi":"10.1089/cmb.2024.0523","DOIUrl":"10.1089/cmb.2024.0523","url":null,"abstract":"<p><p>G-quadruplexes (G4s) are special nucleic acid structures with various important biological functions. Existing tools and technologies for G4-forming sequences recognition are limited to time-consuming and costly methods such as circular dichroism and nuclear magnetic resonance. Developing a fast and accurate model for G4-forming sequences recognition has far-reaching significance. In this study, MMG4, a novel model to recognize G4-forming sequences based on Markov model (MM), was developed and the phenomenon of high recognition accuracy in the central region of the sequence and low accuracy in the two end regions was discovered. It was further found that the differences in base transfer probabilities, ratio distribution, and G4-motif structural content in different regions may be the causes of this phenomenon. The study also explored the impact of sequence length on recognition accuracy and found the optimal recognition interval to be [910-1049], with the highest recognition accuracy reaching 85.95%. By extracting sequence features, the study constructed three types of machine learning models: random forest (RF), support vector machine, and back-propagation neural network. It was found that recognition performance of MM was significantly better than that of the other three machine learning models, proving that the recognition method based on MM can effectively capture the correlation information between adjacent nucleotides of G4. By combining MM with the three machine learning models, the predictive performance of MMG4 improved. Among them, the RF model combined with MM has the best performance, achieving an area under the receiver operating characteristic curve value of 0.93 and an area under the precision-recall curve value of 0.9. 
Finally, the study validated the model robustness and generalization ability through independent testing dataset.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"1211-1223"},"PeriodicalIF":1.4,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142466641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
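As a rough illustration of the Markov-model idea described in the MMG4 abstract above, the following Python sketch trains first-order nucleotide transition probabilities on two toy sequence sets and scores a query by its log-odds. This is not the authors' implementation: the function names, the first-order assumption, the add-one pseudocount, and the toy sequences are all assumptions made for illustration.

```python
import math

ALPHABET = "ACGT"


def train_markov(seqs, pseudocount=1.0):
    """Estimate first-order transition probabilities P(b | a).

    A pseudocount is added to every transition (an assumption made here
    so that no probability is ever zero).
    """
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for s in seqs:
        for a, b in zip(s, s[1:]):
            if a in counts and b in counts[a]:
                counts[a][b] += 1
    return {
        a: {b: counts[a][b] / sum(counts[a].values()) for b in ALPHABET}
        for a in ALPHABET
    }


def log_odds(seq, pos_model, neg_model):
    """Sum of log-ratios of transition probabilities along the sequence.

    A positive score means the positive-class model explains the
    adjacent-nucleotide transitions better than the negative-class model.
    """
    score = 0.0
    for a, b in zip(seq, seq[1:]):
        if a in pos_model and b in pos_model[a]:
            score += math.log(pos_model[a][b] / neg_model[a][b])
    return score


# Toy training data (hypothetical, not taken from the paper).
POSITIVES = ["GGGAGGGTGGGAGGG", "GGGTTGGGTTGGGTTGGG"]
NEGATIVES = ["ATATCATACTATCA", "CATTACATTCAATCA"]

pos_model = train_markov(POSITIVES)
neg_model = train_markov(NEGATIVES)

if __name__ == "__main__":
    for query in ("GGGTGGGAGGGTGGG", "ATCATTACATAT"):
        print(query, round(log_odds(query, pos_model, neg_model), 2))
```

A query is called G4-like here when its score is above zero; a real classifier would choose the threshold and the model order on held-out data.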
The Statistics of Parametrized Syncmers in a Simple Mutation Process Without Spurious Matches.
IF 1.4, Zone 4, Biology
Journal of Computational Biology Pub Date : 2024-12-01 Epub Date: 2024-11-12 DOI: 10.1089/cmb.2024.0508
John L Spouge, Pijush Das, Ye Chen, Martin Frith
{"title":"The Statistics of Parametrized Syncmers in a Simple Mutation Process Without Spurious Matches.","authors":"John L Spouge, Pijush Das, Ye Chen, Martin Frith","doi":"10.1089/cmb.2024.0508","DOIUrl":"10.1089/cmb.2024.0508","url":null,"abstract":"<p><p><b><i>Introduction:</i></b> Often, bioinformatics uses summary sketches to analyze next-generation sequencing data, but most sketches are not well understood statistically. Under a simple mutation model, Blanca et al. analyzed complete sketches, that is, the complete set of unassembled <i>k</i>-mers, from two closely related sequences. The analysis extracted a point mutation parameter θ quantifying the evolutionary distance between the two sequences. <b><i>Methods:</i></b> We extend the results of Blanca et al. for complete sketches to parametrized syncmer sketches with downsampling. A syncmer sketch can sample <i>k</i>-mers much more sparsely than a complete sketch. Consider the following simple mutation model disallowing insertions or deletions. Consider a reference sequence <i>A</i> (e.g., a subsequence from a reference genome), and mutate each nucleotide in it independently with probability θ to produce a mutated sequence <i>B</i> (corresponding to, e.g., a set of reads or draft assembly of a related genome). Then, syncmer counts alone yield an approximate Gaussian distribution for estimating θ. The assumption disallowing insertions and deletions motivates a check on the lengths of <i>A</i> and <i>B</i>. The syncmer count from <i>B</i> yields an approximate Gaussian distribution for its length, and a <i>p</i>-value can test the length of <i>B</i> against the length of <i>A</i> using syncmer counts alone. <b><i>Results:</i></b> The Gaussian distributions permit syncmer counts alone to estimate θ and mutated sequence length with a known sampling error. Under some circumstances, the results provide the sampling error for the Mash containment index when applied to syncmer counts. 
<b><i>Conclusions:</i></b> The approximate Gaussian distributions provide hypothesis tests and confidence intervals for phylogenetic distance and sequence length. Our methods are likely to generalize to sketches other than syncmers and may be useful in assembling reads and related applications.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"1195-1210"},"PeriodicalIF":1.4,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11698668/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142621433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
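Under the mutation model in the abstract above (each nucleotide mutates independently with probability θ, with no insertions or deletions), a given k-mer stays unmutated with probability (1 - θ)^k. The Python sketch below inverts that relation to get a point estimate of θ from counts over a complete k-mer sketch. It is only a minimal illustration: it does not cover the paper's syncmer downsampling or its Gaussian error analysis, and the function name and arguments are hypothetical.

```python
def estimate_theta(n_unmutated, n_total, k):
    """Point estimate of the per-nucleotide mutation probability.

    Assumes each of the n_total k-mers survives unmutated with
    probability (1 - theta) ** k, so the observed surviving fraction
    is matched to that expectation and solved for theta.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if not 0 < n_unmutated <= n_total:
        raise ValueError("need 0 < n_unmutated <= n_total")
    fraction = n_unmutated / n_total
    return 1.0 - fraction ** (1.0 / k)


if __name__ == "__main__":
    # Hypothetical counts: 10,000 k-mers with k = 21, of which 8,097 are unmutated.
    print(estimate_theta(8097, 10000, 21))
```

The estimate is undefined when no k-mer survives, which is why the function rejects a zero count instead of returning 1.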
Stochastic Analysis for the Dual Virus Parallel Transmission Model with Immunity Delay.
IF 1.4, Zone 4, Biology
Journal of Computational Biology Pub Date : 2024-12-01 Epub Date: 2024-10-18 DOI: 10.1089/cmb.2024.0662
Jing Yang, Shaojuan Ma, Juan Ma, Jinhua Ran, Xinyu Bai
{"title":"Stochastic Analysis for the Dual Virus Parallel Transmission Model with Immunity Delay.","authors":"Jing Yang, Shaojuan Ma, Juan Ma, Jinhua Ran, Xinyu Bai","doi":"10.1089/cmb.2024.0662","DOIUrl":"10.1089/cmb.2024.0662","url":null,"abstract":"<p><p>In this article, the qualitative properties of a stochastic dual virus parallel transmission model with immunity delay are analyzed. First, we use Lyapunov theory to study the existence and uniqueness of the global positive solution of the proposed model. Second, the threshold values of the persistence and extinction of two viruses were obtained. Finally, the numerical simulation verifies the theoretical results. The results show that the immunity delay and the intensity of noise have important effects on the two diseases spreading in parallel.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"1291-1304"},"PeriodicalIF":1.4,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142466642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
An R Package for Nonparametric Inference on Dynamic Populations with Infinitely Many Types.
IF 1.4, Zone 4, Biology
Journal of Computational Biology Pub Date : 2024-12-01 Epub Date: 2024-10-22 DOI: 10.1089/cmb.2024.0600
Filippo Ascolani, Stefano Damato, Matteo Ruggiero
{"title":"An R Package for Nonparametric Inference on Dynamic Populations with Infinitely Many Types.","authors":"Filippo Ascolani, Stefano Damato, Matteo Ruggiero","doi":"10.1089/cmb.2024.0600","DOIUrl":"10.1089/cmb.2024.0600","url":null,"abstract":"<p><p>Fleming-Viot diffusions are widely used stochastic models for population dynamics that extend the celebrated Wright-Fisher diffusions. They describe the temporal evolution of the relative frequencies of the allelic types in an ideally infinite panmictic population, whose individuals undergo random genetic drift and at birth can mutate to a new allelic type drawn from a possibly infinite potential pool, independently of their parent. Recently, Bayesian nonparametric inference has been considered for this model when a finite sample of individuals is drawn from the population at several discrete time points. Previous works have fully described the relevant estimators for this problem, but current software is available only for the Wright-Fisher finite-dimensional case. Here, we provide software for the general case, overcoming some nontrivial computational challenges posed by this setting. The R package FVDDPpkg efficiently approximates the filtering and smoothing distribution for Fleming-Viot diffusions, given finite samples of individuals collected at different times. 
A suitable Monte Carlo approximation is also introduced in order to reduce the computational cost.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"1305-1311"},"PeriodicalIF":1.4,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142466624","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Positivity and Boundedness Preserving Numerical Scheme for a Stochastic Multigroup Susceptible-Infected-Recovering Epidemic Model with Age Structure.
IF 1.4, Zone 4, Biology
Journal of Computational Biology Pub Date : 2024-12-01 Epub Date: 2024-09-27 DOI: 10.1089/cmb.2023.0443
Han Ma, Yanyan Du, Zong Wang, Qimin Zhang
{"title":"Positivity and Boundedness Preserving Numerical Scheme for a Stochastic Multigroup Susceptible-Infected-Recovering Epidemic Model with Age Structure.","authors":"Han Ma, Yanyan Du, Zong Wang, Qimin Zhang","doi":"10.1089/cmb.2023.0443","DOIUrl":"10.1089/cmb.2023.0443","url":null,"abstract":"<p><p>Since the stochastic age-structured multigroup susceptible-infected-recovering (SIR) epidemic model is nonlinear, the solution of this model is hard to be explicitly represented. It is necessary to construct effective numerical methods so as to predict the number of infections. In addition, the stochastic age-structured multigroup SIR model has features of positivity and boundedness of the solution. Therefore, in this article, in order to ensure that the numerical and analytical solutions must have the same properties, by modifying the classical Euler-Maruyama (EM) scheme, we generate a positivity and boundedness preserving EM (PBPEM) method on temporal space for stochastic age-structured multigroup SIR model, which is proved to have a strong convergence to the true solution over finite time intervals. Moreover, by combining the standard finite element method and the PBPEM method, we propose a full-discrete scheme to show the numerical solutions, as well as analyze the error estimations. 
Finally, the full-discrete scheme is applied to a general stochastic two-group SIR model and the Chlamydia epidemic model, which shows the superiority of the numerical method.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"1259-1290"},"PeriodicalIF":1.4,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142347649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Adaptive Arithmetic Coding-Based Encoding Method Toward High-Density DNA Storage.
IF 1.4, Zone 4, Biology
Journal of Computational Biology Pub Date : 2024-11-15 DOI: 10.1089/cmb.2024.0697
Yingxin Hu, Yanjun Liu, Yuefei Yang
{"title":"Adaptive Arithmetic Coding-Based Encoding Method Toward High-Density DNA Storage.","authors":"Yingxin Hu, Yanjun Liu, Yuefei Yang","doi":"10.1089/cmb.2024.0697","DOIUrl":"10.1089/cmb.2024.0697","url":null,"abstract":"<p><p>With the rapid advancement of big data and artificial intelligence technologies, the limitations inherent in traditional storage media for accommodating vast amounts of data have become increasingly evident. DNA storage is an innovative approach harnessing DNA and other biomolecules as storage mediums, endowed with superior characteristics including expansive capacity, remarkable density, minimal energy requirements, and unparalleled longevity. Central to the efficient DNA storage is the process of DNA coding, whereby digital information is converted into sequences of DNA bases. A novel encoding method based on adaptive arithmetic coding (AAC) has been introduced, delineating the encoding process into three distinct phases: compression, error correction, and mapping. Prediction by Partial Matching (PPM)-based AAC in the compression phase serves to compress data and enhance storage density. Subsequently, the error correction phase relies on octal Hamming code to rectify errors and safeguard data integrity. The mapping phase employs a \"3-2 code\" mapping relationship to ensure adherence to biochemical constraints. The proposed method was verified by encoding different formats of files such as text, pictures, and audio. The results indicated that the average coding density of bases can be up to 3.25 per nucleotide, the GC content (which includes guanine [G] and cytosine [C]) can be stabilized at 50% and the homopolymer length is restricted to no more than 2. 
Simulation experimental results corroborate the method's efficacy in preserving data integrity during both reading and writing operations, augmenting storage density, and exhibiting robust error correction capabilities.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142621431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
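The biochemical constraints named in the DNA storage abstract above (GC content near 50% and homopolymer runs no longer than 2) can be checked on any encoded strand with a few lines of Python. This sketch only validates a sequence; it is not the paper's "3-2 code" mapping, and the function names and the tolerance parameter are assumptions introduced for illustration.

```python
def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def max_homopolymer(seq):
    """Length of the longest run of identical consecutive bases."""
    longest = 0
    run = 0
    previous = None
    for base in seq:
        run = run + 1 if base == previous else 1
        longest = max(longest, run)
        previous = base
    return longest


def satisfies_constraints(seq, gc_target=0.5, gc_tolerance=0.05, max_run=2):
    """True if GC content is within tolerance of the target and no run exceeds max_run.

    The tolerance is a hypothetical parameter; the abstract only states
    that GC content can be stabilized at 50%.
    """
    gc_ok = abs(gc_content(seq) - gc_target) <= gc_tolerance
    return gc_ok and max_homopolymer(seq) <= max_run


if __name__ == "__main__":
    for strand in ("ACGTACGT", "AACGGTCC", "ACGGGTAC"):
        print(strand, gc_content(strand), max_homopolymer(strand),
              satisfies_constraints(strand))
```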
From Policy to Prediction: Assessing Forecasting Accuracy in an Integrated Framework with Machine Learning and Disease Models.
IF 1.4, Zone 4, Biology
Journal of Computational Biology Pub Date : 2024-11-01 Epub Date: 2024-08-02 DOI: 10.1089/cmb.2023.0377
Amit K Chakraborty, Hao Wang, Pouria Ramazi
{"title":"From Policy to Prediction: Assessing Forecasting Accuracy in an Integrated Framework with Machine Learning and Disease Models.","authors":"Amit K Chakraborty, Hao Wang, Pouria Ramazi","doi":"10.1089/cmb.2023.0377","DOIUrl":"10.1089/cmb.2023.0377","url":null,"abstract":"<p><p>To improve the forecasting accuracy of the spread of infectious diseases, a hybrid model was recently introduced where the commonly assumed constant disease transmission rate was actively estimated from enforced mitigating policy data by a machine learning (ML) model and then fed to an extended susceptible-infected-recovered model to forecast the number of infected cases. Testing only one ML model, that is, gradient boosting model (GBM), the work left open whether other ML models would perform better. Here, we compared GBMs, linear regressions, k-nearest neighbors, and Bayesian networks (BNs) in forecasting the number of COVID-19-infected cases in the United States and Canadian provinces based on policy indices of future 35 days. There was no significant difference in the mean absolute percentage errors of these ML models over the combined dataset [<math><mrow><mi>H</mi><mo>(</mo><mn>3</mn><mo>)</mo><mo>=</mo><mn>3.10</mn><mo>,</mo><mi>p</mi><mo>=</mo><mn>0.38</mn></mrow></math>]. In two provinces, a significant difference was observed [<math><mrow><mi>H</mi><mo>(</mo><mn>3</mn><mo>)</mo><mo>=</mo><mn>8.77</mn><mo>,</mo><mi>H</mi><mo>(</mo><mn>3</mn><mo>)</mo><mo>=</mo><mn>8.07</mn><mo>,</mo><mi>p</mi><mo><</mo><mn>0.05</mn></mrow></math>], yet posthoc tests revealed no significant difference in pairwise comparisons. Nevertheless, BNs significantly outperformed the other models in most of the training datasets. 
The results put forward that the ML models have equal forecasting power overall, and BNs are best for data-fitting applications.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"1104-1117"},"PeriodicalIF":1.4,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141874974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
From Noise to Knowledge: Diffusion Probabilistic Model-Based Neural Inference of Gene Regulatory Networks.
IF 1.4, Zone 4, Biology
Journal of Computational Biology Pub Date : 2024-11-01 Epub Date: 2024-10-10 DOI: 10.1089/cmb.2024.0607
Hao Zhu, Donna Slonim
{"title":"From Noise to Knowledge: Diffusion Probabilistic Model-Based Neural Inference of Gene Regulatory Networks.","authors":"Hao Zhu, Donna Slonim","doi":"10.1089/cmb.2024.0607","DOIUrl":"10.1089/cmb.2024.0607","url":null,"abstract":"<p><p>Understanding gene regulatory networks (GRNs) is crucial for elucidating cellular mechanisms and advancing therapeutic interventions. Original methods for GRN inference from bulk expression data often struggled with the high dimensionality and inherent noise in the data. Here we introduce RegDiffusion, a new class of Denoising Diffusion Probabilistic Models focusing on the regulatory effects among feature variables. RegDiffusion introduces Gaussian noise to the input data following a diffusion schedule and uses a neural network with a parameterized adjacency matrix to predict the added noise. We show that using this process, GRNs can be learned effectively with a surprisingly simple model architecture. In our benchmark experiments, RegDiffusion shows superior performance compared to several baseline methods in multiple datasets. We also demonstrate that RegDiffusion can infer biologically meaningful regulatory networks from real-world single-cell data sets with over 15,000 genes in under 5 minutes. This work not only introduces a fresh perspective on GRN inference but also highlights the promising capacity of diffusion-based models in the area of single-cell analysis. 
The RegDiffusion software package and experiment data are available at https://github.com/TuftsBCB/RegDiffusion.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"1087-1103"},"PeriodicalIF":1.4,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11698671/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142466640","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Network-Constrained Eigen-Single-Cell Profile Estimation for Uncovering Crucial Immunogene Regulatory Systems in Human Bone Marrow.
IF 1.4, Zone 4, Biology
Journal of Computational Biology Pub Date : 2024-11-01 Epub Date: 2024-09-06 DOI: 10.1089/cmb.2024.0539
Heewon Park, Satoru Miyano
{"title":"Network-Constrained Eigen-Single-Cell Profile Estimation for Uncovering Crucial Immunogene Regulatory Systems in Human Bone Marrow.","authors":"Heewon Park, Satoru Miyano","doi":"10.1089/cmb.2024.0539","DOIUrl":"10.1089/cmb.2024.0539","url":null,"abstract":"<p><p>We focus on characterizing cell lines from young and aged-healthy and -AML (acute myeloid leukemia) cell lines, and our goal is to identify the key markers associated with the progression of AML. To characterize the age-related phenotypes in AML cell lines, we consider eigenCell analysis that effectively encapsulates the primary expression level patterns across the cell lines. However, earlier investigations utilizing eigenGenes and eigenCells analysis were based on linear combination of all features, leading to the disturbance from noise features. Moreover, the analysis based on a fully dense loading matrix makes it challenging to interpret the results of eigenCells analysis. In order to address these challenges, we develop a novel computational approach termed network-constrained eigenCells profile estimation, which employs a sparse learning strategy. The proposed method estimates eigenCell based on not only the lasso but also network constrained penalization. The use of the network-constrained penalization enables us to simultaneously select neighborhood genes. Furthermore, the hub genes and their regulator/target genes are easily selected as crucial markers for eigenCells estimation. That is, our method can incorporate insights from network biology into the process of sparse loading estimation. Through our methodology, we estimate sparse eigenCells profiles, where only critical markers exhibit expression levels. This allows us to identify the key markers associated with a specific phenotype. Monte Carlo simulations demonstrate the efficacy of our method in reconstructing the sparse structure of eigenCells profiles. 
We employed our approach to unveil the regulatory system of immunogenes in both young/aged-healthy and -AML cell lines. The markers we have identified for the age-related phenotype in both healthy and AML cell lines have garnered strong support from previous studies. Specifically, our findings, in conjunction with the existing literature, indicate that the activities within this subnetwork of CD79A could be pivotal in elucidating the mechanism driving AML progression, particularly noting the significant role played by the diminished activities in the CD79A subnetwork. We expect that the proposed method will be a useful tool for characterizing disease-related subsets of cell lines, encompassing phenotypes and clones.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"1158-1178"},"PeriodicalIF":1.4,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142140187","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Hybrid GNN Approach for Improved Molecular Property Prediction.
IF 1.4, Zone 4, Biology
Journal of Computational Biology Pub Date : 2024-11-01 Epub Date: 2024-07-31 DOI: 10.1089/cmb.2023.0452
Pedro Quesado, Luis H M Torres, Bernardete Ribeiro, Joel P Arrais
{"title":"A Hybrid GNN Approach for Improved Molecular Property Prediction.","authors":"Pedro Quesado, Luis H M Torres, Bernardete Ribeiro, Joel P Arrais","doi":"10.1089/cmb.2023.0452","DOIUrl":"10.1089/cmb.2023.0452","url":null,"abstract":"<p><p>The development of new drugs is a vital effort that has the potential to improve human health, well-being and life expectancy. Molecular property prediction is a crucial step in drug discovery, as it helps to identify potential therapeutic compounds. However, experimental methods for drug development can often be time-consuming and resource-intensive, with a low probability of success. To address such limitations, deep learning (DL) methods have emerged as a viable alternative due to their ability to identify high-discriminating patterns in molecular data. In particular, graph neural networks (GNNs) operate on graph-structured data to identify promising drug candidates with desirable molecular properties. These methods represent molecules as a set of node (atoms) and edge (chemical bonds) features to aggregate local information for molecular graph representation learning. Despite the availability of several GNN frameworks, each approach has its own shortcomings. Although, some GNNs may excel in certain tasks, they may not perform as well in others. In this work, we propose a hybrid approach that incorporates different graph-based methods to combine their strengths and mitigate their limitations to accurately predict molecular properties. The proposed approach consists in a multi-layered hybrid GNN architecture that integrates multiple GNN frameworks to compute graph embeddings for molecular property prediction. Furthermore, we conduct extensive experiments on multiple benchmark datasets to demonstrate that our hybrid approach significantly outperforms the state-of-the-art graph-based models. 
The data and code scripts to reproduce the results are available in the repository, https://github.com/pedro-quesado/HybridGNN.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"1146-1157"},"PeriodicalIF":1.4,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141855704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0