Journal of Chemical Information and Modeling: Latest Articles, Page 6

Investigating Statistical Conditions of Coevolutionary Signals that Enable Algorithmic Predictions of Protein Partners

IF 5.6, Zone 2 (Chemistry)

Journal of Chemical Information and Modeling Pub Date: 2025-04-15 DOI: 10.1021/acs.jcim.5c00052

José Fiorote, João Alves, Letícia Stock and Werner Treptow*

Abstract: This study examines the statistical conditions of coevolutionary signals that allow algorithmic predictions of protein partners based on amino acid sequences rather than 3D structures. It introduces a Markov stochastic model that predicts the number of correct protein partners based on coevolutionary information. The model defines state probabilities using a Poisson mixture of normal distributions, with key parameters including the total number of protein sequences M, the coevolutionary information gap α, and the variance σ₀². The model suggests that algorithmic approaches that maximize coevolutionary information cannot effectively resolve partners in protein families with a large number of sequences, M ≥ 100. The model shows that true-positive (TP) rates can be enhanced by disregarding mismatches among similar sequences. This approach allows a distinction, in terms of {α, σ₀²}, between optimized solutions with trivial errors and other degenerate solutions. Our findings enable the a priori classification of protein families where partners can be reliably predicted by ignoring trivial errors between similar sequences, advancing the understanding of coevolutionary models for large protein data sets.

Volume 65, Issue 8, pp. 4107–4115. PDF: https://pubs.acs.org/doi/epdf/10.1021/acs.jcim.5c00052

Citations: 0

Addressing Imbalanced Classification Problems in Drug Discovery and Development Using Random Forest, Support Vector Machine, AutoGluon-Tabular, and H2O AutoML

IF 5.6, Zone 2 (Chemistry)

Journal of Chemical Information and Modeling Pub Date: 2025-04-15 DOI: 10.1021/acs.jcim.5c00023

Ayush Garg, Narayanan Ramamurthi and Shyam Sundar Das*

Abstract: Classification models built on class-imbalanced data sets tend to prioritize the accuracy of the majority class, and thus the minority class generally has a higher misclassification rate. Different techniques are available to address class imbalance in classification models and can be categorized as data-level, algorithm-level, and hybrid methods. To the best of our knowledge, however, an in-depth analysis of the performance of these techniques against the class ratio is not available in the literature. We have addressed this shortcoming in this study and have performed a detailed analysis of the performance of four different techniques to address imbalanced class distribution using machine learning (ML) methods and AutoML tools. To carry out our study, we selected four such techniques: (a) threshold optimization using (i) GHOST and (ii) the area under the precision–recall curve (AUPR), (b) the internal balancing method of AutoML and class-weighting of machine learning methods, and (c) data balancing using SMOTETomek. We generated 27 data sets considering nine different class ratios (i.e., the ratio of the positive class to total samples) from three data sets that belong to the drug discovery and development field. We employed random forest (RF) and support vector machine (SVM) as representatives of ML classifiers and AutoGluon-Tabular (version 0.6.1) and H2O AutoML (version 3.40.0.4) as representatives of AutoML tools.

The important findings of our study are as follows: (i) threshold optimization has no effect on ranking metrics such as AUC and AUPR, but AUC and AUPR are affected by class-weighting and SMOTETomek; (ii) for the ML methods RF and SVM, percentage improvements of up to 375, 33.33, and 450 over all the data sets can be achieved, respectively, for F1 score, MCC, and balanced accuracy, which are suitable for performance evaluation on imbalanced data sets; (iii) for the AutoML libraries AutoGluon-Tabular and H2O AutoML, percentage improvements of up to 383.33, 37.25, and 533.33 over all the data sets can be achieved, respectively, for F1 score, MCC, and balanced accuracy; (iv) the percentage improvement in balanced accuracy generally increases when the class ratio is systematically decreased from 0.5 to 0.1, whereas for F1 score and MCC the maximum improvement is achieved at a class ratio of 0.3; (v) for both ML and AutoML with balancing, no individual class-balancing technique outperforms all other methods on a significantly higher number of data sets based on F1 score; (vi) the three external balancing techniques combined outperformed the internal balancing methods of ML and AutoML; (vii) AutoML tools perform as well as the ML models, and in some cases even better, for handling imbalanced classification when applied with imbalance-handling techniques. In summary, exploration of multiple data balancing techniques is r…

Volume 65, Issue 8, pp. 3976–3989

Citations: 0
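
The evaluation metrics named in the abstract above (F1 score, MCC, balanced accuracy) and the idea of threshold optimization can be sketched in plain Python. This is a minimal illustration, not the authors' pipeline: the simple grid search below stands in for GHOST or AUPR-based threshold selection, and all function names and the toy data are hypothetical.

```python
import math

def confusion_counts(y_true, y_score, threshold):
    """Return (tp, fp, tn, fn) when scores >= threshold are predicted positive."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def f1_score(tp, fp, tn, fn):
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 when any marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def balanced_accuracy(tp, fp, tn, fn):
    """Mean of sensitivity and specificity."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return (tpr + tnr) / 2

def best_threshold(y_true, y_score, metric=f1_score, grid=None):
    """Grid-search the decision threshold that maximizes the given metric.

    Assumption: a plain grid search used for illustration only; it is not
    the GHOST procedure referenced in the abstract.
    """
    grid = grid or [i / 20 for i in range(1, 20)]
    return max(grid, key=lambda t: metric(*confusion_counts(y_true, y_score, t)))

# Toy imbalanced data (class ratio 0.2): two positives among ten samples.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
y_score = [0.4, 0.1, 0.2, 0.3, 0.05, 0.15, 0.25, 0.35, 0.45, 0.1]

default_counts = confusion_counts(y_true, y_score, 0.5)
tuned = best_threshold(y_true, y_score)
tuned_counts = confusion_counts(y_true, y_score, tuned)
print("default F1:", f1_score(*default_counts))
print("tuned threshold:", tuned, "tuned F1:", f1_score(*tuned_counts))
```

With the default 0.5 cutoff every sample in this toy set is predicted negative, so moving the threshold changes F1, MCC, and balanced accuracy while leaving the ranking of the scores (and hence AUC-type metrics) untouched, which is consistent with finding (i).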

Three-Dimensional CH/π and CH/N Interactions from Quantum-Mechanical and Database Analyses

IF 5.6, Zone 2 (Chemistry)

Journal of Chemical Information and Modeling Pub Date: 2025-04-14 DOI: 10.1021/acs.jcim.5c00124

Daichi Hayakawa* and Hiroaki Gouda

Citations: 0

De Novo Design of Cyclic Peptide Binders Based on Fragment Docking and Assembling

IF 5.6, Zone 2 (Chemistry)

Journal of Chemical Information and Modeling Pub Date: 2025-04-14 DOI: 10.1021/acs.jcim.5c00088

Changsheng Zhang, Fanhao Wang, Tiantian Zhang, Yang Yang, Liying Wang, Xiaoling Zhang, Luhua Lai

Abstract: Cyclic peptides offer distinct advantages in modulating protein-protein interactions (PPIs), including enhanced target specificity, structural stability, reduced toxicity, and minimal immunogenicity. However, most cyclic peptide therapeutics currently in clinical development are derived from natural products or the cyclization of protein loops, with few methodologies available for de novo cyclic peptide design based on target protein structures. To fill this gap, we introduce CycDockAssem, an integrative computational platform that facilitates the systematic generation of head-to-tail cyclic peptides made entirely of natural L- or D-amino acid residues. The cyclic peptide binders are constructed from oligopeptide fragments containing 3-5 amino acids. A fragment library comprising 15 million fragments was created from the Protein Data Bank. The assembly workflow involves dividing the targeted protein surface into two docking boxes; the updated protein-protein docking program SDOCK2.0 is then utilized to identify the best binding fragments for these boxes. The fragments binding in different boxes are concatenated into a ring using two additional peptide fragments as linkers. A ROSETTA script is employed for sequence redesign, while molecular dynamics simulations and MM-PBSA calculations assess the conformational stability and binding free energy. To enhance docking performance, cation-π interactions, backbone hydrogen bonding potential, and explicit water exclusion energy were incorporated into the docking score function of SDOCK2.0, resulting in a significantly improved performance on the updated test set. A mirror design strategy was developed for cyclic peptides composed of D-amino acids, where natural amino acid cyclic peptide binders are first designed for the mirror image of the target protein and the resulting complexes are then mirrored back. CycDockAssem was experimentally validated using tumor necrosis factor α (TNFα) as the target. Surface plasmon resonance experiments demonstrated that six of the seven designed cyclic peptides bind TNFα with micromolar affinity, two of which significantly inhibit TNFα downstream gene expression. Overall, CycDockAssem provides a robust strategy for targeted de novo cyclic peptide drug discovery.

Citations: 0
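
The fragment-and-linker assembly described in the abstract above can be illustrated with a sequence-only sketch. It assumes fragments are plain one-letter amino acid strings, whereas the actual library holds structural fragments from the Protein Data Bank that are ranked by docking; the function names and example sequences are hypothetical.

```python
def oligopeptide_fragments(sequence, min_len=3, max_len=5):
    """Collect every contiguous fragment of min_len..max_len residues."""
    seq = sequence.upper()
    fragments = set()
    for length in range(min_len, max_len + 1):
        for start in range(len(seq) - length + 1):
            fragments.add(seq[start:start + length])
    return fragments

def assemble_ring(fragment_a, linker_1, fragment_b, linker_2):
    """Concatenate two binding fragments and two linkers in ring order.

    The returned string is read head-to-tail: its last residue is
    bonded back to its first residue.
    """
    return fragment_a + linker_1 + fragment_b + linker_2

def canonical_rotation(ring):
    """Pick the lexicographically smallest rotation of a cyclic sequence.

    Rotations of a head-to-tail cyclic peptide describe the same molecule,
    so this gives one representative string per ring.
    """
    return min(ring[i:] + ring[:i] for i in range(len(ring)))

# Toy example with made-up sequences.
library = oligopeptide_fragments("ACDEFG")
ring = assemble_ring("ACD", "GG", "EFK", "P")
print(sorted(library))
print(ring, canonical_rotation(ring))
```

Canonicalizing rotations is one simple way to avoid counting the same ring several times when many fragment and linker combinations are enumerated.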

Topology-Enhanced Machine Learning Model (Top-ML) for Anticancer Peptide Prediction

IF 5.6, Zone 2 (Chemistry)

Journal of Chemical Information and Modeling Pub Date: 2025-04-14 DOI: 10.1021/acs.jcim.5c00476

Joshua Zhi En Tan, JunJie Wee*, Xue Gong* and Kelin Xia*

Abstract: Recently, therapeutic peptides have demonstrated great promise for cancer treatment. To explore powerful anticancer peptides, artificial intelligence (AI)-based approaches have been developed to systematically screen potential candidates. However, the lack of efficient featurization of peptides has become a bottleneck for these machine-learning models. In this paper, we propose a topology-enhanced machine learning model (Top-ML) for anticancer peptide prediction. Our Top-ML employs peptide topological features derived from its sequence “connection” information characterized by spectral descriptors. Our Top-ML model, employing an Extra-Trees classifier, has been validated on the AntiCP 2.0 and mACPpred 2.0 benchmark data sets, achieving state-of-the-art performance or results comparable to existing deep learning models, while providing greater interpretability. Our results highlight the potential of leveraging novel topology-based featurization to accelerate the identification of anticancer peptides.

Volume 65, Issue 8, pp. 4232–4242

Citations: 0

Machine Learning and Structural Dynamics-Based Approach to Reveal Molecular Mechanism of PTEN Missense Mutations Shared by Cancer and Autism Spectrum Disorder

IF 5.6, Zone 2 (Chemistry)

Journal of Chemical Information and Modeling Pub Date: 2025-04-14 DOI: 10.1021/acs.jcim.5c00134

Miao Yang, Jingran Wang, Ziyun Zhou, Wentian Li, Gennady Verkhivker, Fei Xiao* and Guang Hu*

Abstract: Missense mutations in oncogenic proteins that are concurrently associated with neurodevelopmental disorders have garnered significant attention. Phosphatase and tensin homologue (PTEN) serves as a paradigmatic model for mapping its mutational landscape and identifying genotypic predictors of distinct phenotypic outcomes, including cancer and autism spectrum disorder (ASD). Despite extensive research into the genotype-phenotype correlations of PTEN mutations, the mechanisms underlying the dual association of specific PTEN mutations with both cancer and ASD (PTEN-cancer/ASD mutations) remain elusive. This study introduces an integrative approach that combines machine learning (ML) with structural dynamics to elucidate the molecular effects of PTEN-cancer/ASD mutations. Analysis of biophysical and network-biology-based signatures reveals a complex energetic and functional landscape. Subsequently, an ML model and corresponding integrated score were developed to classify and predict PTEN-cancer/ASD mutations, underscoring the significance of protein dynamics in predicting cellular phenotypes. Further molecular dynamics simulations demonstrated that PTEN-cancer/ASD mutations induce dynamic alterations characterized by open conformational changes restricted to the P loop and coupled with interdomain allosteric regulation. This research aims to enhance the genotypic and phenotypic understanding of PTEN-cancer/ASD mutations through an interpretable ML model integrated with structural dynamics analysis. By identifying shared mechanisms between cancer and ASD, the findings pave the way for the development of novel therapeutic strategies.

Volume 65, Issue 8, pp. 4173–4188

Citations: 0

Crystal Structure Prediction Using a Self-Attention Neural Network and Semantic Segmentation

IF 5.6, Zone 2 (Chemistry)

Journal of Chemical Information and Modeling Pub Date: 2025-04-14 DOI: 10.1021/acs.jcim.4c02345

Wuling Zhao, Minxia Zhou, Jialin Shao, Jingzheng Ren, Yusha Hu, Yulin Han* and Yi Man*

Abstract: The development of new materials is a time-consuming and resource-intensive process. Deep learning has emerged as a promising approach to accelerate this process. However, accurately predicting crystal structures using deep learning remains a significant challenge due to the complex, high-dimensional nature of atomic interactions and the scarcity of comprehensive training data that captures the full diversity of possible crystal configurations. This work developed a neural network model based on a data set comprising thousands of crystallographic information files from existing crystal structure databases. The model incorporates a self-attention mechanism to enhance prediction accuracy by learning and extracting both local and global features of three-dimensional structures, treating the atoms in each crystal as point sets. This approach enables effective semantic segmentation and accurate unit cell prediction. Experimental results demonstrate that for unit cells containing up to 500 atoms, the model achieves a structure prediction accuracy of 89.78%.

Volume 65, Issue 8, pp. 3928–3943

Citations: 0