ADis-QSAR: a machine learning model based on biological activity differences of compounds

Impact factor: 3.0 | Zone 3 (Biology) | JCR: Q3, Biochemistry & Molecular Biology

Journal of Computer-Aided Molecular Design | Published: 2023-06-29 | DOI: 10.1007/s10822-023-00517-1

Gyoung Jin Park, Nam Sook Kang

Volume 37, issue 9, pages 435-451 | Full text: https://link.springer.com/article/10.1007/s10822-023-00517-1

Citations: 0

Abstract



Drug candidates identified by the pharmaceutical industry typically have unique structural characteristics that ensure they interact strongly and specifically with their biological targets. Identifying these characteristics is a key challenge in developing new drugs, and quantitative structure-activity relationship (QSAR) analysis has generally been used for this task. QSAR models with good predictive power reduce the cost and time invested in compound development. Building such models depends on how well the differences between “active” and “inactive” compound groups can be conveyed to the model during learning. Previous efforts to address this issue include generating “molecular descriptors” that compactly express the structural characteristics of compounds. From the same perspective, we developed the Activity Differences-Quantitative Structure-Activity Relationship (ADis-QSAR) model, which generates molecular descriptors that convey group features more explicitly through a pair system that directly connects active and inactive groups. We trained the model with popular machine learning algorithms, such as Support Vector Machine, Random Forest, XGBoost and Multi-Layer Perceptron, and evaluated it using scores such as accuracy, area under the curve, precision and specificity. The Support Vector Machine performed better than the others. Notably, the ADis-QSAR model showed significant improvements over the baseline model in meaningful scores such as precision and specificity, even on datasets with dissimilar chemical spaces. This model reduces the risk of selecting false positive compounds, improving the efficiency of drug development.
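
The abstract ties precision and specificity to the risk of false positives. As a rough illustration of why those two scores react to false positives, the sketch below computes accuracy, precision and specificity from binary predictions. It is not the authors' evaluation code: the function names and the toy labels are hypothetical, the encoding 1 = active, 0 = inactive is an assumption, and area under the curve is left out because it needs ranked scores rather than hard labels.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (assumed: 1 = active, 0 = inactive)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1  # active compound correctly predicted active
        elif pred == 1 and truth == 0:
            fp += 1  # inactive compound wrongly predicted active
        elif pred == 0 and truth == 0:
            tn += 1  # inactive compound correctly predicted inactive
        else:
            fn += 1  # active compound wrongly predicted inactive
    return tp, fp, tn, fn


def classification_scores(y_true, y_pred):
    """Return accuracy, precision and specificity as a dict."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Precision: fraction of predicted actives that are truly active.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Specificity: fraction of true inactives that are predicted inactive.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Hypothetical toy data: 3 active and 5 inactive compounds.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
scores = classification_scores(y_true, y_pred)
print(scores)
```

Both precision and specificity have the false positive count in the denominator, so each false positive removed raises both scores, which is consistent with the abstract's claim that gains in these scores mean fewer false positive compounds are selected.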


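Source journal

Journal of Computer-Aided Molecular Design (Biology; Computer Science: Interdisciplinary Applications)

CiteScore: 8.00
Self-citation rate: 8.60%
Articles published: not listed
Review time: 3 months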

About the journal: The Journal of Computer-Aided Molecular Design provides a forum for disseminating information on both the theory and the application of computer-based methods in the analysis and design of molecules. The scope of the journal encompasses papers which report new and original research and applications in the following areas: theoretical chemistry; computational chemistry; computer and molecular graphics; molecular modeling; protein engineering; drug design; expert systems; general structure-property relationships; molecular dynamics; chemical database development and usage.