Prediction of Activity and Selectivity Profiles of Sigma Receptor Ligands Using Machine Learning Approaches

IF 5.3 · Zone 2 (Chemistry) · JCR Q1: CHEMISTRY, MEDICINAL

Journal of Chemical Information and Modeling · Pub Date: 2025-09-01 · DOI: 10.1021/acs.jcim.5c01091

Lisa Lombardo, Verena Battisti, Thierry Langer, Rosaria Gitto, and Laura De Luca*

Volume 65, Issue 18, pp. 9697–9712 · Journal Article · https://pubs.acs.org/doi/10.1021/acs.jcim.5c01091

Citations: 0

Abstract



Sigma (σ) receptors (SRs) have emerged as important therapeutic targets due to their roles in various biological pathways. They are classified into two subtypes: S1R, primarily distributed in the central nervous system and related to neuroprotection and neurodegenerative diseases, and S2R, mainly expressed in cancer cells, where it is associated with cell proliferation and apoptosis, as well as in neurons. Although S1R and S2R differ in receptor architecture and assembly, they share similar binding site features and ligand recognition mechanisms. This similarity underscores the importance of identifying selective ligands for therapeutic design, especially given the distinct physiological functions of these receptors. In this project, we developed three distinct machine learning (ML) approaches based on classification, regression, and multiclassification models to predict the activity and selectivity profiles of SR ligands. High-quality data sets were curated from public and in-house sources and then systematically organized and processed for each workflow. Models were built using molecular descriptors and fingerprints, including Mordred, RDKit, ECFP4, ECFP6, and MACCS keys, and trained with various ML algorithms such as extra trees, random forest, support vector machine, k-nearest neighbors, and XGBoost. Rigorous nested and classical 5-fold cross-validation protocols were applied for model selection and validation. Finally, the best workflow was identified through an external validation procedure. Among the workflows, the one-step multiclassification approach, based on extra trees combined with Mordred descriptors, showed the best predictive performance in external validation, offering a robust tool for the identification of selective S1R and S2R ligands.
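The abstract does not describe how the nested 5-fold cross-validation was implemented. As a rough illustration of the general idea only, the sketch below shows how nested cross-validation partitions sample indices: an outer loop holds out one fold for performance estimation, and an inner loop re-splits the remaining samples for model selection. The function names (`k_fold_indices`, `nested_cv_splits`), the seed handling, and the sample count of 100 are assumptions made for this example, not details from the paper; it uses only the Python standard library, whereas a real workflow would more likely rely on an established ML library.

```python
import random


def k_fold_indices(indices, k, seed):
    """Shuffle `indices` and split them into k nearly equal folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    # Striding by k distributes the shuffled items evenly across folds.
    return [shuffled[i::k] for i in range(k)]


def nested_cv_splits(n_samples, outer_k=5, inner_k=5, seed=0):
    """Yield (outer_train, outer_test, inner_splits) for nested CV.

    `inner_splits` is a list of (inner_train, inner_val) pairs drawn only
    from `outer_train`, so the outer test fold never takes part in
    model selection. All names here are hypothetical.
    """
    outer_folds = k_fold_indices(range(n_samples), outer_k, seed)
    for i, outer_test in enumerate(outer_folds):
        # Everything outside the held-out fold is available for training.
        outer_train = [
            idx
            for j, fold in enumerate(outer_folds)
            if j != i
            for idx in fold
        ]
        # Re-split the outer training set for hyperparameter selection.
        inner_folds = k_fold_indices(outer_train, inner_k, seed + i + 1)
        inner_splits = []
        for m, inner_val in enumerate(inner_folds):
            inner_train = [
                idx
                for n, fold in enumerate(inner_folds)
                if n != m
                for idx in fold
            ]
            inner_splits.append((inner_train, inner_val))
        yield outer_train, outer_test, inner_splits


# Illustrative run on 100 hypothetical samples.
splits = list(nested_cv_splits(n_samples=100))
```

In this scheme, candidate models would be compared on the inner validation folds, and only the selected configuration would be scored on the corresponding outer test fold. An external validation set, as mentioned in the abstract, would sit outside all of these splits.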


Source journal

Journal of Chemical Information and Modeling (Chemistry: Multidisciplinary Chemistry)

CiteScore: 9.80

Self-citation rate: 10.70%

Articles published: 529

Review time: 1.4 months

About the journal: The Journal of Chemical Information and Modeling publishes papers reporting new methodology and/or important applications in the fields of chemical informatics and molecular modeling. Specific topics include the representation and computer-based searching of chemical databases, molecular modeling, computer-aided molecular design of new materials, catalysts, or ligands, development of new computational methods or efficient algorithms for chemical software, and biopharmaceutical chemistry including analyses of biological activity and other issues related to drug discovery. Astute chemists, computer scientists, and information specialists look to this monthly’s insightful research studies, programming innovations, and software reviews to keep current with advances in this integral, multidisciplinary field. As a subscriber you’ll stay abreast of database search systems, use of graph theory in chemical problems, substructure search systems, pattern recognition and clustering, analysis of chemical and physical data, molecular modeling, graphics and natural language interfaces, bibliometric and citation analysis, and synthesis design and reactions databases.