Application of the L1-Regularization Approach in the QSAR Problem. Linear Regression and Artificial Neural Networks

Impact factor: 0.7 · JCR: Q4 (Chemistry, Analytical)
M. I. Berdnyk, A. B. Zakharov, V. Ivanov
Journal: Methods and Objects of Chemical Analysis · Publication date: 2019-01-01 · Type: Journal Article · DOI: 10.17721/moca.2019.79-90 (https://doi.org/10.17721/moca.2019.79-90)

Abstract

One of the primary tasks of analytical chemistry and QSAR/QSPR research is building predictive regression equations from sets of descriptors. One of the most important problems here is reducing the number of descriptors in the initial set, which is usually far too large. In the current investigation, we propose reducing the descriptor set with the least absolute shrinkage and selection operator (LASSO). The reduced descriptor sets were then used with the following QSAR/QSPR methods: ordinary least squares (OLS) regression, least absolute deviation (LAD) regression, and artificial neural networks (ANN). In contrast to these methods, principal component regression (PCR) and partial least squares (PLS) can produce solutions containing numerous descriptors. In this article we compare the viability of these two descriptor-handling approaches for predicting the chemical and physical properties of molecules. The results show that there are tasks for which PCR and PLS can fail to produce accurate regression equations, whereas OLS and LAD, which use a small number of descriptors, can provide viable solutions for the same cases. The small descriptor sets selected with LASSO can also be used in ANN to obtain models with even better internal validation characteristics.
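The two-stage idea in the abstract (LASSO to pick a small descriptor set, then an ordinary regression on that set) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the data are synthetic, the penalty `alpha=0.3` is arbitrary, and the solver is a plain cyclic coordinate descent written with the standard library only. The LAD and ANN stages mentioned in the abstract are not covered.

```python
import random

def soft_threshold(z, t):
    """Soft-thresholding operator that produces exact zeros under the L1 penalty."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, alpha, n_iter=200):
    """Minimise (1/2n)*||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent.
    X is a list of rows; columns and y are assumed centred (no intercept)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    resid = list(y)  # residual y - Xw, with w = 0 at the start
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the residual that excludes column j
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * w[j]
            new_w = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                w[j] = new_w
    return w

def centre(values):
    """Subtract the mean so that no intercept term is needed."""
    m = sum(values) / len(values)
    return [v - m for v in values]

# Synthetic stand-in for a descriptor table (NOT data from the paper):
# 100 "molecules", 8 descriptors, property driven by descriptors 0 and 3 only.
random.seed(0)
n, p = 100, 8
cols = [centre([random.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(p)]
X = [[cols[j][i] for j in range(p)] for i in range(n)]
y = centre([3.0 * X[i][0] - 2.0 * X[i][3] + random.gauss(0.0, 0.1)
            for i in range(n)])

# Stage 1: LASSO keeps only descriptors whose coefficients stay non-zero.
w_lasso = lasso_cd(X, y, alpha=0.3)
selected = [j for j, w in enumerate(w_lasso) if w != 0.0]

# Stage 2: least-squares refit on the reduced set (alpha = 0 removes the penalty).
X_sel = [[row[j] for j in selected] for row in X]
refit = lasso_cd(X_sel, y, alpha=0.0)

print("selected descriptors:", selected)
print("refitted coefficients:", [round(w, 2) for w in refit])
```

Refitting without the penalty is one common way to remove the shrinkage bias that LASSO puts on the retained coefficients; in practice the penalty strength would be chosen by cross-validation rather than fixed by hand.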
Source journal
CiteScore: 1.00
Self-citation rate: 14.30%
Articles published: 12
About the journal: "Methods and Objects of Chemical Analysis" is a peer-reviewed journal that publishes original theoretical and experimental articles on topical issues in bioanalytical chemistry, chemical and pharmaceutical analysis, and chemical metrology. Submitted works should report the results of completed studies and make a scientific contribution to the relevant field. The journal publishes review articles, research articles, and articles on the latest developments in analytical instrumentation.