Title: Optimal selection of learning data for highly accurate QSAR prediction of chemical biodegradability: a machine learning-based approach
Authors: K Takeda, K Takeuchi, Y Sakuratani, K Kimbara
Journal: SAR and QSAR in Environmental Research, vol. 34, no. 9, pp. 729-743
DOI: 10.1080/1062936X.2023.2251889 (https://doi.org/10.1080/1062936X.2023.2251889)
Publication date: 2023-07-01 (Epub: 2023-09-07)
Publication type: Journal Article
Impact factor: 2.3 (JCR Q3, Chemistry, Multidisciplinary)
Citations: 0
Abstract
Before new chemicals can be manufactured, regulations mandate a thorough review of the chemicals under risk management. This review involves evaluating their effects on the environment and human health. To assess these effects, a review report that conforms to the OECD Test Guidelines must be submitted to the regulatory body. One of the essential components of the report is an assessment of the biodegradability of the chemicals in the environment. In addition to conventional methods, quantitative structure-activity relationship (QSAR) models have been developed to predict the properties of chemicals based on their structural features. Although a larger learning set may enhance prediction accuracy, it may also reduce accuracy because chemicals with different structural features and properties are mixed together. To improve prediction performance, it is recommended to use only the data appropriate for biodegradability prediction as the training set. In this study, we propose a novel approach for the optimal selection of the training set that enables highly accurate prediction of the biodegradability of chemicals by QSAR. Our findings indicate that the proposed method effectively reduces the root mean squared error and improves the prediction accuracy.
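The abstract does not say how the training set is selected, so the following is only an illustrative sketch of the general idea that a smaller, structurally similar learning set can give a lower root mean squared error than a larger, mixed one. Everything in it is an assumption made for illustration: the data are synthetic, a single numeric descriptor stands in for structural features, similarity is plain distance in that descriptor, and the helper names (`fit_line`, `select_similar`, `make_class`) are hypothetical rather than taken from the paper.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one descriptor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def select_similar(train, query_x, k):
    """Keep the k training chemicals closest to the query descriptor (assumed criterion)."""
    return sorted(train, key=lambda row: abs(row[0] - query_x))[:k]


def make_class(rng, n, x_low, x_high, intercept, slope):
    """Synthetic chemical class: (descriptor, activity) pairs with Gaussian noise."""
    rows = []
    for _ in range(n):
        x = rng.uniform(x_low, x_high)
        rows.append((x, intercept + slope * x + rng.gauss(0.0, 2.0)))
    return rows


rng = random.Random(0)
# Two hypothetical classes whose descriptor-activity relationships differ.
class_a = make_class(rng, 60, 0.0, 2.0, 20.0, 30.0)
class_b = make_class(rng, 60, 4.0, 6.0, 90.0, -10.0)
train = class_a[:50] + class_b[:50]   # mixed learning set
test = class_a[50:]                   # held-out chemicals from class A
y_test = [y for _, y in test]

# Global model: one fit on the full, mixed learning set.
g_intercept, g_slope = fit_line([x for x, _ in train], [y for _, y in train])
global_pred = [g_intercept + g_slope * x for x, _ in test]

# Local model: for each query, refit on its 20 most similar training chemicals.
local_pred = []
for x, _ in test:
    subset = select_similar(train, x, k=20)
    l_intercept, l_slope = fit_line([sx for sx, _ in subset], [sy for _, sy in subset])
    local_pred.append(l_intercept + l_slope * x)

global_rmse = rmse(y_test, global_pred)
local_rmse = rmse(y_test, local_pred)
print(f"global RMSE: {global_rmse:.2f}, local RMSE: {local_rmse:.2f}")
```

In this toy setting the single global line has to compromise between two unrelated trends, while the per-query subset contains only chemicals from the relevant class. A real QSAR workflow would use many descriptors, a proper similarity measure, and experimental biodegradation data.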
About the journal:
SAR and QSAR in Environmental Research is an international journal welcoming papers on the fundamental and practical aspects of the structure-activity and structure-property relationships in the fields of environmental science, agrochemistry, toxicology, pharmacology and applied chemistry. A unique aspect of the journal is the focus on emerging techniques for the building of SAR and QSAR models in these widely varying fields. The scope of the journal includes, but is not limited to, the topics of topological and physicochemical descriptors, mathematical, statistical and graphical methods for data analysis, computer methods and programs, original applications and comparative studies. In addition to primary scientific papers, the journal contains reviews of books and software and news of conferences. Special issues on topics of current and widespread interest to the SAR and QSAR community will be published from time to time.