{"title":"Pre-processing aspects for complexity reduction of the QSAR problem","authors":"L. Dumitriu, C. Segal, M. Craciun, A. Cocu","doi":"10.1109/IS.2008.4670547","DOIUrl":null,"url":null,"abstract":"Predictive Toxicology (PT) is one of the newest targets of the Knowledge Discovery in Databases (KDD) domain. Its goal is to describe the relationships between the chemical structure of chemical compounds and biological and toxicological processes. In real PT problems there is a very important topic to be considered: the huge number of the chemical descriptors. Irrelevant, redundant, noisy and unreliable data have a negative impact, therefore one of the main goals in KDD is to detect these undesirable proprieties and to eliminate or correct them. This assumes data cleaning, noise reduction and feature selection because the performance of the applied Machine Learning algorithms is strongly related with the quality of the data used. In this paper, we present some of the issues that can be taken into account for preparing data before the actual knowledge discovery is performed.","PeriodicalId":305750,"journal":{"name":"2008 4th International IEEE Conference Intelligent Systems","volume":"46 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2008-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2008 4th International IEEE Conference Intelligent Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IS.2008.4670547","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
Predictive Toxicology (PT) is one of the newest targets of the Knowledge Discovery in Databases (KDD) domain. Its goal is to describe the relationships between the chemical structure of chemical compounds and biological and toxicological processes. In real PT problems there is a very important topic to be considered: the huge number of the chemical descriptors. Irrelevant, redundant, noisy and unreliable data have a negative impact, therefore one of the main goals in KDD is to detect these undesirable proprieties and to eliminate or correct them. This assumes data cleaning, noise reduction and feature selection because the performance of the applied Machine Learning algorithms is strongly related with the quality of the data used. In this paper, we present some of the issues that can be taken into account for preparing data before the actual knowledge discovery is performed.