Title: Pre-processing Techniques for the QSAR Problem
Authors: L. Dumitriu, M. Craciun, A. Cocu, C. Segal
Published in: New Trends in Multimedia and Network Information Systems, 2008-06-30
DOI: https://doi.org/10.3233/978-1-58603-904-2-107

Predictive Toxicology (PT) attempts to describe the relationships between the structure of chemical compounds and biological and toxicological processes. The most important issue in real-world PT problems is the very large number of chemical descriptors. A secondary issue is data quality, since irrelevant, redundant, noisy, and unreliable data degrade prediction results. The pre-processing step of Data Mining addresses both, reducing complexity and improving data quality through feature selection, data cleaning, and noise reduction. In this paper, we present some of the issues to take into account when preparing data before the actual knowledge discovery is performed.
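
To make the idea of complexity reduction concrete, the following is a minimal sketch of one simple kind of feature selection on a descriptor table: dropping descriptors that are (near-)constant and descriptors that are almost perfectly correlated with one already kept. It is not the method of the paper, which the abstract does not specify. The function names `pearson` and `select_descriptors`, the thresholds `min_variance` and `max_abs_corr`, and the toy data are all assumptions chosen for illustration.

```python
from math import sqrt
from statistics import pvariance


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def select_descriptors(rows, min_variance=1e-8, max_abs_corr=0.95):
    """Return the indices of the descriptor columns to keep.

    rows: list of compounds, each a list of numeric descriptor values.
    A column is dropped if it is (near-)constant, or if its absolute
    correlation with an already kept column exceeds max_abs_corr.
    Both thresholds are illustrative assumptions, not recommended values.
    """
    columns = list(zip(*rows))
    kept = []
    for j, col in enumerate(columns):
        if pvariance(col) <= min_variance:
            continue  # irrelevant: the descriptor does not vary across compounds
        if any(abs(pearson(col, columns[k])) > max_abs_corr for k in kept):
            continue  # redundant: duplicates information in an earlier descriptor
        kept.append(j)
    return kept


# Toy data: 4 compounds x 4 descriptors (values are made up).
data = [
    [1.0, 2.0, 5.0, 0.3],
    [2.0, 4.0, 5.0, 0.9],
    [3.0, 6.0, 5.0, 0.1],
    [4.0, 8.0, 5.0, 0.7],
]
kept = select_descriptors(data)
print(kept)  # → [0, 3]
```

In this toy table, descriptor 1 is exactly twice descriptor 0 and descriptor 2 is constant, so only descriptors 0 and 3 survive. The greedy, order-dependent correlation filter is a deliberate simplification; real PT data sets would likely also need missing-value handling and noise reduction, which this sketch does not attempt.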