Authors: L. Dumitriu, C. Segal, M. Craciun, A. Cocu
Published in: 2008 4th International IEEE Conference Intelligent Systems
Publication date: 2008-11-11
DOI: 10.1109/IS.2008.4670547 (https://doi.org/10.1109/IS.2008.4670547)
Citations: 1
Abstract
Pre-processing aspects for complexity reduction of the QSAR problem
Predictive Toxicology (PT) is one of the newest targets of the Knowledge Discovery in Databases (KDD) domain. Its goal is to describe the relationships between the structure of chemical compounds and biological and toxicological processes. Real PT problems raise one particularly important issue: the very large number of chemical descriptors. Irrelevant, redundant, noisy and unreliable data have a negative impact, so one of the main goals in KDD is to detect these undesirable properties and to eliminate or correct them. This requires data cleaning, noise reduction and feature selection, because the performance of the applied Machine Learning algorithms depends strongly on the quality of the data used. In this paper, we present some of the issues that can be taken into account when preparing data before the actual knowledge discovery is performed.
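The abstract does not say which pre-processing techniques the paper uses, so the following is only an illustrative sketch of two generic filters that are often applied to large descriptor sets: removing constant descriptors (irrelevant data) and removing descriptors that are almost perfectly correlated with one already kept (redundant data). The function names, the thresholds and the toy descriptor values are all hypothetical and do not come from the paper.

```python
from math import sqrt
from statistics import pvariance


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def filter_descriptors(columns, min_variance=1e-12, max_abs_corr=0.95):
    """Return the names of descriptors that survive two simple filters.

    columns: dict mapping descriptor name -> list of values,
             one value per compound (assumed layout, not from the paper).
    """
    # Filter 1: drop (near-)constant descriptors, which carry no information.
    informative = {
        name: values
        for name, values in columns.items()
        if pvariance(values) > min_variance
    }

    # Filter 2: greedy redundancy removal. A descriptor is kept only if it is
    # not too strongly correlated with any descriptor kept before it.
    kept = []
    for name, values in informative.items():
        if all(abs(pearson(values, informative[k])) <= max_abs_corr for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy data: four descriptors measured on four compounds.
toy = {
    "mol_weight": [100.0, 150.0, 200.0, 250.0],
    "mol_weight_x2": [200.0, 300.0, 400.0, 500.0],  # exact multiple: redundant
    "n_rings_const": [5.0, 5.0, 5.0, 5.0],          # constant: irrelevant
    "logp": [1.2, 0.3, 2.5, 0.9],
}
kept = filter_descriptors(toy)
print(kept)
```

On this toy input only `mol_weight` and `logp` remain. The greedy pass depends on descriptor order, so a real pipeline would probably rank descriptors first (for example by relevance to the toxicity endpoint) before deciding which member of a correlated pair to keep.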