Genetic Algorithm and Data Mining Techniques for Design Selection in Databases
C. Koukouvinos, C. Parpoula, D. Simos
2013 International Conference on Availability, Reliability and Security, 2013-09-02
DOI: https://doi.org/10.1109/ARES.2013.98
Variable selection is fundamental to high-dimensional statistical modelling, since large databases now exist in diverse fields of science. In this paper, we combine data mining tools with experimental designs in databases to select the most relevant variables for classification in regression problems where both the observations and the labels of a real-world dataset are available. Specifically, we use health data to identify the most significant variables, namely those carrying the information needed to classify and predict new data with respect to a given outcome (survival or death). The main goal is to determine the most important variables using methods from the design of experiments combined with algorithmic concepts from data mining and metaheuristics. Our approach seems promising, since we are able to retrieve an optimal plan using only 6 of the 8862 available runs.