Akira Hara, Haruko Tanaka, T. Ichimura, T. Takahama
Int. J. Knowl. Web Intell., published 2012-11-01. DOI: https://doi.org/10.1504/IJKWI.2012.050286
Knowledge acquisition from many-attribute data by genetic programming with clustered terminal symbols
Rule extraction from databases by soft computing methods is important for knowledge acquisition. For example, knowledge extracted from web pages can be useful for information retrieval. When genetic programming (GP) is applied to rule extraction from a database, the attributes of the data are often used as the terminal symbols. Real databases, however, have a large number of attributes, so the terminal set grows and the search space becomes vast. To improve search performance, we propose new methods for handling a large-scale terminal set. In these methods, the terminal symbols are clustered based on the similarities of the attributes. At the beginning of the search, the clusters are used as terminals instead of the original attributes, which reduces the number of terminal symbols and therefore the size of the search space. In the later stage of the search, the original attributes are used as terminal symbols to perform a local search. We applied the proposed methods to two many-attribute datasets: the classification of molecules as a benchmark problem, and page rank learning for information retrieval. Compared with conventional GP, the proposed methods evolved faster and extracted more accurate rules.
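The two-stage use of terminal symbols described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the similarity measure, the clustering algorithm, or when the switch to original attributes happens, so everything below is an assumption made for illustration: absolute Pearson correlation as the similarity, a greedy threshold clustering, a fixed `switch_generation`, and the hypothetical helper names `cluster_attributes`, `terminal_set`, and `expand_terminal`. It is not the authors' implementation.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def cluster_attributes(columns, threshold=0.9):
    """Greedy clustering (an assumed stand-in for the paper's method):
    an attribute joins the first cluster whose first member it
    correlates with at |r| >= threshold, else it starts a new cluster."""
    clusters = []
    for idx, col in enumerate(columns):
        for members in clusters:
            if abs(pearson(columns[members[0]], col)) >= threshold:
                members.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters


def terminal_set(generation, switch_generation, clusters):
    """Early generations get one terminal per cluster;
    later generations get one terminal per original attribute."""
    if generation < switch_generation:
        return [("cluster", c) for c in range(len(clusters))]
    return [("attr", a) for members in clusters for a in members]


def expand_terminal(terminal, clusters, rng):
    """Replace a cluster terminal with one of its member attributes
    (assumed way to carry an individual into the later stage)."""
    kind, index = terminal
    if kind == "cluster":
        return ("attr", rng.choice(clusters[index]))
    return terminal


# Toy data: four attributes (columns) observed on four records.
columns = [
    [1.0, 2.0, 3.0, 4.0],  # attribute 0
    [2.0, 4.1, 6.0, 8.2],  # attribute 1, nearly proportional to attribute 0
    [4.0, 1.0, 3.0, 2.0],  # attribute 2, weakly related to attribute 0
    [8.1, 2.0, 6.2, 4.0],  # attribute 3, nearly proportional to attribute 2
]
clusters = cluster_attributes(columns)
early = terminal_set(0, 50, clusters)   # coarse search over clusters
late = terminal_set(50, 50, clusters)   # local search over attributes
```

In this toy case the four attributes collapse into two clusters, so the early-stage terminal set is half the size of the late-stage one. With hundreds of correlated attributes the reduction would be correspondingly larger, which is the effect the abstract relies on to shrink the search space.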