{"title":"面向属性归纳法中的非结构化文本数据规则泛化","authors":"Yi-Ning Tu, Cheng-Yi Kuo","doi":"10.1109/ICKII55100.2022.9983603","DOIUrl":null,"url":null,"abstract":"With the advancement of science and technology, algorithms for data mining are constantly being introduced but most of the algorithms focus on solving the problem of how to reduce the dimension of variables. Nowadays, the amount of data is accumulating and increasing rapidly, and how to compress the amount of data becomes an important topic. Attribute-oriented induction (AOI) is a method of longitudinally compressing data. At present, there are many derivative methods to solve the problem of multi-value-attributes and ordered data. However, few researchers have focused on unstructured text data. Nowadays, the most common way to process text data for classification and induction is to use Latent Dirichlet Allocation (LDA) and generate attributes of the topic. However, the attributes of the topic do not consider whether they exist or not a potential hierarchical relationship. As the topics of text data are hierarchical, we focus on unstructured data such as comment content, transform the comment content into concept datasets, and use AOI to construct a reasonable hierarchical concept. Through induction cost, the final level of induction is determined, too. Finally, the induction rate of the topics generated by the LDA and AOI methods is evaluated to solve the problem of data compression with multiple events and hierarchical relationships among concepts.","PeriodicalId":352222,"journal":{"name":"2022 IEEE 5th International Conference on Knowledge Innovation and Invention (ICKII )","volume":"43 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Unstructured Text Data Rule Generalization in Attribute-Oriented Induction\",\"authors\":\"Yi-Ning Tu, Cheng-Yi Kuo\",\"doi\":\"10.1109/ICKII55100.2022.9983603\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"With the advancement of science and technology, algorithms for data mining are constantly being introduced but most of the algorithms focus on solving the problem of how to reduce the dimension of variables. Nowadays, the amount of data is accumulating and increasing rapidly, and how to compress the amount of data becomes an important topic. Attribute-oriented induction (AOI) is a method of longitudinally compressing data. At present, there are many derivative methods to solve the problem of multi-value-attributes and ordered data. However, few researchers have focused on unstructured text data. Nowadays, the most common way to process text data for classification and induction is to use Latent Dirichlet Allocation (LDA) and generate attributes of the topic. However, the attributes of the topic do not consider whether they exist or not a potential hierarchical relationship. As the topics of text data are hierarchical, we focus on unstructured data such as comment content, transform the comment content into concept datasets, and use AOI to construct a reasonable hierarchical concept. Through induction cost, the final level of induction is determined, too. 
Finally, the induction rate of the topics generated by the LDA and AOI methods is evaluated to solve the problem of data compression with multiple events and hierarchical relationships among concepts.\",\"PeriodicalId\":352222,\"journal\":{\"name\":\"2022 IEEE 5th International Conference on Knowledge Innovation and Invention (ICKII )\",\"volume\":\"43 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-07-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE 5th International Conference on Knowledge Innovation and Invention (ICKII )\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICKII55100.2022.9983603\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 5th International Conference on Knowledge Innovation and Invention (ICKII )","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICKII55100.2022.9983603","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Unstructured Text Data Rule Generalization in Attribute-Oriented Induction
With the advancement of science and technology, data mining algorithms are constantly being introduced, but most of them focus on reducing the dimensionality of variables. As the amount of data accumulates rapidly, how to compress the data volume itself has become an important topic. Attribute-oriented induction (AOI) compresses data vertically, reducing the number of tuples by generalizing attribute values. Many derivative AOI methods address multi-valued attributes and ordered data, but few researchers have focused on unstructured text data. At present, the most common way to process text data for classification and induction is to apply Latent Dirichlet Allocation (LDA) and treat the resulting topics as attributes. However, LDA-generated topic attributes do not account for potential hierarchical relationships among them. Because the topics of text data are naturally hierarchical, we focus on unstructured data such as review comments, transform the comment content into concept datasets, and use AOI to construct a reasonable concept hierarchy. The final induction level is determined by the induction cost. Finally, the induction rate of the topics generated by the LDA and AOI methods is evaluated, addressing the problem of compressing data that contains multiple events and hierarchical relationships among concepts.
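The abstract describes the pipeline only at a high level. The following is a minimal sketch of that pipeline in Python, assuming scikit-learn for LDA over a toy set of review comments and a hand-built concept hierarchy; the concept names, the one-level generalization, and the simple compression-based "induction rate" are illustrative assumptions, not the paper's actual induction-cost criterion.

```python
# Minimal sketch: LDA topic attributes + AOI-style generalization over a
# hand-built concept hierarchy. All names and the rate formula are
# illustrative assumptions, not the method reported in the paper.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

comments = [
    "the pasta and pizza were delicious",
    "great espresso and friendly baristas",
    "the latte was cold and the pizza was burnt",
    "wonderful pasta and quick service",
]

# Step 1: LDA turns unstructured comments into topic "attributes".
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(comments)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)                  # per-comment topic mixture
vocab = vectorizer.get_feature_names_out()
top_words = [[vocab[i] for i in topic.argsort()[-3:]]
             for topic in lda.components_]         # words characterising each topic

# Step 2: a hypothetical concept hierarchy for attribute-oriented induction;
# each low-level concept generalizes to a higher-level concept.
hierarchy = {
    "pasta": "food", "pizza": "food",
    "espresso": "drink", "latte": "drink",
    "food": "menu item", "drink": "menu item",
}

def generalize(concept, levels=1):
    """Climb the concept hierarchy the given number of levels (stop at the root)."""
    for _ in range(levels):
        concept = hierarchy.get(concept, concept)
    return concept

# Step 3: generalize observed concepts and merge identical values; the
# fraction of distinct concepts eliminated serves here as a stand-in for the
# induction (compression) rate at that level.
observed = ["pasta", "pizza", "espresso", "latte", "pasta"]
generalized = [generalize(c, levels=1) for c in observed]
rate = 1 - len(set(generalized)) / len(set(observed))

print(top_words)                       # topic keywords from LDA
print(doc_topics.argmax(axis=1))       # dominant topic per comment
print(generalized)                     # ['food', 'food', 'drink', 'drink', 'food']
print(f"induction rate at level 1: {rate:.2f}")
```

In the paper's setting, the choice of how many levels to climb would be driven by the induction cost rather than fixed at one level as in this sketch.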