Unstructured Text Data Rule Generalization in Attribute-Oriented Induction

Yi-Ning Tu, Cheng-Yi Kuo
Published in: 2022 IEEE 5th International Conference on Knowledge Innovation and Invention (ICKII), 2022-07-22
DOI: https://doi.org/10.1109/ICKII55100.2022.9983603

Abstract

Data mining algorithms are introduced constantly, but most of them focus on reducing the dimensionality of variables. As data volumes grow rapidly, compressing the amount of data has become an important topic. Attribute-oriented induction (AOI) is a method for compressing data longitudinally. Many derivative methods now address multi-valued attributes and ordered data, but few researchers have focused on unstructured text data. The most common way to process text data for classification and induction is to use Latent Dirichlet Allocation (LDA) to generate topic attributes. However, topic attributes generated this way do not take into account whether a potential hierarchical relationship exists among them. Because the topics of text data are hierarchical, we focus on unstructured data such as comment content, transform the comment content into concept datasets, and use AOI to construct a reasonable concept hierarchy. The final level of induction is also determined through the induction cost. Finally, the induction rate of the topics generated by the LDA and AOI methods is evaluated to address the problem of data compression with multiple events and hierarchical relationships among concepts.
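To make the core AOI step concrete, the following is a minimal sketch of generalizing concept tuples along a concept hierarchy and merging tuples that become identical. It is an illustration only, not the authors' implementation: the hierarchy `PARENT`, the sample `comments`, and the helper names `generalize` and `aoi` are all hypothetical, and the paper's induction cost and induction rate measures are not modeled here.

```python
from collections import Counter

# Hypothetical concept hierarchy (assumption): each concept maps to its parent.
PARENT = {
    "espresso": "coffee",
    "latte": "coffee",
    "green tea": "tea",
    "coffee": "beverage",
    "tea": "beverage",
    "slow service": "service",
    "rude staff": "service",
}


def generalize(value, levels=1):
    """Climb the hierarchy by up to `levels` steps, stopping at a root concept."""
    for _ in range(levels):
        if value not in PARENT:
            break
        value = PARENT[value]
    return value


def aoi(tuples, levels=1):
    """Generalize every attribute value, then merge identical tuples with counts."""
    return Counter(
        tuple(generalize(value, levels) for value in row) for row in tuples
    )


# Hypothetical concept tuples, as might be extracted from comment text.
comments = [
    ("espresso", "slow service"),
    ("latte", "rude staff"),
    ("green tea", "slow service"),
    ("latte", "slow service"),
]

level1 = aoi(comments, levels=1)  # one step up the hierarchy
level2 = aoi(comments, levels=2)  # two steps up the hierarchy

for row, count in level1.items():
    print(row, count)
```

In this toy data, each additional level of generalization leaves fewer distinct tuples, which is the sense in which AOI compresses the number of records rather than the number of attributes. Choosing where to stop climbing is the role the abstract assigns to the induction cost.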