The relationship of text categorization using Dewey Decimal Classification techniques

J. Watthananon
DOI: 10.1109/ICTKE.2014.7001538
Published in: 2014 Twelfth International Conference on ICT and Knowledge Engineering
Publication date: 2014-11-01
URL: https://doi.org/10.1109/ICTKE.2014.7001538
Citations: 4

Abstract

Today, the massive volume of data and information (recently termed “Big Data”) causes accessibility and retrieval problems when it is poorly managed. The cause is its relational structure, which is too complicated to explain or analyze with simple or traditional methods. Displaying this data and information uniformly is also difficult because of its diverse formats. Bag of Words (BOW), the most widely used data sorting method, is simple but overlooks the significance of synonymy. The objective of this study is to propose a method for identifying massively scattered data (in the form of electronic documents). The linking of related data is also supported by applying the Dewey Decimal Classification (DDC) technique. DDC was used to process and analyze the data and to display it in the form of a Mind Map. An accuracy test was performed on data from the “Wikipedia Selection for schools”, a sub-version of Wikipedia, to compare the efficiency of four models: Dewey Decimal Classification (DDC), Support Vector Machine (SVM), K-Means clustering, and hierarchical clustering. The results indicated that DDC yielded the highest accuracy (75.02%), followed by the hierarchical model (74.66%), while K-Means and SVM yielded a similar accuracy (72.66%). K-Means clustering had the shortest processing time of the four models (16.09 seconds).
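The abstract's point about BOW and synonymy can be illustrated with a small sketch. This is not the paper's method, which the abstract does not describe in detail. It is a toy example under stated assumptions: the class numbers and names are standard DDC main classes, but the keyword lists, the function names, and the two sample sentences are hypothetical and chosen only to show how mapping words onto a shared class can link documents that have no words in common.

```python
import math
from collections import Counter

# Three of the ten DDC main classes. The class labels are standard DDC;
# the keyword sets are invented for this illustration (assumption).
DDC_KEYWORDS = {
    "500 Science": {"physics", "chemistry", "biology", "atom", "planet"},
    "600 Technology": {"medicine", "physician", "doctor", "engine", "farming"},
    "900 History & geography": {"empire", "war", "dynasty", "map", "continent"},
}


def bag_of_words(text):
    """Plain BOW: count each lowercase whitespace-separated token."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def ddc_profile(text):
    """Count how many tokens of the text fall under each DDC class."""
    tokens = text.lower().split()
    return Counter(
        {cls: sum(tok in words for tok in tokens)
         for cls, words in DDC_KEYWORDS.items()}
    )


def classify(text):
    """Return the DDC class with the most keyword hits, or None if no hits."""
    cls, hits = ddc_profile(text).most_common(1)[0]
    return cls if hits else None


doc_a = "the physician studied medicine"
doc_b = "a doctor treats patients"

# The two sentences share no tokens, so their BOW similarity is zero.
print(cosine(bag_of_words(doc_a), bag_of_words(doc_b)))
# Both sentences land in the same DDC class, which links them.
print(classify(doc_a), "|", classify(doc_b))
```

In this sketch the class label acts as a shared vocabulary: "physician" and "doctor" never match as raw tokens, but both count toward the same class. A real system would need a far larger, curated mapping from terms to DDC classes.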