Clustering topic groups of documents using K-Means algorithm: Australian Embassy Jakarta media releases 2006-2016

Wishnu Hardi, W. Kusuma, Sulistyo Basuki
Journal: Berkala Ilmu Perpustakaan dan Informasi
Published: 2019-11-13 (Journal Article)
DOI: 10.22146/bip.36451
Citations: 3

Abstract

Introduction. The Australian Embassy in Jakarta stores a wide array of media release documents. Analyzing the specific and significant patterns in this collection is essential because it can yield new insights into, and knowledge of, the collection's major topic groups.

Methodology. The K-Means algorithm, a non-hierarchical clustering method that partitions data objects into clusters, was used. The method works by minimizing data variation within clusters and maximizing data variation between clusters.

Data Analysis. Of the documents issued between 2006 and 2016, 839 were examined to determine term frequencies and to generate clusters. The clusters were evaluated by a nominated expert, who validated the clustering results.

Results and Discussion. The results showed 57 meaningful terms grouped into three clusters. “People to people links”, “economic cooperation”, and “human development” were chosen as the labels representing the topics of the Australian Embassy Jakarta media releases from 2006 to 2016.

Conclusions. Text mining can be used to cluster documents into topic groups. It makes the clustering process more systematic, because the text analysis is conducted through a series of stages with specifically set parameters.
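The Methodology and Data Analysis paragraphs describe K-Means applied to term-frequency vectors. The following is a minimal pure-Python sketch of that general technique, not the authors' pipeline: the tokenizer, the stopword list, the `min_df` term filter, and the farthest-first centroid seeding are all hypothetical choices, because the abstract does not state how terms were selected or how initial centroids were picked. The `variance_split` helper illustrates why "minimizing variation within clusters" and "maximizing variation between clusters" go together: when each centroid is the mean of its cluster, the total sum of squares equals the within-cluster plus the between-cluster sum, so for a fixed dataset lowering one raises the other.

```python
import re
from collections import Counter

# Hypothetical stopword list for illustration only; the paper's
# actual preprocessing steps and parameters are not given here.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "on", "with", "is", "are"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def term_frequency_matrix(docs, min_df=1):
    """Raw term-frequency vectors over terms found in at least min_df documents."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter(term for toks in tokenized for term in set(toks))
    vocab = sorted(t for t, n in df.items() if n >= min_df)
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([float(counts[t]) for t in vocab])
    return vocab, rows


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def farthest_first(X, k):
    """Deterministic seeding (an assumption): start at X[0], then repeatedly
    add the point farthest from every centroid chosen so far."""
    centroids = [list(X[0])]
    while len(centroids) < k:
        far = max(X, key=lambda x: min(sq_dist(x, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(X, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    centroids = farthest_first(X, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each document joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(x, centroids[j])) for x in X]
        if new_labels == labels:
            break  # no document changed cluster, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = mean(members)
    return labels, centroids


def variance_split(X, labels, centroids):
    """Within-, between- and total sums of squares; total == within + between
    whenever each centroid is the mean of its cluster."""
    grand = mean(X)
    within = sum(sq_dist(x, centroids[lab]) for x, lab in zip(X, labels))
    sizes = Counter(labels)
    between = sum(n * sq_dist(centroids[j], grand) for j, n in sizes.items())
    total = sum(sq_dist(x, grand) for x in X)
    return within, between, total


if __name__ == "__main__":
    # Toy corpus (invented for illustration): two trade-themed and two
    # education-themed "releases", reduced to bags of words.
    docs = [
        "trade investment trade economy",
        "economy trade investment growth",
        "scholarship students education exchange",
        "students education students alumni",
    ]
    vocab, X = term_frequency_matrix(docs)
    labels, centroids = kmeans(X, k=2)
    print(labels)                                # [0, 0, 1, 1]
    print(variance_split(X, labels, centroids))  # (3.0, 8.5, 11.5)
```

On this toy corpus the two topics separate cleanly and the sums of squares satisfy 3.0 + 8.5 = 11.5. Farthest-first seeding was chosen only to make the sketch reproducible; it is sensitive to outliers, and randomized k-means++ seeding with several restarts is the more common choice in practice. Reproducing the study's 57 terms and three clusters would require the authors' actual preprocessing, which this sketch does not attempt.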