Text Document Clustering Approach by Improved Sine Cosine Algorithm

Impact factor: 2.0 · CAS Zone 4, Computer Science · JCR Q3, Automation & Control Systems
Branislav Radomirović, Vuk Jovanović, B. Nikolić, Sasa Stojanovic, K. Venkatachalam, M. Zivkovic, A. Njeguš, N. Bačanin, I. Strumberger
Journal: Information Technology and Control, pp. 541-561. Published 2023-07-15. Publication type: journal article. DOI: 10.5755/j01.itc.52.2.33536 (https://doi.org/10.5755/j01.itc.52.2.33536)
Citations: 1

Abstract

Text clustering has developed rapidly because of the vast amount of textual data now available as online content, social media comments, corporate data, public e-services and media data. Text clustering groups similar content together; it is a process of identifying significant patterns in unstructured textual data. Algorithms are being developed worldwide to extract useful and relevant information from large amounts of text. One of the main obstacles in text clustering is measuring the significance of the content in each document so that the collection can be partitioned. This study proposes using an improved metaheuristic algorithm to fine-tune the K-means approach for the text clustering task. The proposed technique is evaluated on the first 30 unconstrained test functions from the CEC2017 test suite and on six standard benchmark text datasets. The simulation results and the comparison with existing techniques demonstrate the robustness and superiority of the proposed method.
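The abstract does not describe the improved algorithm itself, so the following is only a minimal sketch of the general idea, under stated assumptions: it uses the basic sine cosine algorithm (not the paper's improved variant) to search for K-means centroids by minimizing the within-cluster sum of squared Euclidean distances. The function names (`sse`, `unflatten`, `sca_centroids`), the parameter defaults, and the choice of Euclidean distance are all illustrative assumptions; in a text clustering setting the points would presumably be document vectors such as TF-IDF weights.

```python
import math
import random


def sse(centroids, points):
    """K-means objective: sum of squared distances to the nearest centroid."""
    total = 0.0
    for p in points:
        total += min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
    return total


def unflatten(vec, k, dim):
    """Split a flat solution vector into k centroids of length dim."""
    return [vec[i * dim:(i + 1) * dim] for i in range(k)]


def sca_centroids(points, k, agents=20, iters=100, a=2.0, seed=0):
    """Basic sine cosine algorithm over flattened centroid vectors.

    Assumption: this is the standard SCA update rule, not the improved
    variant from the paper, whose details the abstract does not give.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    n = k * dim
    # Per-coordinate search bounds taken from the data, repeated per centroid.
    lo = [min(p[d] for p in points) for d in range(dim)] * k
    hi = [max(p[d] for p in points) for d in range(dim)] * k

    pop = [[rng.uniform(lo[j], hi[j]) for j in range(n)] for _ in range(agents)]
    fit = [sse(unflatten(x, k, dim), points) for x in pop]
    best_i = min(range(agents), key=fit.__getitem__)
    best, best_fit = pop[best_i][:], fit[best_i]
    history = [best_fit]

    for t in range(iters):
        r1 = a - t * a / iters  # step amplitude shrinks linearly from a to 0
        for x in pop:
            for j in range(n):
                r2 = rng.uniform(0.0, 2.0 * math.pi)
                r3 = rng.uniform(0.0, 2.0)
                # Switch between the sine and cosine branch with probability 0.5.
                wave = math.sin(r2) if rng.random() < 0.5 else math.cos(r2)
                x[j] += r1 * wave * abs(r3 * best[j] - x[j])
                x[j] = min(max(x[j], lo[j]), hi[j])  # keep inside the bounds
            f = sse(unflatten(x, k, dim), points)
            if f < best_fit:  # elitism: remember the best solution seen so far
                best, best_fit = x[:], f
        history.append(best_fit)

    return unflatten(best, k, dim), history


if __name__ == "__main__":
    # Toy 2-D data standing in for document vectors (hypothetical values).
    docs = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
    centroids, history = sca_centroids(docs, k=2, agents=10, iters=30, seed=1)
    print(centroids, history[-1])
```

The centroids returned by such a search could then seed an ordinary K-means run, which is one plausible reading of "fine-tuning" K-means; the abstract does not say how the two are actually combined.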
Source journal
Information Technology and Control (Engineering and Technology, Computer Science: Artificial Intelligence)
CiteScore: 2.70
Self-citation rate: 9.10%
Articles published: 36
Review time: 12 months
About the journal: The journal covers a wide range of problems in computer science and control systems, including software and hardware engineering; management systems engineering; information systems and databases; embedded systems; physical systems modelling and application; computer networks and cloud computing; data visualization; human-computer interfaces; and computer graphics, visual analytics, and multimedia systems.