Text Document Clustering Approach by Improved Sine Cosine Algorithm

Impact factor: 2.0 · Zone 4 (Computer Science) · JCR Q3 (AUTOMATION & CONTROL SYSTEMS)

Information Technology and Control · Pub Date: 2023-07-15 · DOI: 10.5755/j01.itc.52.2.33536

Branislav Radomirović, Vuk Jovanović, B. Nikolić, Sasa Stojanovic, K. Venkatachalam, M. Zivkovic, A. Njeguš, N. Bačanin, I. Strumberger

Pages: 541-561 · Publication type: Journal Article

Citations: 1


Text Document Clustering Approach by Improved Sine Cosine Algorithm

Text clustering has developed rapidly because of the vast amounts of textual data now available as online content, social media comments, corporate data, public e-services and media data. Text clustering groups similar content together; it is a process of identifying significant patterns in unstructured textual data. Algorithms are being developed worldwide to extract useful and relevant information from large amounts of text. One of the main challenges in text clustering is measuring the significance of the content in each document so that the collection can be partitioned. This study proposes using an improved metaheuristic algorithm to fine-tune the K-means approach for the text clustering task. The proposed technique is evaluated on the first 30 unconstrained test functions from the CEC2017 test suite and on six standard benchmark text datasets. The simulation results and a comparison with existing techniques demonstrate the robustness and superiority of the proposed method.
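The abstract does not describe how the sine cosine algorithm is improved or exactly how it is coupled to K-means, so the following is only a minimal sketch under stated assumptions: it uses the basic (unimproved) sine cosine algorithm to search for K-means centroids that minimize the within-cluster sum of squared errors, followed by a few standard Lloyd iterations. The function names (`sse`, `sca_centroids`, `lloyd`), the parameter defaults, and the encoding of each agent as a full set of k centroids are illustrative choices, not taken from the paper. In a text setting, `X` would typically be a document-term matrix such as TF-IDF vectors.

```python
import numpy as np

def sse(centroids, X):
    """Sum of squared distances from each point to its nearest centroid."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

def sca_centroids(X, k, n_agents=20, n_iter=100, a=2.0, seed=0):
    """Search for k centroids with the basic sine cosine algorithm (assumed setup)."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    d = X.shape[1]
    # Each agent encodes one candidate solution: a (k, d) array of centroids.
    agents = rng.uniform(lo, hi, size=(n_agents, k, d))
    fitness = np.array([sse(c, X) for c in agents])
    best = agents[fitness.argmin()].copy()
    best_fit = fitness.min()
    history = [best_fit]
    for t in range(n_iter):
        r1 = a - t * a / n_iter  # shrinks linearly: exploration -> exploitation
        r2 = rng.uniform(0, 2 * np.pi, size=agents.shape)
        r3 = rng.uniform(0, 2, size=agents.shape)
        r4 = rng.uniform(0, 1, size=agents.shape)
        step = np.abs(r3 * best - agents)
        # Sine branch when r4 < 0.5, cosine branch otherwise.
        agents = np.where(r4 < 0.5,
                          agents + r1 * np.sin(r2) * step,
                          agents + r1 * np.cos(r2) * step)
        agents = np.clip(agents, lo, hi)  # keep centroids inside the data range
        fitness = np.array([sse(c, X) for c in agents])
        if fitness.min() < best_fit:
            best_fit = fitness.min()
            best = agents[fitness.argmin()].copy()
        history.append(best_fit)
    return best, best_fit, history

def lloyd(X, centroids, n_iter=10):
    """A few standard K-means (Lloyd) iterations starting from given centroids."""
    c = centroids.copy()
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(len(c)):
            if np.any(labels == j):  # leave an empty cluster's centroid unchanged
                c[j] = X[labels == j].mean(axis=0)
    return c
```

A typical use would be `best, best_fit, history = sca_centroids(X, k=3)` followed by `lloyd(X, best)`; whether the paper uses the metaheuristic for initialization, for the full search, or in some hybrid scheme is not stated in the abstract.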


Source journal

Information Technology and Control (Engineering & Technology - Computer Science: Artificial Intelligence)

CiteScore: 2.70

Self-citation rate: 9.10%

Review time: 12 months

Journal description: The journal covers a wide range of computer science and control systems topics, including: software and hardware engineering; management systems engineering; information systems and databases; embedded systems; physical systems modelling and application; computer networks and cloud computing; data visualization; human-computer interface; computer graphics, visual analytics, and multimedia systems.