SKIFF: Spherical K-means with iterative feature filtering for text document clustering
I. Sharma, Abhay Sharma, Rekha Chaturvedi, Jitendra Rajpurohit, M. Kumar
Journal of Information Science, published 2023-05-02. DOI: 10.1177/01655515231165230
Text clustering is an overlooked area of text mining that deserves more attention. Several applications require automatic text organisation, for example information retrieval systems that present organised search results. Spherical k-means is a successful adaptation of the classic k-means algorithm to text clustering. However, conventional methods for accelerating k-means may not apply to spherical k-means because text document data differ in nature. The proposed work introduces an iterative feature filtering technique that reduces the size of the data during clustering and produces more feature-relevant clusters in less time than classic spherical k-means. The novelty of the method is that feature assessment is separate from the clustering objective function and is derived from the cluster structure. Experimental results on popular text corpora show that the proposed scheme speeds up computation without sacrificing cluster quality. The results are satisfactory and outperform recent work in this domain.
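The abstract describes the method only at a high level, so the following is a minimal sketch of the two ideas it names, under stated assumptions. Documents and centroids are kept at unit length, so cosine similarity reduces to a dot product (standard spherical k-means). After each iteration, features are scored from the cluster structure rather than from the objective function, and the weakest are dropped, so the data shrinks as clustering proceeds. The scoring rule (variance of a feature's weight across centroids), the parameters `drop_frac` and `min_features`, the farthest-first seeding, and all function names are hypothetical illustrations, not the SKIFF criterion, which the abstract does not specify.

```python
import math


def normalize(vec):
    """Scale a vector to unit L2 length; an all-zero vector is returned as-is."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def assign(data, centroids):
    """On unit vectors, cosine similarity is just the dot product."""
    return [max(range(len(centroids)), key=lambda j: dot(x, centroids[j]))
            for x in data]


def spherical_kmeans_with_filtering(docs, k, iters=10, drop_frac=0.2, min_features=2):
    """Spherical k-means plus a HYPOTHETICAL per-iteration feature filter.

    docs: equal-length term-weight vectors (e.g. tf-idf rows).
    Returns (labels, kept), where kept lists the surviving feature indices.
    """
    data = [normalize(d) for d in docs]  # project documents onto the unit sphere
    kept = list(range(len(data[0])))
    # Deterministic farthest-first seeding (an assumption, for reproducibility).
    centroids = [list(data[0])]
    while len(centroids) < k:
        far = min(data, key=lambda x: max(dot(x, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        labels = assign(data, centroids)
        # Update: centroid = normalized sum of its members; keep the old one if empty.
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:
                centroids[j] = normalize([sum(col) for col in zip(*members)])
        # Hypothetical filter: score each feature by the spread of its weight
        # across centroids (cluster structure), not by the clustering objective.
        n_drop = min(int(len(kept) * drop_frac), len(kept) - min_features)
        if n_drop <= 0:
            continue
        scores = []
        for f in range(len(kept)):
            w = [c[f] for c in centroids]
            mean = sum(w) / k
            scores.append(sum((v - mean) ** 2 for v in w))
        survivors = sorted(sorted(range(len(kept)), key=lambda f: scores[f])[n_drop:])
        kept = [kept[f] for f in survivors]
        # Shrink documents and centroids to the surviving features, then renormalize.
        data = [normalize([x[f] for f in survivors]) for x in data]
        centroids = [normalize([c[f] for f in survivors]) for c in centroids]
    return assign(data, centroids), kept


if __name__ == "__main__":
    # Toy corpus: columns 0-1 and 2-3 are two "topics"; columns 4-5 are uniform noise.
    docs = [
        [5, 4, 0, 0, 1, 1],
        [4, 5, 0, 1, 1, 1],
        [6, 5, 1, 0, 1, 1],
        [0, 0, 5, 4, 1, 1],
        [1, 0, 4, 5, 1, 1],
        [0, 1, 5, 6, 1, 1],
    ]
    labels, kept = spherical_kmeans_with_filtering(docs, k=2, iters=5)
    print(labels, kept)
```

On this toy input the first three and last three documents fall into separate clusters, and the two uniform noise columns (indices 4 and 5) are the ones filtered out, since their weights barely differ between centroids. Shrinking the vectors is what would save time on a real corpus; the drop schedule here is arbitrary.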
About the journal:
The Journal of Information Science is a peer-reviewed international journal of high repute covering topics of interest to all those researching and working in the sciences of information and knowledge management. The Editors welcome material on any aspect of information science theory, policy, application or practice that will advance thinking in the field.