An optimal document clustering method using hybrid optimal feature selection and efficient soft computing classifier
Authors: Perumal Pitchandi, R. Kingsy Grace
Journal: Expert Systems with Applications, volume 294, Article 128762
Publication date: 2025-06-25
DOI: 10.1016/j.eswa.2025.128762
URL: https://www.sciencedirect.com/science/article/pii/S0957417425023802
Document grouping is an important area of text extraction, commonly used for document organization, browsing, abstraction, and categorization, and it supports data recovery, data processing, and document management. Several document grouping methods have recently been proposed to improve system performance, but they face serious challenges. The main problem is choosing appropriate document features and similar tools. Moreover, because of their high computational cost and memory usage, these methods are not suitable for the many documents that must be processed on a daily basis. This paper presents an optimal document clustering method based on hybrid optimal feature selection and an efficient soft computing classifier. The proposed method consists of three tiers. First, a fuzzy density fruit fly optimization (FD-FFO) algorithm performs data pre-processing, removing unwanted artifacts and redundant content from the documents. Second, a teaching–learning-based Harris Hawks optimization (TL-HHO) algorithm performs feature selection, choosing the best features among the multiple features of a document. Third, a support vector regression probabilistic neural network (SVR-PNN) performs the document clustering, which improves clustering performance. Finally, the proposed SVR-PNN method is evaluated on the Reuters, 20 Press, and Web-snippets databases. Its performance is compared with that of existing methods such as the Rider-Moth Flame optimization algorithm (RMFO), the Correlation Based Incremental Clustering Algorithm (CBICA), Incremental Construction of GMM Tree (ICGT), and Weighted Probabilistic Latent Semantic Analysis (WPLSA), using Precision, Accuracy, F-Measure, and Recall.
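The abstract names Precision, Accuracy, F-Measure, and Recall as the comparison metrics but does not say how they are computed. As a minimal sketch, assuming the standard definitions from confusion counts (true/false positives and negatives), they could be calculated as follows. The names `Scores` and `evaluate` and the example counts are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scores:
    precision: float
    recall: float
    f_measure: float
    accuracy: float


def evaluate(tp: int, fp: int, fn: int, tn: int) -> Scores:
    """Compute the four metrics from confusion counts.

    Assumption: standard textbook definitions; the paper may aggregate
    per cluster or per class differently. A metric whose denominator is
    zero is reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F-Measure here is the harmonic mean of precision and recall (F1).
    f_measure = 2 * precision * recall / denom if denom else 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return Scores(precision, recall, f_measure, accuracy)


# Hypothetical counts, for illustration only (not results from the paper).
scores = evaluate(tp=8, fp=2, fn=2, tn=8)
print(scores)
```

With multiple clusters or classes, these counts would typically be computed per class and then averaged; which averaging scheme the authors use is not stated in the abstract.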
About the journal:
Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.