The unsupervised short text classification method based on GCN encoder–decoder and local enhancement
Authors: Yingying Wei, Ze Wang, Jianbin Li, Tao Li
Journal: Expert Systems with Applications, volume 282, Article 127678 (JCR Q1, Computer Science, Artificial Intelligence; impact factor 7.5)
Published: 2025-04-21
DOI: 10.1016/j.eswa.2025.127678
URL: https://www.sciencedirect.com/science/article/pii/S0957417425013004
Citations: 0
Abstract
Like all fields of data science, short text classification seeks to achieve high-quality results with limited data. Although supervised learning methods have made notable progress in this area, they require large amounts of labeled data to achieve adequate accuracy. In many practical applications, however, labeled data is scarce, and manual labeling is time-consuming, labor-intensive, expensive, and may require specialized expertise. This paper therefore addresses the shortage of labeled data with unsupervised methods while still ensuring effective extraction of semantic features from the text. To this end, we propose a novel unsupervised short text classification method within the autoencoder framework. Specifically, we first design the MRFasGCN encoder and derive the relationships between nodes in its hidden layers, thereby enhancing the capture of text features and semantic information. Furthermore, we construct a dual-node-based decoder that reconstructs both the topology and the node attributes in an unsupervised manner. This approach compensates for feature deficiencies from multiple perspectives, alleviating the problem of insufficient features in short texts. Finally, we propose a local enhancement method that integrates node features and topology, strengthening the connections between relevant nodes. This improves the model's understanding of the text's local context while mitigating the overfitting caused by feature sparsity in short texts. Extensive experimental results demonstrate the pronounced superiority of our proposed UEDE model over existing methods on the evaluated dataset, validating its effectiveness in short text classification. Our code is available at https://github.com/w123yy/UEDE.
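The encoder-decoder idea described in the abstract can be illustrated with a generic graph autoencoder. The sketch below is not the authors' MRFasGCN encoder or the UEDE model (their repository is the reference for that). It assumes a single standard GCN layer as the encoder, an inner-product decoder for the topology, and a linear decoder for the node attributes, applied to a hypothetical four-node toy graph with random weights. All function names and data are illustrative, and no training loop is included.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def random_matrix(rows, cols, rng):
    """Small random weight matrix (stand-in for learned parameters)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} of a standard GCN layer."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(a_norm, h, w):
    """One GCN propagation step: ReLU(A_norm @ H @ W)."""
    z = matmul(matmul(a_norm, h), w)
    return [[max(0.0, v) for v in row] for row in z]


def decode_topology(z):
    """Inner-product decoder: edge probability sigmoid(z_i . z_j)."""
    z_t = [list(col) for col in zip(*z)]
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in matmul(z, z_t)]


def decode_attributes(z, w_dec):
    """Linear decoder mapping embeddings back to the input feature space."""
    return matmul(z, w_dec)


def reconstruction_loss(adj, adj_rec, x, x_rec, eps=1e-9):
    """Binary cross-entropy on edges plus mean squared error on attributes."""
    n, d = len(adj), len(x[0])
    bce = -sum(
        adj[i][j] * math.log(adj_rec[i][j] + eps)
        + (1.0 - adj[i][j]) * math.log(1.0 - adj_rec[i][j] + eps)
        for i in range(n) for j in range(n)
    ) / (n * n)
    mse = sum((a - b) ** 2 for rx, rr in zip(x, x_rec) for a, b in zip(rx, rr)) / (n * d)
    return bce + mse


# Hypothetical toy graph: four text nodes, edges 0-1 and 2-3, three features each.
adj = [[0.0, 1.0, 0.0, 0.0],
       [1.0, 0.0, 0.0, 0.0],
       [0.0, 0.0, 0.0, 1.0],
       [0.0, 0.0, 1.0, 0.0]]
x = [[1.0, 0.0, 1.0],
     [1.0, 0.0, 0.0],
     [0.0, 1.0, 1.0],
     [0.0, 1.0, 0.0]]

rng = random.Random(0)
w_enc = random_matrix(3, 2, rng)   # encoder weights: 3 features -> 2-dim embedding
w_dec = random_matrix(2, 3, rng)   # attribute decoder weights: 2-dim -> 3 features

a_norm = normalize_adjacency(adj)
z = gcn_layer(a_norm, x, w_enc)            # node embeddings
adj_rec = decode_topology(z)               # reconstructed topology
x_rec = decode_attributes(z, w_dec)        # reconstructed node attributes
loss = reconstruction_loss(adj, adj_rec, x, x_rec)
print(f"joint reconstruction loss: {loss:.4f}")
```

In an actual unsupervised setup, the weights would be updated to minimize this joint loss, so the embeddings are shaped by both graph structure and node features without any labels.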
About the journal:
Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.