The unsupervised short text classification method based on GCN encoder–decoder and local enhancement

IF 7.5 | Zone 1, Computer Science | Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Yingying Wei, Ze Wang, Jianbin Li, Tao Li
Expert Systems with Applications, Volume 282, Article 127678
Published: 2025-04-21
DOI: 10.1016/j.eswa.2025.127678
https://www.sciencedirect.com/science/article/pii/S0957417425013004
Citations: 0

Abstract

Like all fields of data science, short text classification seeks to achieve high-quality results with limited data. Although supervised learning methods have made notable progress in this area, they require large amounts of labeled data to achieve adequate accuracy. In many practical applications, however, labeled data is scarce, and manual labeling is time-consuming, labor-intensive, and expensive, and may require specialized expertise. This paper therefore addresses the shortage of labeled data with unsupervised methods while still extracting semantic features from the text effectively. To this end, we propose a novel unsupervised short text classification method within the autoencoder framework. Specifically, we first design the MRFasGCN encoder and derive the relationships between nodes in its hidden layers, thereby enhancing the capture of text features and semantic information. Furthermore, we construct a dual-node-based decoder that reconstructs the topology and node attributes in an unsupervised manner. This approach compensates for feature deficiencies from multiple perspectives, alleviating the issue of insufficient features in short texts. Finally, we propose a localized enhancement method that integrates node features and topology, strengthening the connections between relevant nodes. This improves the model's understanding of the text's local context while mitigating the overfitting caused by feature sparsity in short texts. Extensive experimental results demonstrate the pronounced superiority of our proposed UEDE model over existing methods on the dataset, validating its effectiveness in short-text classification. Our code is available at https://github.com/w123yy/UEDE.
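
The abstract does not give the internals of the MRFasGCN encoder or the dual-node-based decoder, so the following is only a minimal sketch of the general idea it describes: a GCN encoder whose embeddings are decoded back into both the graph topology and the node attributes, with no labels involved. It assumes a standard two-layer GCN, an inner-product topology decoder, and a linear attribute decoder. All function names, weight shapes, and the toy graph are hypothetical, and no training loop is included; the authors' actual implementation is in the repository linked above.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} of a standard GCN layer."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_encode(a_norm, x, w1, w2):
    """Generic two-layer GCN encoder (an assumption, not MRFasGCN)."""
    h = np.maximum(a_norm @ x @ w1, 0.0)        # ReLU hidden layer
    return a_norm @ h @ w2                      # node embeddings Z

def decode(z, w_attr):
    """Two decoders: inner product for topology, linear map for attributes."""
    adj_rec = 1.0 / (1.0 + np.exp(-(z @ z.T)))  # edge probabilities
    x_rec = z @ w_attr                          # reconstructed node features
    return adj_rec, x_rec

def reconstruction_loss(adj, x, adj_rec, x_rec, eps=1e-9):
    """Unsupervised objective: edge cross-entropy plus attribute squared error."""
    bce = -np.mean(adj * np.log(adj_rec + eps)
                   + (1.0 - adj) * np.log(1.0 - adj_rec + eps))
    mse = np.mean((x - x_rec) ** 2)
    return bce + mse

# Toy graph: 4 text nodes in a chain, 5 features each (hypothetical data).
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
x = rng.random((4, 5))
w1 = rng.normal(scale=0.1, size=(5, 8))
w2 = rng.normal(scale=0.1, size=(8, 3))
w_attr = rng.normal(scale=0.1, size=(3, 5))

a_norm = normalize_adjacency(adj)
z = gcn_encode(a_norm, x, w1, w2)
adj_rec, x_rec = decode(z, w_attr)
loss = reconstruction_loss(adj, x, adj_rec, x_rec)
```

In a trained model the weights would be updated to minimize `loss`, and the embeddings `z` could then be clustered to assign classes without labels; that clustering step is likewise an assumption here, not something the abstract specifies.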
Source journal
Expert Systems with Applications (Engineering and Technology; Engineering: Electrical and Electronic)
CiteScore: 13.80
Self-citation rate: 10.60%
Articles published: 2045
Review time: 8.7 months
Journal description: Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.