A short text topic modeling method based on integrating Gaussian and Logistic coding networks with pre-trained word embeddings

IF 5.5 · Zone 2 (Computer Science) · JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neurocomputing · Pub Date: 2024-11-22 · DOI: 10.1016/j.neucom.2024.128941

Si Zhang, Jiali Xu, Ning Hui, Peiyun Zhai

Volume 616, Article 128941 · https://www.sciencedirect.com/science/article/pii/S0925231224017120

Citations: 0

Abstract

The development of neural networks has provided a flexible learning framework for topic modeling, and neural topic models have attracted wide attention. Despite their widespread application, neural topic models still need improvement when applied to short texts. A short text usually contains only a few words and little feature information, so it lacks sufficient word co-occurrence and shared context. This leads to sparse features and poor interpretability in topic modeling. To alleviate these problems, we propose a model called Topic Modeling of Enhanced Neural Network with word Embedding (ENNETM). First, we introduce an enhanced network into the inference network, which integrates the Gaussian and Logistic coding networks to improve the performance and interpretability of topic extraction. Second, we introduce pre-trained word embeddings into the Gaussian decoding network of the model to enrich its contextual semantic information. Comprehensive experiments on three public datasets, 20Newsgroups, AG_news, and TagMyNews, show that the proposed method outperforms several state-of-the-art models in topic extraction and text classification.
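The abstract does not give ENNETM's equations, so the sketch below is only a generic orientation to the building blocks that VAE-style neural topic models with pre-trained embeddings commonly use. It is not the authors' model. It shows a Gaussian latent sampled with the reparameterization trick, a softmax ("logistic-normal") mapping to topic proportions, and a decoder in the style of the Embedded Topic Model (ETM), where topic-word distributions are computed from word embeddings. How ENNETM actually combines its Gaussian and Logistic coding networks is not described here. All function names, toy embeddings, and encoder outputs are hypothetical, and plain Python lists stand in for tensors.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def reparameterize(mu, log_var, rng):
    """Gaussian reparameterization trick: z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def topic_word_matrix(topic_emb, word_emb):
    """ETM-style decoder (assumption): beta[k][v] = softmax over v of <alpha_k, rho_v>."""
    return [
        softmax([sum(a * r for a, r in zip(alpha, rho)) for rho in word_emb])
        for alpha in topic_emb
    ]


def reconstruct(theta, beta):
    """Document word distribution: p(w | d) = sum_k theta[k] * beta[k][w]."""
    return [sum(t * row[v] for t, row in zip(theta, beta)) for v in range(len(beta[0]))]


def neg_log_likelihood(bow, word_probs):
    """Reconstruction loss: -sum_w count(w) * log p(w | d)."""
    return -sum(c * math.log(p) for c, p in zip(bow, word_probs) if c > 0)


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(sigma^2)) || N(0, I)), the usual VAE regularizer."""
    return -0.5 * sum(1 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


# Toy usage, all values hypothetical: V = 4 words, 2-dim embeddings, K = 2 topics.
word_emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]  # stand-in for frozen pre-trained rho
topic_emb = [[2.0, 0.0], [0.0, 2.0]]  # alpha; learned during training in practice
mu, log_var = [0.5, -0.5], [-1.0, -1.0]  # stand-in for Gaussian encoder outputs
rng = random.Random(0)
theta = softmax(reparameterize(mu, log_var, rng))  # logistic-normal topic proportions
beta = topic_word_matrix(topic_emb, word_emb)
bow = [2, 1, 0, 0]  # a short text: word 0 twice, word 1 once
loss = neg_log_likelihood(bow, reconstruct(theta, beta)) + kl_to_standard_normal(mu, log_var)
print(f"theta={theta}, loss={loss:.4f}")
```

Because the word embeddings enter through the decoder, words that are close in embedding space receive similar probabilities under each topic even if they rarely co-occur. This is one common way such models try to compensate for sparse co-occurrence in short texts.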

Source journal

Neurocomputing (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 13.10
Self-citation rate: 10.00%
Articles published: 1382
Review time: 70 days

About the journal: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing, covering neurocomputing theory, practice, and applications.