Supervised Topic Modeling Using Word Embedding with Machine Learning Techniques

2019 International Conference on Advances in the Emerging Computing Technologies (AECT). Pub Date: 2020-02-01. DOI: 10.1109/AECT47998.2020.9194177

Rana Nassif, Mohamed Waleed Fahkr

Citations: 2

Abstract

Large amounts of text are collected on the internet every day. As more text documents become available, categorizing them becomes essential for efficient archiving, retrieval, and search. In this paper, we investigate statistical and machine learning techniques, such as hidden Markov models (HMMs) and deep learning networks, combined with two well-known word embedding models (word2vec and GloVe) for supervised document classification. The investigated combinations are compared with state-of-the-art approaches applied to the same data. The main contribution of this paper is to demonstrate the importance of both the meaning and the order of words for topic modeling. Previous work has often overlooked this: some approaches considered neither, while others considered only one of the two. We show that one of our proposed models, a hybrid of LSTM and CNN neural networks, achieved better accuracy on the same dataset than all state-of-the-art models in the literature.
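The claim that word order matters in addition to word meaning can be illustrated with a small sketch. This is not the paper's model: the two-dimensional vectors, the filter weights, and the function names below are all hypothetical stand-ins, chosen so the arithmetic is easy to follow. It only shows that averaging word vectors gives the same document representation for two sentences with the same words in a different order, while a single convolution filter with max pooling (the basic building block of a CNN text classifier) can tell them apart.

```python
# Toy 2-d "embeddings"; hypothetical values, not taken from word2vec or GloVe.
EMB = {
    "dog": [1.0, 0.0],
    "bites": [0.0, 1.0],
    "man": [-1.0, 0.0],
}


def embed(tokens):
    """Map tokens to their vectors, preserving sequence order."""
    return [EMB[t] for t in tokens]


def mean_pool(vectors):
    """Order-insensitive document vector: the average of the word vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def conv1d_max(vectors, kernel):
    """Order-sensitive feature: slide a width-k filter over the sequence
    and keep the maximum response (one filter followed by max pooling)."""
    k = len(kernel)
    responses = []
    for start in range(len(vectors) - k + 1):
        window = vectors[start:start + k]
        responses.append(
            sum(w * x
                for kv, xv in zip(kernel, window)
                for w, x in zip(kv, xv))
        )
    return max(responses)


a = embed(["dog", "bites", "man"])
b = embed(["man", "bites", "dog"])

# Hypothetical width-2 filter that responds to "dog" followed by "bites".
kernel = [[1.0, 0.0], [0.0, 1.0]]

print(mean_pool(a) == mean_pool(b))                  # True: same bag of words
print(conv1d_max(a, kernel), conv1d_max(b, kernel))  # 2.0 0.0
```

In a trained model the embeddings and filter weights would be learned from data rather than set by hand, and a recurrent layer such as an LSTM would capture longer-range order than a fixed-width filter.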

