Supervised Topic Modeling Using Word Embedding with Machine Learning Techniques

Rana Nassif, Mohamed Waleed Fahkr
Venue: 2019 International Conference on Advances in the Emerging Computing Technologies (AECT)
DOI: 10.1109/AECT47998.2020.9194177 (https://doi.org/10.1109/AECT47998.2020.9194177)
Publication date: 2020-02-01
Citations: 2

Abstract

Large amounts of text are collected on the internet every day. As more text documents become available, categorizing them becomes essential for efficient archiving, retrieval, and search. In this paper, we investigate statistical and machine learning techniques (hidden Markov models (HMMs) and deep learning networks) combined with two well-known word embedding models (word2vec and GloVe) for supervised document classification. The investigated combinations are compared with state-of-the-art approaches applied to the same data. The main contribution of this paper is to demonstrate the importance of both word meaning and word order in topic modeling. Previous work has often overlooked this: some approaches considered neither, while others considered only one. We show that one of our proposed models, a hybrid of LSTM and CNN neural networks, achieves higher accuracy on the same dataset than all state-of-the-art models in the literature.
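The abstract's point about word order can be illustrated with a toy sketch. This is not the authors' model: the two-dimensional vectors in `EMB` are made-up values rather than word2vec or GloVe embeddings, and `recurrent_encoding` is a hypothetical minimal recurrent update standing in for an LSTM. It only shows that averaging word vectors discards order, while a recurrent pass retains it.

```python
import math

# Toy 2-dimensional "embeddings"; made-up values, not word2vec or GloVe vectors.
EMB = {
    "dog": [1.0, 0.0],
    "bites": [0.0, 1.0],
    "man": [-1.0, 0.5],
}

def mean_embedding(tokens):
    """Order-insensitive document vector: the average of the word vectors."""
    dim = len(EMB[tokens[0]])
    return [sum(EMB[t][i] for t in tokens) / len(tokens) for i in range(dim)]

def recurrent_encoding(tokens, decay=0.5):
    """Order-sensitive document vector from a minimal recurrent update.

    h_t = tanh(decay * h_{t-1} + x_t). This is a stand-in for an LSTM,
    not an LSTM: it has no gates and no learned weights.
    """
    h = [0.0] * len(EMB[tokens[0]])
    for t in tokens:
        h = [math.tanh(decay * h_i + x_i) for h_i, x_i in zip(h, EMB[t])]
    return h

a = ["dog", "bites", "man"]
b = ["man", "bites", "dog"]

# Same words, different order: the averaged vectors coincide,
# while the recurrent encodings differ.
print(mean_embedding(a), mean_embedding(b))
print(recurrent_encoding(a), recurrent_encoding(b))
```

The two sentences contain the same words, so their averaged vectors are identical, whereas the recurrent encodings differ because each step depends on what came before. A classifier fed only the averaged vector cannot tell the two sentences apart.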