Neural Networks Based on Latent Dirichlet Allocation For News Web Page Classifications

Adel R. Alharbi, Shwaa D. Alharbi, Amer Aljaedi, Oluwatobi Akanbi
Published in: 2020 IEEE 2nd International Conference on Artificial Intelligence in Engineering and Technology (IICAIET), 2020-09-26
DOI: 10.1109/IICAIET49801.2020.9257842 (https://doi.org/10.1109/IICAIET49801.2020.9257842)
Citations: 2

Abstract

Popular news websites are part of modern life, offering information to millions of users every day. Although computer technology continues to advance, the volume of disease data keeps rising. How to structure documents so that data can be recognized dynamically has become one of the main challenges for sophisticated web services. Traditional systematic classification of news text not only requires substantial human and financial resources but also can hardly deliver fast classification. In this work, we introduce a new method for Arabic document classification that relies on both Latent Dirichlet Allocation (LDA) and neural networks. Our approach adopts the Vector Space Model to represent documents in text classification applications. In this model, a text is represented as a term vector of n-grams. Such representations cannot distinguish semantic or textual content, which results in a large feature space and semantic loss. The new proposal instead uses "topics" sampled by the LDA method as text features, which effectively eliminates the problems described. We extract the important themes (topics) from all the texts. Each theme is identified by a distinct distribution over descriptors, and each text is then represented as a vector over these themes. Our experiments indicate that the proposed solution achieves high efficiency, with an accuracy of 85.11% on the Arabic text classification task.
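The pipeline outlined in the abstract (term counts, then LDA topic proportions as features, then a neural network classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: the use of scikit-learn, the toy English corpus, the two labels, the number of topics, and the hidden-layer size are all assumptions chosen only to keep the example small and runnable.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neural_network import MLPClassifier

# Hypothetical toy corpus; the paper's Arabic news data is not reproduced here.
docs = [
    "the team won the football match",
    "the striker scored a late goal",
    "the coach praised the players after the match",
    "the league title race is close this season",
    "the central bank raised interest rates",
    "stock markets fell as inflation rose",
    "the company reported higher quarterly profit",
    "investors expect the bank to cut rates",
]
labels = ["sports"] * 4 + ["economy"] * 4

N_TOPICS = 2  # assumed value; the abstract does not state the number of topics

# Step 1: Vector Space Model, each document becomes a vector of term counts.
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)

# Step 2: LDA maps each count vector to a distribution over N_TOPICS topics.
lda = LatentDirichletAllocation(n_components=N_TOPICS, random_state=0)
topic_features = lda.fit_transform(counts)

# Step 3: a small feed-forward network classifies the topic vectors.
# (Architecture is an assumption; the abstract only says "Neural Networks".)
clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
clf.fit(topic_features, labels)
predictions = clf.predict(topic_features)

print("vocabulary size:", counts.shape[1])
print("topic feature shape:", topic_features.shape)
print("predictions:", list(predictions))
```

The point of the sketch is the change in dimensionality: the classifier sees one value per topic instead of one value per vocabulary term, which is the reduction in feature space the abstract refers to. On a real corpus the topic model and classifier would be fitted on training data only and evaluated on held-out documents.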