Deep Sequence Models for Text Classification Tasks

S. Abdullahi, Su Yiming, Shamsuddeen Hassan Muhammad, A. Mustapha, Ahmad Muhammad Aminu, Abdulkadir Abdullahi, Musa Bello, Saminu Mohammad Aliyu
DOI: 10.1109/ICECCE52056.2021.9514261
Published in: 2021 International Conference on Electrical, Communication, and Computer Engineering (ICECCE)
Publication date: 2021-06-12
Citations: 1

Abstract

The exponential growth of data generated on the Internet in the current information age is a driving force for the digital economy. Extracting information is the major source of value in accumulated big data. Machine learning algorithms that rely on statistical analysis and hand-engineered rules are overwhelmed by the vast complexities inherent in human languages. Natural Language Processing (NLP) equips machines to understand these diverse and complicated human languages. Text classification is an NLP task that automatically identifies patterns based on predefined or undefined labeled sets. Common text classification applications include information retrieval, news topic modeling, theme extraction, sentiment analysis, and spam detection. In texts, some sequences of words depend on the previous or next word sequences to make full meaning; this is a challenging dependency task that requires the machine to store important earlier information so that it can shape later meaning. Sequence models such as RNN, GRU, and LSTM are a breakthrough for tasks with such long-range dependencies. As such, we applied these models to binary and multi-class classification. The results were excellent, with most of the models performing in the range of 80% to 94%. However, this result is not exhaustive, as we believe there is room for improvement if machines are to compete with humans.
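The gating mechanism the abstract alludes to — a model storing important earlier information so it can influence later meaning — can be sketched as a minimal GRU cell in NumPy. This is an illustrative sketch of the standard GRU formulation (Cho et al.), not the paper's implementation; the class names, dimensions, and initialization scheme here are assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal GRU cell: two gates decide how much of the previous
    hidden state to keep versus overwrite at each time step."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(hidden_dim)
        def mat(rows, cols):
            return rng.uniform(-s, s, (rows, cols))
        # input and recurrent weights for the update (z), reset (r),
        # and candidate-state computations
        self.Wz, self.Uz = mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim)
        self.Wr, self.Ur = mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim)
        self.Wh, self.Uh = mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim)

    def step(self, x, h):
        z = sigmoid(self.Wz @ x + self.Uz @ h)   # update gate: keep vs. overwrite
        r = sigmoid(self.Wr @ x + self.Ur @ h)   # reset gate: how much past to consult
        h_cand = np.tanh(self.Wh @ x + self.Uh @ (r * h))
        return (1.0 - z) * h + z * h_cand        # blend old state with candidate

def encode(cell, sequence):
    """Run the cell over a sequence of token-embedding vectors; the final
    hidden state is a fixed-size summary a classifier head can consume."""
    h = np.zeros(cell.Uz.shape[0])
    for x in sequence:
        h = cell.step(x, h)
    return h
```

For binary classification, the final hidden state from `encode` would feed a learned linear layer followed by a sigmoid; training by backpropagation through time is omitted here.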