Deep Learning Methods with Pre-Trained Word Embeddings and Pre-Trained Transformers for Extreme Multi-Label Text Classification

Necdet Eren Erciyes, A. K. Görür
{"title":"Deep Learning Methods with Pre-Trained Word Embeddings and Pre-Trained Transformers for Extreme Multi-Label Text Classification","authors":"Necdet Eren Erciyes, A. K. Görür","doi":"10.1109/UBMK52708.2021.9558977","DOIUrl":null,"url":null,"abstract":"In recent years, there has been a considerable increase in textual documents online. This increase requires the creation of highly improved machine learning methods to classify text in many different domains. The effectiveness of these machine learning methods depends on the model capacity to understand the complex nature of the unstructured data and the relations of features that exist. Many different machine learning methods were proposed for a long time to solve text classification problems, such as SVM, kNN, and Rocchio classification. These shallow learning methods have achieved doubtless success in many different domains. For big and unstructured data like text, deep learning methods which can learn representations and features from the input data wtihout using any feature extraction methods have shown to be one of the major solutions. In this study, we explore the accuracy of recent recommended deep learning methods for multi-label text classification starting with simple RNN, CNN models to pretrained transformer models. We evaluated these methods’ performances by computing multi-label evaluation metrics and compared the results with the previous studies.","PeriodicalId":106516,"journal":{"name":"2021 6th International Conference on Computer Science and Engineering (UBMK)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2021-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 6th International Conference on Computer Science and Engineering (UBMK)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/UBMK52708.2021.9558977","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

In recent years, there has been a considerable increase in textual documents online. This increase calls for substantially improved machine learning methods to classify text across many different domains. The effectiveness of these methods depends on the model's capacity to understand the complex nature of unstructured data and the relations among its features. Many machine learning methods, such as SVM, kNN, and Rocchio classification, have long been proposed to solve text classification problems, and these shallow learning methods have achieved undeniable success in many domains. For large, unstructured data such as text, deep learning methods, which can learn representations and features from the input data without any separate feature extraction step, have emerged as one of the major solutions. In this study, we explore the accuracy of recently proposed deep learning methods for multi-label text classification, ranging from simple RNN and CNN models to pre-trained transformer models. We evaluate these methods' performance by computing multi-label evaluation metrics and compare the results with previous studies.
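The abstract does not give implementation details, but a common setup for multi-label text classification with pre-trained word embeddings is a small CNN over an embedding layer initialized from pre-trained vectors, trained with a per-label sigmoid and binary cross-entropy. The sketch below is an illustration under those assumptions (PyTorch; the vocabulary size, embedding matrix, and label count are hypothetical placeholders), not the authors' exact models.

```python
import torch
import torch.nn as nn

class MultiLabelTextCNN(nn.Module):
    """Minimal CNN text classifier: pre-trained embeddings -> 1D conv -> max pool -> logits."""

    def __init__(self, pretrained_vectors: torch.Tensor, num_labels: int,
                 num_filters: int = 128, kernel_size: int = 3):
        super().__init__()
        # Initialize the embedding layer from pre-trained word vectors (e.g. GloVe/word2vec).
        self.embedding = nn.Embedding.from_pretrained(pretrained_vectors, freeze=False)
        embed_dim = pretrained_vectors.size(1)
        self.conv = nn.Conv1d(embed_dim, num_filters, kernel_size, padding=1)
        self.classifier = nn.Linear(num_filters, num_labels)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        x = self.embedding(token_ids)            # (batch, seq_len, embed_dim)
        x = x.transpose(1, 2)                    # (batch, embed_dim, seq_len) for Conv1d
        x = torch.relu(self.conv(x))             # (batch, num_filters, seq_len)
        x = x.max(dim=2).values                  # global max pooling over the sequence
        return self.classifier(x)                # raw logits, one per label

# Hypothetical shapes: 10k-word vocabulary, 300-d embeddings, 20 labels.
vectors = torch.randn(10_000, 300)               # stand-in for a real pre-trained matrix
model = MultiLabelTextCNN(vectors, num_labels=20)
criterion = nn.BCEWithLogitsLoss()               # independent sigmoid per label

token_ids = torch.randint(0, 10_000, (8, 64))    # batch of 8 sequences, length 64
targets = torch.randint(0, 2, (8, 20)).float()   # multi-hot label vectors
loss = criterion(model(token_ids), targets)
loss.backward()
```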
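For the pre-trained transformer models, one common way to adapt them to multi-label classification is to fine-tune a sequence-classification head with a binary cross-entropy objective. The fragment below is a hedged sketch using the Hugging Face transformers library with a generic BERT checkpoint; the checkpoint name, label count, and example texts are assumptions for illustration, since the abstract does not specify which transformer variants were used.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "bert-base-uncased"   # assumed checkpoint; the paper does not name one here
NUM_LABELS = 20                    # hypothetical label count

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME,
    num_labels=NUM_LABELS,
    problem_type="multi_label_classification",  # switches the head's loss to BCEWithLogitsLoss
)

texts = ["an example document", "another example document"]
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
labels = torch.zeros(len(texts), NUM_LABELS)     # multi-hot float targets (zeros here, shape only)

outputs = model(**batch, labels=labels)
outputs.loss.backward()                          # an optimizer.step() would follow in real training

# At inference time, apply a sigmoid and threshold each label independently.
probs = torch.sigmoid(outputs.logits)
predictions = (probs > 0.5).int()
```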
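The abstract mentions multi-label evaluation metrics without listing them; widely used choices for this task include micro- and macro-averaged F1, precision, and Hamming loss. The snippet below illustrates a few of these with scikit-learn on made-up predictions; the specific metrics reported in the paper may differ.

```python
import numpy as np
from sklearn.metrics import f1_score, hamming_loss, precision_score

# Made-up multi-hot ground truth and predictions for 4 documents and 5 labels.
y_true = np.array([[1, 0, 1, 0, 0],
                   [0, 1, 0, 0, 1],
                   [1, 1, 0, 0, 0],
                   [0, 0, 0, 1, 0]])
y_pred = np.array([[1, 0, 0, 0, 0],
                   [0, 1, 0, 0, 1],
                   [1, 0, 0, 0, 0],
                   [0, 0, 1, 1, 0]])

print("micro-F1  :", f1_score(y_true, y_pred, average="micro"))
print("macro-F1  :", f1_score(y_true, y_pred, average="macro"))
print("precision :", precision_score(y_true, y_pred, average="micro"))
print("Hamming   :", hamming_loss(y_true, y_pred))
```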