Text Classification Using XLNet with Infomap Automatic Labeling Process

Triana Dewi Salma, G. Saptawati, Yanti Rusmawati
Published in: 2021 8th International Conference on Advanced Informatics: Concepts, Theory and Applications (ICAICTA)
Publication date: 2021-09-29
DOI: 10.1109/ICAICTA53211.2021.9640255 (https://doi.org/10.1109/ICAICTA53211.2021.9640255)
Citations: 0

Abstract

Text data is growing rapidly and is used in various fields, such as the currently popular chatbots and question answering systems, where the system identifies the question category and the likely answer to help respond to the questions entered. Good-quality text data significantly affects model performance, especially in text classification. Manual labeling by humans, the usual way of labeling training data in supervised learning, is expensive, error-prone, and yields only a small quantity of data. Automatic labeling that provides training data of high quality and high quantity is necessary to improve text classification performance. This study attempts to use community detection with the Infomap algorithm for automatic labeling in text classification using XLNet. The accuracy of the model is compared to a baseline that uses manually labeled data. Although the accuracy has not yet outperformed the overall baseline, the results show that automatic labeling can label data quickly and in high quantity.
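
The abstract does not say how the document graph is built or how community assignments become labels, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes documents are linked when their bag-of-words cosine similarity reaches a hypothetical threshold, and that each community inherits the majority label of a few manually labeled seed documents. To stay dependency-free, the community detection step is a plain connected-components stand-in, not the Infomap algorithm; all function names and parameters here are illustrative, not the authors'.

```python
import math
from collections import Counter, defaultdict


def bag_of_words(text):
    """Lowercased token counts; a simple stand-in for a real text embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_graph(texts, threshold=0.3):
    """Link documents whose similarity reaches the (assumed) threshold."""
    vectors = [bag_of_words(t) for t in texts]
    edges = defaultdict(set)
    for i in range(len(texts)):
        for j in range(i + 1, len(texts)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                edges[i].add(j)
                edges[j].add(i)
    return edges


def detect_communities(n_nodes, edges):
    """Stand-in for Infomap: connected components of the similarity graph."""
    community = {}
    next_id = 0
    for start in range(n_nodes):
        if start in community:
            continue
        community[start] = next_id
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbor in edges.get(node, ()):
                if neighbor not in community:
                    community[neighbor] = next_id
                    stack.append(neighbor)
        next_id += 1
    return community


def auto_label(texts, seed_labels, threshold=0.3):
    """Give each document the majority seed label of its community, or None."""
    edges = build_graph(texts, threshold)
    community = detect_communities(len(texts), edges)
    votes = defaultdict(Counter)
    for index, label in seed_labels.items():
        votes[community[index]][label] += 1
    labels = []
    for i in range(len(texts)):
        c = community[i]
        labels.append(votes[c].most_common(1)[0][0] if c in votes else None)
    return labels


texts = [
    "how do I reset my password",
    "reset my password please",
    "what is the refund policy",
    "refund policy for returned items",
    "tell me a joke",
]
seed_labels = {0: "account", 2: "billing"}  # two manually labeled seeds
print(auto_label(texts, seed_labels))
```

In this toy run, the two password questions and the two refund questions each form a community and inherit their seed's label, while the unrelated fifth document stays unlabeled. In a fuller pipeline, `detect_communities` would be replaced by an actual Infomap run over a weighted graph, and the resulting labeled texts would serve as training data for an XLNet classifier.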