Text Classification Using XLNet with Infomap Automatic Labeling Process

2021 8th International Conference on Advanced Informatics: Concepts, Theory and Applications (ICAICTA), Pub Date: 2021-09-29, DOI: 10.1109/ICAICTA53211.2021.9640255

Triana Dewi Salma, G. Saptawati, Yanti Rusmawati

Citations: 0

Abstract

Text data is growing rapidly and is used in many fields, such as the currently popular chatbots and question answering systems, in which the system identifies the category of a question and its possible answers in order to respond to the questions entered. The quality of text data, especially in text classification, significantly affects model performance. Manual labeling by humans, the usual way of labeling training data for supervised learning, is expensive and error-prone, and yields only a small quantity of labeled data. Automatic labeling that provides training data of high quality and in high quantity is therefore needed to improve text classification performance. This study uses community detection with the Infomap algorithm to label training data automatically for text classification with XLNet. The model's accuracy is compared with a baseline that uses manually labeled data. Although the accuracy does not yet surpass the baseline overall, the results show that automatic labeling can produce labeled data quickly and in high quantity.
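
The abstract names Infomap but does not describe how the document graph is built or which implementation was used. As a minimal sketch of what Infomap optimizes, assuming an undirected weighted graph whose node visit rates are proportional to node strength (no teleportation), the Python below computes the two-level map equation L(M) and runs only Infomap's core local-moving step: each node greedily joins the neighbouring module that most shortens the description length. The full algorithm also aggregates modules, refines them and repeats trials, which this sketch omits. The names `map_equation` and `greedy_infomap` are hypothetical and are not the API of the Infomap software package.

```python
# Simplified sketch (assumption): the map equation plus greedy single-node
# moves. This is NOT the official Infomap implementation.
import math
from collections import defaultdict


def plogp(x):
    """Return x * log2(x), using the convention 0 * log2(0) = 0."""
    return x * math.log2(x) if x > 0 else 0.0


def map_equation(edges, modules):
    """Two-level map equation L(M), in bits, for an undirected weighted graph.

    edges:   list of (u, v, weight) tuples
    modules: dict mapping every node to a module id
    L(M) = plogp(q) - 2*sum(plogp(q_i)) - sum(plogp(p_a))
           + sum(plogp(q_i + p_i)), with p_a = strength(a) / 2W.
    """
    total = 2.0 * sum(w for _, _, w in edges)  # 2W
    strength = defaultdict(float)
    exit_flow = defaultdict(float)  # q_i: flow leaving module i per step
    for u, v, w in edges:
        strength[u] += w
        strength[v] += w
        if modules[u] != modules[v]:
            # A cut edge carries w/2W of flow out of each endpoint's module.
            exit_flow[modules[u]] += w / total
            exit_flow[modules[v]] += w / total
    visit = {a: s / total for a, s in strength.items()}  # p_a
    module_flow = defaultdict(float)  # p_i: total visit rate inside module i
    for a, pa in visit.items():
        module_flow[modules[a]] += pa
    q = sum(exit_flow.values())
    return (plogp(q)
            - 2.0 * sum(plogp(qi) for qi in exit_flow.values())
            - sum(plogp(pa) for pa in visit.values())
            + sum(plogp(exit_flow.get(m, 0.0) + pm)
                  for m, pm in module_flow.items()))


def greedy_infomap(edges, max_sweeps=20):
    """Local-moving step only: start from singleton modules and move each
    node to the neighbouring module that lowers L(M) the most, until a full
    sweep makes no move."""
    neighbours = defaultdict(set)
    for u, v, _ in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    modules = {node: node for node in neighbours}  # one module per node
    best = map_equation(edges, modules)
    for _ in range(max_sweeps):
        moved = False
        for node in sorted(neighbours):
            current = modules[node]
            best_target = current
            candidates = {modules[n] for n in neighbours[node]} - {current}
            for target in sorted(candidates):
                modules[node] = target  # trial move
                length = map_equation(edges, modules)
                if length < best - 1e-12:
                    best, best_target = length, target
            modules[node] = best_target  # keep the best move (or stay)
            moved = moved or best_target != current
        if not moved:
            break
    return modules, best


if __name__ == "__main__":
    # Toy graph: two triangles joined by one bridge edge (2-3).
    edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
             (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
             (2, 3, 1.0)]
    labels, length = greedy_infomap(edges)
    print(labels, round(length, 3))
```

On the toy graph the sweep places each triangle in its own module. In an automatic-labeling setting of the kind the abstract describes, nodes would be documents and each module id would act as a provisional class label for training a classifier such as XLNet; how edges are derived from text (for example, similarity above a threshold) is an assumption here. Recomputing L(M) from scratch for every trial move keeps the code short but is only practical for small graphs; production implementations compute the change in code length incrementally instead.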