Government affairs message text classification based on RoBerta and TextCNN

Yan Lai, Lin Zhang
DOI: 10.1109/CISCE58541.2023.10142573
Published in: 2023 5th International Conference on Communications, Information System and Computer Engineering (CISCE)
Publication date: 2023-04-14
Citations: 0

Abstract

With the arrival of the big data era, the number of messages posted on government affairs platforms has grown rapidly. To better address the urgent problems raised by the public, this paper takes some real messages from a provincial government affairs platform as its research object and constructs a fused RoBerta and TextCNN model to classify government affairs message text. First, the message text is pre-processed, including de-duplication and noise reduction. Second, the RoBerta-TextCNN model is constructed: the message text vectors produced by the RoBerta layer are fed into the TextCNN layer for feature extraction, and the extracted features are then classified with a softmax classifier. Finally, the classification results are compared with those of other models. The experimental results show that the RoBerta-TextCNN model constructed in this paper achieves better classification results on this task, with an accuracy of 89.63%, a recall of 89.95%, and an F1 value of 90.31%.
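The data flow described in the abstract (token vectors from the RoBerta layer, feature extraction in the TextCNN layer, softmax classification) can be illustrated with a small forward pass. The following is a minimal pure-Python sketch, not the authors' implementation. The abstract does not give any hyperparameters, so the vector size, kernel widths, class count, random weights, and every function name below are assumptions made for the demo, and the RoBerta output is replaced by random stand-in vectors.

```python
import math
import random


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def conv_max_pool(token_vectors, kernel, bias=0.0):
    """Slide one kernel over the token sequence, apply ReLU, and keep the
    largest activation (max-over-time pooling)."""
    width = len(kernel)  # number of consecutive tokens the kernel spans
    best = 0.0  # ReLU floor; also the result if the text is shorter than the kernel
    for start in range(len(token_vectors) - width + 1):
        window = token_vectors[start:start + width]
        activation = bias + sum(
            w * x
            for kernel_row, vector in zip(kernel, window)
            for w, x in zip(kernel_row, vector)
        )
        best = max(best, activation)
    return best


def textcnn_forward(token_vectors, kernels, class_weights, class_biases):
    """Pool one feature per kernel, then score and normalise the classes."""
    features = [conv_max_pool(token_vectors, k) for k in kernels]
    scores = [
        b + sum(w * f for w, f in zip(weights, features))
        for weights, b in zip(class_weights, class_biases)
    ]
    return softmax(scores)


def random_matrix(rng, rows, cols):
    """Hypothetical helper: untrained weights drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Assumed sizes, chosen only to keep the demo small.
HIDDEN_SIZE = 8            # real encoders emit larger vectors (768 for base-sized models)
SEQ_LEN = 12               # tokens in one message
KERNEL_WIDTHS = (2, 3, 4)  # assumed n-gram widths
NUM_CLASSES = 5            # assumed number of message categories

rng = random.Random(0)
# Stand-in for the RoBerta layer output: one vector per token.
token_vectors = random_matrix(rng, SEQ_LEN, HIDDEN_SIZE)
kernels = [random_matrix(rng, w, HIDDEN_SIZE) for w in KERNEL_WIDTHS]
class_weights = random_matrix(rng, NUM_CLASSES, len(kernels))
class_biases = [0.0] * NUM_CLASSES

probabilities = textcnn_forward(token_vectors, kernels, class_weights, class_biases)
predicted_class = max(range(NUM_CLASSES), key=lambda i: probabilities[i])
print(predicted_class, [round(p, 3) for p in probabilities])
```

The weights here are random, so the prediction is meaningless; in a trained model they would be learned from labelled messages, and a deep-learning framework would normally replace the explicit loops.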