Government affairs message text classification based on RoBerta and TextCNN

2023 5th International Conference on Communications, Information System and Computer Engineering (CISCE). Pub Date: 2023-04-14. DOI: 10.1109/CISCE58541.2023.10142573

Yan Lai, Lin Zhang


Citations: 0

Abstract

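With the arrival of the big-data era, the number of messages posted on government affairs platforms has grown rapidly. To better address the urgent problems raised by the public, this paper takes some real messages from a provincial government affairs platform as its research object and builds a fused RoBerta-TextCNN model to classify government affairs message text. First, the message text is preprocessed, including de-duplication and noise reduction. Second, the RoBerta-TextCNN model is constructed to classify the message text: the message text vectors produced by the RoBerta layer are fed into the TextCNN layer for feature extraction, and the extracted features are then classified with a softmax classifier. Finally, the classification results are compared with those of other models. The experimental results show that the RoBerta-TextCNN model constructed in this paper achieves better classification results on this task, with an accuracy of 89.63%, a recall of 89.95%, and an F1 value of 90.31%.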
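The data flow described in the abstract (token vectors from the encoder, then TextCNN feature extraction, then softmax) can be illustrated with a minimal sketch. This is not the authors' implementation: random vectors stand in for the RoBerta layer output, and the sequence length, hidden size, kernel widths, number of filters, and number of classes are all assumed toy values, since the abstract does not state them. The weights are untrained, so the predicted class is meaningless; only the shapes and the order of operations are the point.

```python
import math
import random


def conv_max_pool(tokens, kernel, bias):
    """Slide one filter of width k over the token sequence, apply ReLU,
    and keep the largest response (max-over-time pooling)."""
    k = len(kernel)
    best = 0.0  # ReLU output is never below zero, so 0.0 is a safe floor
    for start in range(len(tokens) - k + 1):
        window = tokens[start:start + k]
        score = bias + sum(
            w * x
            for w_row, x_row in zip(kernel, window)
            for w, x in zip(w_row, x_row)
        )
        best = max(best, score)
    return best


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def textcnn_forward(tokens, filters, out_weights, out_bias):
    """One pooled feature per filter, then a linear layer and softmax."""
    features = [conv_max_pool(tokens, kernel, bias) for kernel, bias in filters]
    logits = [
        b + sum(w * f for w, f in zip(row, features))
        for row, b in zip(out_weights, out_bias)
    ]
    return softmax(logits)


rng = random.Random(0)
SEQ_LEN, HIDDEN, NUM_CLASSES = 12, 8, 4  # toy sizes (assumed, not from the paper)
KERNEL_SIZES = (2, 3, 4)                 # assumed filter widths
FILTERS_PER_SIZE = 2                     # assumed number of filters per width


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Stand-in for the encoder output: one HIDDEN-dimensional vector per token.
tokens = rand_matrix(SEQ_LEN, HIDDEN)

# Each filter is a (k x HIDDEN weight matrix, bias) pair with random weights.
filters = [
    (rand_matrix(k, HIDDEN), 0.0)
    for k in KERNEL_SIZES
    for _ in range(FILTERS_PER_SIZE)
]
out_weights = rand_matrix(NUM_CLASSES, len(filters))
out_bias = [0.0] * NUM_CLASSES

probs = textcnn_forward(tokens, filters, out_weights, out_bias)
predicted_class = max(range(NUM_CLASSES), key=probs.__getitem__)
print(probs, predicted_class)
```

In a real system the `tokens` matrix would come from a pretrained encoder and the filters and output layer would be trained jointly on labelled messages; a deep learning framework would replace the explicit loops.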


