Government affairs message text classification based on RoBerta and TextCNN
Authors: Yan Lai, Lin Zhang
Published in: 2023 5th International Conference on Communications, Information System and Computer Engineering (CISCE)
Publication date: 2023-04-14
DOI: 10.1109/CISCE58541.2023.10142573 (https://doi.org/10.1109/CISCE58541.2023.10142573)
With the arrival of the big data era, the number of messages on government affairs platforms has grown rapidly. To help resolve the urgent problems raised by the public, this paper takes a set of real messages from a provincial government affairs platform as its research object and constructs a fused RoBerta and TextCNN model to classify government affairs message text. First, the message text is pre-processed, including de-duplication and noise reduction. Second, the RoBerta-TextCNN model is constructed to classify the message text: the message text vectors produced by the RoBerta layer are fed into the TextCNN layer for feature extraction, and the extracted features are then classified with a softmax classifier. Finally, the classification results are compared with those of other models. The experimental results show that the RoBerta-TextCNN model constructed in this paper achieves better classification results on this task, with an accuracy of 89.63%, a recall of 89.95%, and an F1 value of 90.31%.
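The classification stage described in the abstract (token vectors, then TextCNN feature extraction, then softmax) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the random token vectors merely stand in for the output of the RoBerta layer, and the sequence length, hidden size, kernel sizes, filter count, and number of classes are all hypothetical toy values, since the abstract does not state them.

```python
import math
import random


def conv1d_relu(tokens, kernel, bias):
    """Slide one filter spanning len(kernel) tokens over the sequence, then apply ReLU.

    Assumes len(tokens) >= len(kernel); otherwise the result is empty.
    """
    k = len(kernel)
    outputs = []
    for start in range(len(tokens) - k + 1):
        total = bias
        for offset in range(k):
            total += sum(w * x for w, x in zip(kernel[offset], tokens[start + offset]))
        outputs.append(max(0.0, total))
    return outputs


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def textcnn_forward(tokens, filters, dense_w, dense_b):
    """Convolve, max-pool over time (one feature per filter), then classify."""
    pooled = [max(conv1d_relu(tokens, kernel, bias)) for kernel, bias in filters]
    logits = [sum(w * f for w, f in zip(row, pooled)) + b
              for row, b in zip(dense_w, dense_b)]
    return softmax(logits)


def random_matrix(rng, rows, cols):
    """Helper for the demo only: small random weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Hypothetical toy sizes; none of these values come from the paper.
SEQ_LEN, HIDDEN, NUM_CLASSES = 12, 8, 4
KERNEL_SIZES = (2, 3, 4)
FILTERS_PER_SIZE = 2

rng = random.Random(0)
# Stand-in for the per-token vectors that the RoBerta layer would produce.
tokens = random_matrix(rng, SEQ_LEN, HIDDEN)
# Untrained filters and dense weights; a real model would learn these.
filters = [(random_matrix(rng, k, HIDDEN), 0.0)
           for k in KERNEL_SIZES for _ in range(FILTERS_PER_SIZE)]
dense_w = random_matrix(rng, NUM_CLASSES, len(filters))
dense_b = [0.0] * NUM_CLASSES

probs = textcnn_forward(tokens, filters, dense_w, dense_b)
predicted = max(range(NUM_CLASSES), key=probs.__getitem__)
print("class probabilities:", probs)
print("predicted class index:", predicted)
```

Because the weights here are random and untrained, the predicted class carries no meaning; the sketch only shows how filters of several widths, max-over-time pooling, and a softmax layer fit together on top of contextual token vectors.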