Title: Policy Text Classification Algorithm Based on Bert
Authors: Bihui Yu, Chen Deng, Liping Bu
DOI: 10.1109/ICTech55460.2022.00103
URL: https://doi.org/10.1109/ICTech55460.2022.00103
Venue: 2022 11th International Conference of Information and Communication Technology (ICTech)
Publication date: 2022-02-01
Citations: 2
Abstract
With the development of the Internet, deep learning models are being applied to policy text classification to improve classification quality and to make use of the considerable value contained in policy texts. To determine more accurately which policy domain a text describes, a BERT-based policy text classification algorithm is proposed. The algorithm first uses the pre-trained language model BERT (Bidirectional Encoder Representations from Transformers) to encode each policy-domain text as a sentence-level feature vector, then feeds that vector into a classifier, and is finally validated on a policy-domain text dataset from the same source. In the experiments, the trained model reached a highest F1 score of 93.25% on the test set, nearly 6% higher than the BERT model's result on the MRPC task. The proposed algorithm can therefore judge the domain of a policy text more accurately and efficiently, which supports further analysis of policy-domain text data and the extraction of more valuable information.
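
The abstract reports its result as an F1 value but does not say how that value is averaged across policy domains. As a minimal sketch, assuming macro-averaging (an assumption, not something the abstract confirms), the following shows how F1 can be computed for a multi-class classifier from gold and predicted labels. The function names and the labels "education", "finance" and "health" are hypothetical and chosen only for illustration.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} from parallel lists of gold and predicted labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # the predicted label was wrong for this text
            fn[gold] += 1  # the gold label was missed for this text
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        # F1 = 2PR / (P + R), which simplifies to 2TP / (2TP + FP + FN)
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred)
    if not scores:
        raise ValueError("no labels to score")
    return sum(scores.values()) / len(scores)


# Hypothetical policy-domain labels, for illustration only
y_true = ["education", "education", "finance", "finance", "health", "health"]
y_pred = ["education", "finance", "finance", "finance", "health", "education"]

for label, score in per_class_f1(y_true, y_pred).items():
    print(f"{label}: {score:.4f}")
print(f"macro F1: {macro_f1(y_true, y_pred):.4f}")
```

Macro-averaging weights every domain equally, so a rarely occurring policy domain affects the score as much as a common one. A micro-averaged or support-weighted F1 would give a different number on imbalanced data.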