{"title":"自然语言处理:结合深度学习的网络文本分类","authors":"Chenwen Zhang","doi":"10.13052/jicts2245-800X.1312","DOIUrl":null,"url":null,"abstract":"With the increasing number of web texts, the classification of web texts has become an important task. In this paper, the text word vector representation method is first analyzed, and bidirectional encoder representations from transformers (BERT) are selected to extract the word vector. The bidirectional gated recurrent unit (BiGRU), convolutional neural network (CNN), and attention mechanism are combined to obtain the context and local features of the text, respectively. Experiments were carried out using the THUCNews dataset. The results showed that in the comparison between word-to-vector (Word2vec), Glove, and BERT, the BERT obtained the best classification result. In the classification of different types of text, the average accuracy and F1value of the BERT-BGCA method reached 0.9521 and 0.9436, respectively, which were superior to other deep learning methods such as TextCNN. The results suggest that the BERT-BGCA method is effective in classifying web texts and can be applied in practice.","PeriodicalId":36697,"journal":{"name":"Journal of ICT Standardization","volume":"13 1","pages":"25-40"},"PeriodicalIF":0.0000,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11042907","citationCount":"0","resultStr":"{\"title\":\"Natural Language Processing: Classification of Web Texts Combined with Deep Learning\",\"authors\":\"Chenwen Zhang\",\"doi\":\"10.13052/jicts2245-800X.1312\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"With the increasing number of web texts, the classification of web texts has become an important task. In this paper, the text word vector representation method is first analyzed, and bidirectional encoder representations from transformers (BERT) are selected to extract the word vector. 
The bidirectional gated recurrent unit (BiGRU), convolutional neural network (CNN), and attention mechanism are combined to obtain the context and local features of the text, respectively. Experiments were carried out using the THUCNews dataset. The results showed that in the comparison between word-to-vector (Word2vec), Glove, and BERT, the BERT obtained the best classification result. In the classification of different types of text, the average accuracy and F1value of the BERT-BGCA method reached 0.9521 and 0.9436, respectively, which were superior to other deep learning methods such as TextCNN. The results suggest that the BERT-BGCA method is effective in classifying web texts and can be applied in practice.\",\"PeriodicalId\":36697,\"journal\":{\"name\":\"Journal of ICT Standardization\",\"volume\":\"13 1\",\"pages\":\"25-40\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2025-03-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11042907\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of ICT Standardization\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/11042907/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"Decision Sciences\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of ICT Standardization","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/11042907/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"Decision Sciences","Score":null,"Total":0}
Natural Language Processing: Classification of Web Texts Combined with Deep Learning
With the rapid growth of web texts, their classification has become an important task. This paper first analyzes methods for representing text as word vectors and selects bidirectional encoder representations from transformers (BERT) to extract word vectors. A bidirectional gated recurrent unit (BiGRU), a convolutional neural network (CNN), and an attention mechanism are then combined to capture the contextual and local features of the text. Experiments were carried out on the THUCNews dataset. In a comparison of word2vec, GloVe, and BERT, BERT achieved the best classification results. Across different text categories, the proposed BERT-BGCA method reached an average accuracy of 0.9521 and an average F1 score of 0.9436, outperforming other deep learning methods such as TextCNN. These results suggest that the BERT-BGCA method is effective for classifying web texts and can be applied in practice.
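To make the described pipeline concrete, the sketch below illustrates the general idea of combining a local-feature branch (CNN-style convolution over tokens) with an attention-pooled context branch over pretrained embeddings, then fusing them for classification. This is only a toy NumPy illustration under assumed dimensions; it is not the paper's implementation. The embedding tensor is a random stand-in for BERT output (real BERT uses 768-dimensional vectors), the attention pooling is a simplified surrogate for the BiGRU-plus-attention context branch, and all weights and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for BERT output: a batch of 2 sentences,
# 8 tokens each, 16-dimensional embeddings (assumed toy sizes).
batch, seq_len, dim = 2, 8, 16
embeddings = rng.standard_normal((batch, seq_len, dim))

def conv1d_local_features(x, kernel_size=3, n_filters=4):
    """CNN branch: 1-D convolution over the token axis to pick up
    local n-gram features (toy random weights, no training)."""
    b, t, d = x.shape
    w = rng.standard_normal((n_filters, kernel_size, d)) * 0.1
    out_len = t - kernel_size + 1
    out = np.zeros((b, out_len, n_filters))
    for i in range(out_len):
        window = x[:, i:i + kernel_size, :]            # (b, k, d)
        out[:, i, :] = np.einsum('bkd,fkd->bf', window, w)
    return np.maximum(out, 0.0)                        # ReLU

def attention_pool(x):
    """Attention branch: additive attention pooling that weights
    informative tokens before summing (surrogate for BiGRU+attention)."""
    b, t, d = x.shape
    v = rng.standard_normal(d) * 0.1
    scores = x @ v                                     # (b, t)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)      # softmax over tokens
    return (weights[..., None] * x).sum(axis=1)        # (b, d)

local = conv1d_local_features(embeddings).max(axis=1)  # max-pool over time -> (b, n_filters)
context = attention_pool(embeddings)                   # (b, d)
features = np.concatenate([local, context], axis=1)    # fused vector fed to a classifier head
print(features.shape)  # (2, 20)
```

In the paper's actual BERT-BGCA model, the two branches would operate on trained BERT embeddings, with the BiGRU providing bidirectional context before attention, and the fused features would pass through a softmax classifier over the THUCNews categories.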