{"title":"使用微调BERT对越南Facebook帖子进行分类","authors":"Dung Tran Tuan, Dang Van Thin, V. Pham, N. Nguyen","doi":"10.1109/NICS51282.2020.9335865","DOIUrl":null,"url":null,"abstract":"With the development of social networks in the age of information technology explosion, the classification of social news plays an important role in detecting the hot topics being discussed on social networks over a period of time. In this paper, we present a new model for effective Facebook's posts classification and a new dataset which is labeled for the corresponding subject. The dataset consists of 5191 Facebook user's public posts, which is divided into 3 subsets: training, validation and testing data sets. Then, we explore the effectiveness of fine-tuning BERT model with three truncation methods compared with other machine learning algorithms on our dataset. Experimental results show that the fine-tune BERT models outperform other approaches. The fine-tune BERT with “head + tail” truncation methods achieves the best scores with 84.31% of Precision, 84.12% of Recall and 84.15% of F1-score.","PeriodicalId":308944,"journal":{"name":"2020 7th NAFOSTED Conference on Information and Computer Science (NICS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Vietnamese Facebook Posts Classification using Fine-Tuning BERT\",\"authors\":\"Dung Tran Tuan, Dang Van Thin, V. Pham, N. Nguyen\",\"doi\":\"10.1109/NICS51282.2020.9335865\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"With the development of social networks in the age of information technology explosion, the classification of social news plays an important role in detecting the hot topics being discussed on social networks over a period of time. In this paper, we present a new model for effective Facebook's posts classification and a new dataset which is labeled for the corresponding subject. The dataset consists of 5191 Facebook user's public posts, which is divided into 3 subsets: training, validation and testing data sets. Then, we explore the effectiveness of fine-tuning BERT model with three truncation methods compared with other machine learning algorithms on our dataset. Experimental results show that the fine-tune BERT models outperform other approaches. 
The fine-tune BERT with “head + tail” truncation methods achieves the best scores with 84.31% of Precision, 84.12% of Recall and 84.15% of F1-score.\",\"PeriodicalId\":308944,\"journal\":{\"name\":\"2020 7th NAFOSTED Conference on Information and Computer Science (NICS)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-11-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 7th NAFOSTED Conference on Information and Computer Science (NICS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/NICS51282.2020.9335865\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 7th NAFOSTED Conference on Information and Computer Science (NICS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/NICS51282.2020.9335865","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
With the development of social networks in the age of the information technology explosion, the classification of social news plays an important role in detecting the hot topics discussed on social networks over a period of time. In this paper, we present a new model for effective classification of Facebook posts, along with a new dataset labeled with the corresponding subjects. The dataset consists of 5,191 public posts from Facebook users and is divided into three subsets: training, validation, and test sets. We then explore the effectiveness of fine-tuning the BERT model with three truncation methods, compared against other machine learning algorithms on our dataset. Experimental results show that the fine-tuned BERT models outperform the other approaches. The fine-tuned BERT with the "head + tail" truncation method achieves the best scores, with 84.31% Precision, 84.12% Recall, and 84.15% F1-score.
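The abstract compares truncation strategies for feeding long posts into BERT, which accepts at most 512 tokens per input. As a rough illustration of the "head + tail" idea, the sketch below keeps the beginning and the end of a tokenized post and drops the middle. The 128-token head, the multilingual tokenizer, and the helper name are assumptions made for illustration only; they are not the paper's exact implementation or pretrained model.

```python
# Minimal sketch of "head + tail" truncation for BERT fine-tuning.
# Assumptions (not from the paper): a 128-token head, a 512-token budget,
# and the Hugging Face multilingual BERT tokenizer.
from transformers import AutoTokenizer

def head_tail_truncate(token_ids, max_len=512, head_len=128):
    """Keep the first `head_len` tokens and the last `max_len - head_len` tokens,
    dropping the middle of overly long posts."""
    if len(token_ids) <= max_len:
        return token_ids
    tail_len = max_len - head_len
    return token_ids[:head_len] + token_ids[-tail_len:]

# Example usage: tokenize a (Vietnamese) post, then truncate before feeding it
# to a BERT classifier. The model choice here is an illustrative assumption.
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
ids = tokenizer.encode("một bài đăng Facebook rất dài ...", add_special_tokens=True)
truncated = head_tail_truncate(ids, max_len=512, head_len=128)
```

The motivation for this strategy is that the opening of a post usually states the topic while the ending often summarizes or concludes it, so keeping both tends to preserve more class-relevant signal than keeping the head alone.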