Ngoc-Dong Pham, Thi-Hanh Le, Thanh-Dat Do, Thanh-Toan Vuong, Thi-Hong Vuong, Quang-Thuy Ha
Vietnamese Fake News Detection Based on Hybrid Transfer Learning Model and TF-IDF
2021 13th International Conference on Knowledge and Systems Engineering (KSE), published 2021-11-10
DOI: 10.1109/KSE53942.2021.9648676
Citations: 2
Abstract
Many studies have addressed fake news detection on English-language social networks, but work on Vietnamese fake news detection on social networks remains limited. In this paper, we propose a new approach for Vietnamese fake news detection on social network sites (SNSs). It uses the pre-trained language model PhoBERT combined with Term Frequency-Inverse Document Frequency (TF-IDF) for word embedding, and a Convolutional Neural Network (CNN) for feature extraction. Our model is trained and evaluated on the dataset of the Reliable Intelligence Identification on Vietnamese SNSs (ReINTEL) shared task. We evaluate two data scenarios, raw data and pre-processed data, to examine the effect of pre-processing on social network text. In addition, we use several extra features to improve the model's performance. We compare the proposed model with baseline methods; it achieves an AUC score of 0.9538 on raw data.
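The abstract names three components (PhoBERT representations, TF-IDF features, and a CNN feature extractor) and reports results as AUC, but it does not say how the components are fused. The following is a minimal, dependency-free sketch under stated assumptions, not the authors' implementation: random vectors stand in for PhoBERT token embeddings, TF-IDF uses the smoothed idf formula of scikit-learn's `TfidfTransformer` (without its default L2 normalization), the CNN is a text-CNN-style convolution with ReLU and max-pooling over time, and fusion is assumed to be simple concatenation. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs, vocab):
    """One TF-IDF vector per tokenized doc, ordered by `vocab`.
    Assumption: raw term counts and smoothed idf = ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([tf[t] * idf[t] for t in vocab])
    return vectors


def cnn_features(token_embs, filters):
    """Text-CNN-style extractor: each filter (a k x d weight matrix) slides over
    the token sequence; ReLU, then max-pooling over time gives one value per filter.
    Sequences shorter than k yield 0.0 for that filter."""
    feats = []
    for w in filters:
        k = len(w)
        best = 0.0  # ReLU outputs are >= 0, so 0.0 is a safe starting maximum
        for i in range(len(token_embs) - k + 1):
            s = sum(w[j][c] * token_embs[i + j][c]
                    for j in range(k) for c in range(len(w[j])))
            best = max(best, s)
        feats.append(best)
    return feats


def hybrid_vector(token_embs, filters, tfidf_vec):
    """Hypothetical fusion: concatenate CNN features with the document's TF-IDF vector."""
    return cnn_features(token_embs, filters) + tfidf_vec


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative
    (Mann-Whitney formulation; ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    rng = random.Random(0)
    docs = [["tin", "giả"], ["tin", "thật", "thật"]]
    vocab = sorted({t for d in docs for t in d})
    tfidf = tfidf_vectors(docs, vocab)
    dim = 4
    # Random stand-ins for contextual (e.g. PhoBERT) token embeddings.
    emb = {t: [rng.uniform(-1, 1) for _ in range(dim)] for t in vocab}
    # Three hypothetical filters of width 2.
    filters = [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(2)]
               for _ in range(3)]
    feats = [hybrid_vector([emb[t] for t in d], filters, v)
             for d, v in zip(docs, tfidf)]
    print(len(feats[0]))  # 3 CNN features + 3 TF-IDF values
```

In a real system the fused vector would feed a classifier whose scores are then evaluated with an AUC function such as `roc_auc`; the pairwise formulation above is quadratic in the number of examples and is meant for illustration only.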