Ngoc-Dong Pham, Thi-Hanh Le, Thanh-Dat Do, Thanh-Toan Vuong, Thi-Hong Vuong, Quang-Thuy Ha
2021 13th International Conference on Knowledge and Systems Engineering (KSE), published 2021-11-10. DOI: 10.1109/KSE53942.2021.9648676
Vietnamese Fake News Detection Based on Hybrid Transfer Learning Model and TF-IDF
Fake news detection on English-language social networks has been studied extensively, but work on Vietnamese fake news detection on social networks remains limited. In this paper, we propose a new approach to Vietnamese fake news detection on social network sites (SNSs) that combines the pre-trained language model PhoBERT with Term Frequency-Inverse Document Frequency (TF-IDF) for word embedding and uses a Convolutional Neural Network (CNN) for feature extraction. The proposed model is trained and evaluated on the dataset of the Reliable Intelligence Identification on Vietnamese SNSs (ReINTEL) shared task. We process the text data under two scenarios, raw data and pre-processed data, to examine the effect of pre-processing on social network data. In addition, we use several extra features to improve the effectiveness of the model. We compare the proposed model with baseline methods. The proposed model achieved an AUC score of 0.9538 on raw data.
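To make the TF-IDF component concrete, the following is a minimal sketch of the textbook weighting, tf(t, d) × log(N / df(t)), in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify which TF-IDF variant is used or how the weights are combined with PhoBERT embeddings, and the function name `tfidf` and the toy posts are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Uses the unsmoothed textbook form: (count / doc length) * log(N / df).
    This variant is an assumption; the paper may use a different one.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy posts split on whitespace (hypothetical examples, not ReINTEL data).
# Real Vietnamese text would normally go through word segmentation first.
posts = [
    "tin giả lan truyền nhanh".split(),
    "tin thật được kiểm chứng".split(),
    "tin giả gây hoang mang".split(),
]
weights = tfidf(posts)
```

In this sketch a token that appears in every post, such as "tin", receives weight zero, while tokens confined to a single post receive the largest weights. That is the property that makes TF-IDF a plausible complement to contextual embeddings from a pre-trained model.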