{"title":"使用深度学习打击假新闻:预训练词嵌入和嵌入层的研究","authors":"Fantahun Gereme, William Zhu","doi":"10.1145/3440840.3440847","DOIUrl":null,"url":null,"abstract":"Fake news is progressively becoming a threat to individuals, society, news systems, governments and democracy. The need to fight it is rising accompanied by various researches that showed promising results. Deep learning methods and word embeddings contributed a lot in devising detection mechanisms. However, lack of sufficient datasets and the question “which word embedding best captures content features” have posed challenges to make detection methods adequately accurate. In this work, we prepared a dataset from a scrape of 13 years of continuous data that we believe will narrow the gap. We also proposed a deep learning model for early detection of fake news using convolutional neural networks and long short-term memory networks. We evaluated three pre-trained word embeddings in the context of the fake news problem using different measures. Series of experiments were made on three real world datasets, including ours, using the proposed model. Results showed that the choice of pre-trained embeddings can be arbitrary. However, embeddings purely trained from the fake news dataset and pre-trained embeddings allowed to update showed relatively better performance over static embeddings. High dimensional embeddings showed better results than low dimensional embeddings and this persisted for all the datasets used.","PeriodicalId":273859,"journal":{"name":"Proceedings of the 2020 3rd International Conference on Computational Intelligence and Intelligent Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":"{\"title\":\"Fighting Fake News Using Deep Learning: Pre-trained Word Embeddings and the Embedding Layer Investigated\",\"authors\":\"Fantahun Gereme, William Zhu\",\"doi\":\"10.1145/3440840.3440847\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Fake news is progressively becoming a threat to individuals, society, news systems, governments and democracy. The need to fight it is rising accompanied by various researches that showed promising results. Deep learning methods and word embeddings contributed a lot in devising detection mechanisms. However, lack of sufficient datasets and the question “which word embedding best captures content features” have posed challenges to make detection methods adequately accurate. In this work, we prepared a dataset from a scrape of 13 years of continuous data that we believe will narrow the gap. We also proposed a deep learning model for early detection of fake news using convolutional neural networks and long short-term memory networks. We evaluated three pre-trained word embeddings in the context of the fake news problem using different measures. Series of experiments were made on three real world datasets, including ours, using the proposed model. Results showed that the choice of pre-trained embeddings can be arbitrary. However, embeddings purely trained from the fake news dataset and pre-trained embeddings allowed to update showed relatively better performance over static embeddings. 
High dimensional embeddings showed better results than low dimensional embeddings and this persisted for all the datasets used.\",\"PeriodicalId\":273859,\"journal\":{\"name\":\"Proceedings of the 2020 3rd International Conference on Computational Intelligence and Intelligent Systems\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-11-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"9\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2020 3rd International Conference on Computational Intelligence and Intelligent Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3440840.3440847\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2020 3rd International Conference on Computational Intelligence and Intelligent Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3440840.3440847","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Fighting Fake News Using Deep Learning: Pre-trained Word Embeddings and the Embedding Layer Investigated
Fake news is an increasing threat to individuals, society, news systems, governments and democracy. The need to fight it is growing, and a number of studies have reported promising results. Deep learning methods and word embeddings have contributed substantially to the design of detection mechanisms. However, the lack of sufficient datasets and the open question of which word embedding best captures content features have made it difficult to build adequately accurate detectors. In this work, we prepared a dataset scraped from 13 years of continuous data, which we believe will help narrow this gap. We also propose a deep learning model for early detection of fake news that combines convolutional neural networks and long short-term memory networks. We evaluated three pre-trained word embeddings on the fake news problem using several measures, and ran a series of experiments on three real-world datasets, including ours, with the proposed model. The results show that the choice of pre-trained embedding can be largely arbitrary. However, embeddings trained purely on the fake news dataset, and pre-trained embeddings that are allowed to update during training, performed somewhat better than static embeddings. High-dimensional embeddings outperformed low-dimensional ones, and this held across all the datasets used.
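To illustrate the kind of model the abstract describes, the following is a minimal Keras sketch of a CNN-LSTM text classifier whose embedding layer is initialized from pre-trained word vectors and can either be frozen (static) or allowed to update during training. This is not the authors' exact architecture; the vocabulary size, sequence length, filter and unit counts, and the randomly filled embedding matrix are all illustrative assumptions.

    # Minimal sketch of a CNN-LSTM fake news classifier with a pre-trained
    # embedding layer (assumed hyperparameters, not the paper's configuration).
    import numpy as np
    from tensorflow.keras import layers, models

    vocab_size = 20000      # assumed vocabulary size
    embedding_dim = 300     # e.g. 300-d GloVe / word2vec / fastText vectors
    max_len = 500           # assumed maximum article length in tokens

    # Placeholder for a pre-trained embedding matrix (row i = vector for token id i).
    # In practice this would be filled from GloVe, word2vec, or fastText files.
    embedding_matrix = np.random.normal(size=(vocab_size, embedding_dim)).astype("float32")

    def build_model(trainable_embeddings: bool) -> models.Model:
        # trainable_embeddings=False keeps the pre-trained vectors static;
        # True lets them update during training.
        inputs = layers.Input(shape=(max_len,), dtype="int32")
        x = layers.Embedding(vocab_size, embedding_dim,
                             trainable=trainable_embeddings,
                             name="embedding")(inputs)
        x = layers.Conv1D(filters=128, kernel_size=5, activation="relu")(x)  # local n-gram features
        x = layers.MaxPooling1D(pool_size=4)(x)
        x = layers.LSTM(64)(x)                                               # longer-range sequence features
        outputs = layers.Dense(1, activation="sigmoid")(x)                   # fake (1) vs. real (0)
        model = models.Model(inputs, outputs)
        # Load the pre-trained vectors into the (already built) embedding layer.
        model.get_layer("embedding").set_weights([embedding_matrix])
        model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
        return model

    model = build_model(trainable_embeddings=True)
    model.summary()

Comparing build_model(False) against build_model(True) on the same data reproduces, at a sketch level, the static-versus-updatable comparison reported in the abstract; swapping in embedding matrices of different dimensionality does the same for the dimension comparison.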