M. Sarnovský, V. Maslej-Krešňáková, Nikola Hrabovska
{"title":"斯洛伐克语假新闻分类的标注数据集","authors":"M. Sarnovský, V. Maslej-Krešňáková, Nikola Hrabovska","doi":"10.1109/ICETA51985.2020.9379254","DOIUrl":null,"url":null,"abstract":"Fake news detection currently presents an active field of research. Detection methods based on natural language processing and machine learning are being developed to automatically identify the possible misinformation contained within the news articles. To successfully train these models, annotated data are needed. In English language, multiple human-annotated datasets already are available and are being widely used in the research. The main objective of the work presented in this paper, was to create similar dataset consisting of articles in Slovak language. We collected the data from the various local news portals including reputable publishers as well as suspicious conspiratory portals. To obtain the annotations, we used crowdsourcing approach. Annotated dataset was used in preliminary experiments, in which neural network classifier was trained and evaluated.","PeriodicalId":149716,"journal":{"name":"2020 18th International Conference on Emerging eLearning Technologies and Applications (ICETA)","volume":"9 42 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"Annotated dataset for the fake news classification in Slovak language\",\"authors\":\"M. Sarnovský, V. Maslej-Krešňáková, Nikola Hrabovska\",\"doi\":\"10.1109/ICETA51985.2020.9379254\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Fake news detection currently presents an active field of research. Detection methods based on natural language processing and machine learning are being developed to automatically identify the possible misinformation contained within the news articles. To successfully train these models, annotated data are needed. In English language, multiple human-annotated datasets already are available and are being widely used in the research. The main objective of the work presented in this paper, was to create similar dataset consisting of articles in Slovak language. We collected the data from the various local news portals including reputable publishers as well as suspicious conspiratory portals. To obtain the annotations, we used crowdsourcing approach. Annotated dataset was used in preliminary experiments, in which neural network classifier was trained and evaluated.\",\"PeriodicalId\":149716,\"journal\":{\"name\":\"2020 18th International Conference on Emerging eLearning Technologies and Applications (ICETA)\",\"volume\":\"9 42 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-11-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 18th International Conference on Emerging eLearning Technologies and Applications (ICETA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICETA51985.2020.9379254\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 18th International Conference on Emerging eLearning Technologies and Applications (ICETA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICETA51985.2020.9379254","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Annotated dataset for the fake news classification in Slovak language
Fake news detection currently presents an active field of research. Detection methods based on natural language processing and machine learning are being developed to automatically identify the possible misinformation contained within the news articles. To successfully train these models, annotated data are needed. In English language, multiple human-annotated datasets already are available and are being widely used in the research. The main objective of the work presented in this paper, was to create similar dataset consisting of articles in Slovak language. We collected the data from the various local news portals including reputable publishers as well as suspicious conspiratory portals. To obtain the annotations, we used crowdsourcing approach. Annotated dataset was used in preliminary experiments, in which neural network classifier was trained and evaluated.