M. Sarnovský, V. Maslej-Krešňáková, Nikola Hrabovska
{"title":"Annotated dataset for the fake news classification in Slovak language","authors":"M. Sarnovský, V. Maslej-Krešňáková, Nikola Hrabovska","doi":"10.1109/ICETA51985.2020.9379254","DOIUrl":null,"url":null,"abstract":"Fake news detection currently presents an active field of research. Detection methods based on natural language processing and machine learning are being developed to automatically identify the possible misinformation contained within the news articles. To successfully train these models, annotated data are needed. In English language, multiple human-annotated datasets already are available and are being widely used in the research. The main objective of the work presented in this paper, was to create similar dataset consisting of articles in Slovak language. We collected the data from the various local news portals including reputable publishers as well as suspicious conspiratory portals. To obtain the annotations, we used crowdsourcing approach. Annotated dataset was used in preliminary experiments, in which neural network classifier was trained and evaluated.","PeriodicalId":149716,"journal":{"name":"2020 18th International Conference on Emerging eLearning Technologies and Applications (ICETA)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2020-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 18th International Conference on Emerging eLearning Technologies and Applications (ICETA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICETA51985.2020.9379254","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4
Abstract
Fake news detection currently presents an active field of research. Detection methods based on natural language processing and machine learning are being developed to automatically identify the possible misinformation contained within the news articles. To successfully train these models, annotated data are needed. In English language, multiple human-annotated datasets already are available and are being widely used in the research. The main objective of the work presented in this paper, was to create similar dataset consisting of articles in Slovak language. We collected the data from the various local news portals including reputable publishers as well as suspicious conspiratory portals. To obtain the annotations, we used crowdsourcing approach. Annotated dataset was used in preliminary experiments, in which neural network classifier was trained and evaluated.