WCD: A New Chinese Online Social Media Dataset for Clickbait Analysis and Detection
Tong Liu, K. Yu, Lu Wang, Xuanyu Zhang, Xiaofei Wu
2021 7th IEEE International Conference on Network Intelligence and Digital Content (IC-NIDC), published 2021-11-17
DOI: 10.1109/IC-NIDC54101.2021.9660453
Citations: 1
Abstract
On online social media platforms, a large amount of clickbait uses tricks such as curiosity-provoking words and carefully designed sentence structures to lure users into clicking hyperlinks of uncertain value. Clickbait detection aims to identify such hyperlinks automatically. Most previous clickbait datasets are built from English social media corpora, and detection models trained on them usually do not generalize well to Chinese social media. In this paper, we construct WCD, a Chinese clickbait dataset based on WeChat. Using WCD, we analyze clickbait features in detail from three aspects: behavior features, headline text features, and content text features. Finally, we apply popular clickbait detection methods to the dataset. We also propose two detection models: a machine learning model based on feature fusion, and a deep learning model that combines headline semantics with part-of-speech (POS) tag information. Both achieve excellent detection performance. The analysis and detection results show that the constructed dataset is of high quality.
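The abstract does not specify which features are fused or which classifier is used. As a rough illustration of feature-level fusion across the three feature groups it names, the Python sketch below concatenates behavior, headline, and content features into one vector and trains a logistic regression classifier with plain batch gradient descent. Everything concrete here is an assumption for illustration: the field names `read_count` and `like_count`, the cue-word list `CURIOSITY_WORDS`, the individual features, and the toy posts are hypothetical and are not the WCD authors' features, model, or data.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

# Hypothetical "curiosity" cue words for illustration only; NOT the lexicon
# or feature set used in the WCD paper.
CURIOSITY_WORDS = ("震惊", "竟然", "居然", "秘密", "真相")


@dataclass
class Post:
    headline: str
    content: str
    read_count: int  # hypothetical behavior field
    like_count: int  # hypothetical behavior field


def behavior_features(p: Post) -> list[float]:
    # Scaled log engagement and like-to-read ratio (illustrative choices).
    ratio = p.like_count / p.read_count if p.read_count > 0 else 0.0
    return [math.log1p(p.read_count) / 10.0, ratio]


def headline_features(p: Post) -> list[float]:
    h = p.headline
    return [
        len(h) / 30.0,  # normalized headline length
        float(any(w in h for w in CURIOSITY_WORDS)),  # curiosity cue present
        float("?" in h or "？" in h),  # ASCII or fullwidth question mark
        float("!" in h or "！" in h),  # ASCII or fullwidth exclamation mark
    ]


def content_features(p: Post) -> list[float]:
    # Normalized content length and headline/content length ratio.
    return [len(p.content) / 1000.0, len(p.headline) / max(len(p.content), 1)]


def fuse(p: Post) -> list[float]:
    # Early (feature-level) fusion: concatenate the three groups into one vector.
    return behavior_features(p) + headline_features(p) + content_features(p)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.3, epochs=2000):
    # Batch gradient descent on the mean logistic loss, with a bias term.
    n = len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * n, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b


def predict_proba(w, b, x) -> float:
    # Estimated probability that the post is clickbait.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Made-up toy posts (label 1 = clickbait); NOT drawn from WCD.
posts = [
    (Post("震惊！他竟然这样做", "一位网友分享了自己的日常生活经历。", 12000, 300), 1),
    (Post("真相曝光！原来我们都被骗了", "文章介绍了几种常见的商品促销方式。", 8000, 150), 1),
    (Post("这个秘密，居然没人告诉你！", "本文整理了一些日常收纳的小技巧。", 15000, 500), 1),
    (Post("网络智能会议在北京召开", "会议围绕网络智能与数字内容展开讨论。", 3000, 80), 0),
    (Post("教育部发布新学期开学安排", "通知对各地开学时间作出了具体说明。", 5000, 120), 0),
    (Post("本周天气预报：多地有小雨", "气象台预计本周中部地区将迎来降雨。", 4000, 60), 0),
]

X = [fuse(p) for p, _ in posts]
y = [label for _, label in posts]
w, b = train_logreg(X, y)
for (p, label), x in zip(posts, X):
    print(p.headline, label, round(predict_proba(w, b, x), 3))
```

Concatenating the groups before a single classifier (early fusion) keeps the sketch short; training one classifier per feature group and combining their scores afterwards (late fusion) is an equally plausible reading of "feature fusion", and the abstract does not say which the authors used.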