Thai Clickbait Detection Algorithms Using Natural Language Processing with Machine Learning Techniques
Praphan Klairith, Sansiri Tanachutiwat
2018 International Conference on Engineering, Applied Sciences, and Technology (ICEAST), published 2018-07-01
DOI: 10.1109/ICEAST.2018.8434447 (https://doi.org/10.1109/ICEAST.2018.8434447)
Citations: 11
Abstract
This paper proposes a machine-learning approach to detecting Thai clickbait. Clickbait messages often use eye-catching wording and withhold information about the content to attract visitors. We contribute a clickbait corpus built by crowdsourcing; 30,000 headlines were selected to form the dataset. We develop clickbait detection models using two types of features in the embedding layer and three different networks in the hidden layer. A BiLSTM with word-level embedding performs very well, achieving an accuracy of 0.98 and an F1-score of 0.98.
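For readers unfamiliar with the two reported metrics, the sketch below shows how accuracy and F1-score are conventionally computed for a binary clickbait classifier. It is an illustration only, not the authors' evaluation code: the function name `accuracy_and_f1`, the label encoding (1 = clickbait, 0 = not clickbait), and the toy labels are assumptions made for this example.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels.

    `positive` is the label treated as the clickbait class
    (an assumed encoding, not taken from the paper).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Toy labels for illustration only (not data from the paper).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.2f}, f1={f1:.2f}")
```

Accuracy counts every correct prediction, while F1 only rewards correct identification of the positive class, so the two can diverge sharply when the classes are imbalanced.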