{"title":"Classifying insincere questions on Question Answering (QA) websites: meta-textual features and word embedding","authors":"M. Al-Ramahi, I. Alsmadi","doi":"10.1080/2573234X.2021.1895681","DOIUrl":null,"url":null,"abstract":"ABSTRACT The power of information and information exchange defines the current Internet and Online Social Networks (OSNs). With such power and influence, individuals and entities expose those networks to different types of false information. This paper proposes several classification models based on Quora insincere questions; a dataset released by Kaggle. We evaluated several models including word embeddings based on meta and word-level features. Best results were achieved using the BERT transformer with an overall accuracy of more than 95% on several individual classifiers. Overall, results indicated that the meta-textual features are important predictors for whether a question is sincere or not. In one implication, we noticed that users are putting more cognitive efforts into writing more readable sincere questions compared to insincere questions. Moreover, a dictionary is assembled from several explicit dictionaries and significant words selected from Quora questions. The dictionary showed a good performance in predicting insincere questions.","PeriodicalId":36417,"journal":{"name":"Journal of Business Analytics","volume":null,"pages":null},"PeriodicalIF":1.7000,"publicationDate":"2021-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Business Analytics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1080/2573234X.2021.1895681","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citation count: 2
Abstract
The power of information and information exchange defines the current Internet and Online Social Networks (OSNs). With such power and influence, individuals and entities expose those networks to different types of false information. This paper proposes several classification models based on the Quora insincere questions dataset released by Kaggle. We evaluated several models, including word embeddings, based on meta-textual and word-level features. The best results were achieved using the BERT transformer, with an overall accuracy of more than 95% on several individual classifiers. Overall, the results indicated that meta-textual features are important predictors of whether a question is sincere. One implication is that users put more cognitive effort into writing sincere questions, which we found to be more readable than insincere ones. Moreover, a dictionary was assembled from several explicit dictionaries and significant words selected from Quora questions. The dictionary performed well in predicting insincere questions.
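To make the idea of meta-textual and dictionary-based features concrete, the following is a minimal Python sketch of how such features might be computed for a single question. It is not the authors' implementation: the abstract does not list the exact features or the contents of the dictionary, so the feature names, the function `meta_textual_features`, and the tiny `EXAMPLE_LEXICON` word list are all hypothetical and chosen only for illustration.

```python
import re
import string

# Hypothetical word list for illustration only; the dictionary assembled
# in the paper is not given in the abstract.
EXAMPLE_LEXICON = {"stupid", "idiot", "hate"}


def meta_textual_features(question, lexicon=EXAMPLE_LEXICON):
    """Return simple surface-level (meta-textual) features of a question.

    The feature set is an assumption, not the one used in the paper.
    """
    # Treat runs of letters and apostrophes as words.
    words = re.findall(r"[A-Za-z']+", question)
    n_words = len(words)
    n_letters = sum(len(w) for w in words)
    return {
        "char_count": len(question),
        "word_count": n_words,
        "avg_word_length": n_letters / n_words if n_words else 0.0,
        "punctuation_count": sum(ch in string.punctuation for ch in question),
        # Words written entirely in capitals (longer than one character).
        "uppercase_word_count": sum(w.isupper() and len(w) > 1 for w in words),
        # Number of words that appear in the lexicon, ignoring case.
        "lexicon_hits": sum(w.lower() in lexicon for w in words),
    }


features = meta_textual_features("Why are people so STUPID?")
print(features)
```

A feature dictionary like this could be fed to any standard classifier alongside embedding-based features; which features actually carry predictive weight is an empirical question that the paper, not this sketch, addresses.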