Classifying insincere questions on Question Answering (QA) websites: meta-textual features and word embedding
M. Al-Ramahi, I. Alsmadi
Journal of Business Analytics, published 2021-01-02
DOI: https://doi.org/10.1080/2573234X.2021.1895681
ABSTRACT The power of information and information exchange defines the current Internet and Online Social Networks (OSNs). With such power and influence, individuals and entities expose those networks to different types of false information. This paper proposes several classification models based on the Quora insincere questions dataset released by Kaggle. We evaluated several models, including word embeddings, based on meta and word-level features. The best results were achieved using the BERT transformer, with an overall accuracy of more than 95% on several individual classifiers. Overall, the results indicated that meta-textual features are important predictors of whether a question is sincere or not. As one implication, we noticed that users put more cognitive effort into writing sincere questions, which are more readable than insincere ones. Moreover, a dictionary was assembled from several explicit dictionaries and significant words selected from Quora questions. The dictionary showed good performance in predicting insincere questions.
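The abstract does not list the exact meta-textual features or the contents of the dictionary, so the following is only a minimal sketch of how features of this general kind might be computed for a single question. The feature names, the function `meta_textual_features`, and the tiny `LEXICON` are all assumptions made for illustration and are not taken from the paper.

```python
import re
import string

# Hypothetical mini-lexicon for illustration only; the paper's assembled
# dictionary is not reproduced here.
LEXICON = {"stupid", "idiot", "hate"}


def meta_textual_features(question: str) -> dict:
    """Compute a few simple surface-level features of a question string."""
    # Treat runs of letters and apostrophes as words.
    words = re.findall(r"[A-Za-z']+", question)
    n_words = len(words)
    return {
        # Total length of the question in characters.
        "n_chars": len(question),
        # Number of word tokens.
        "n_words": n_words,
        # Mean word length, a crude stand-in for a readability signal.
        "avg_word_len": sum(len(w) for w in words) / n_words if n_words else 0.0,
        # Words written entirely in upper case (longer than one letter).
        "n_upper_words": sum(1 for w in words if len(w) > 1 and w.isupper()),
        # Count of ASCII punctuation characters.
        "n_punct": sum(1 for ch in question if ch in string.punctuation),
        # Number of words that appear in the (hypothetical) lexicon.
        "lexicon_hits": sum(1 for w in words if w.lower() in LEXICON),
    }


if __name__ == "__main__":
    print(meta_textual_features("Why are people so STUPID?"))
```

A feature dictionary like this could be fed to any standard classifier alongside word-level or embedding features; which features actually carry predictive weight is an empirical question that the paper, not this sketch, addresses.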