Context Free Frequently Asked Questions Detection Using Machine Learning Techniques
Fatemeh Razzaghi, Hamed Minaee, A. Ghorbani
2016 IEEE/WIC/ACM International Conference on Web Intelligence (WI), pp. 558-561
Published: 2016-10-01
DOI: 10.1109/WI.2016.0095 (https://doi.org/10.1109/WI.2016.0095)
Citations: 5
Abstract
FAQs are lists of common questions and answers on particular topics. Today they appear on almost every website and can be an effective way to give information to users. Questions in FAQs are usually selected by site administrators on the basis of the questions their users ask. Although such questions supply required information about a service, topic, or particular subject, they cannot easily be distinguished from non-FAQ questions. This paper describes machine-learning-based parsing and question classification for FAQs. We demonstrate that FAQ questions can be distinguished from other types of questions. Identifying specific features is the key to obtaining an accurate FAQ classifier. We propose a simple yet effective feature set comprising bag-of-words, lexical, syntactic, and semantic features. To evaluate the proposed methods, we gathered a large data set of FAQs from three different contexts, drawn from real data and labeled by humans. SVM and Naive Bayes reach an accuracy of 80.3%, which is an outstanding result for early-stage research on FAQ classification. The experimental results show that the proposed approach can be a practical tool for question answering systems. To further assess the classifier, we built a questionnaire and compared the classifier's question rankings with user ratings; the almost 81% similarity between the two gives some confidence in the results.
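As a rough illustration of the kind of classifier the abstract mentions, the sketch below trains a multinomial Naive Bayes model on bag-of-words counts to separate FAQ-style questions from other questions. It is not the authors' implementation: it uses only the bag-of-words part of the feature set (no lexical, syntactic, or semantic features, and no SVM), the class name `NaiveBayesQuestionClassifier` and the labels `faq` / `non_faq` are hypothetical, and the training questions are hand-written toy examples rather than the paper's data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens (bag-of-words view)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesQuestionClassifier:
    """Multinomial Naive Bayes over word counts with add-one smoothing.

    Hypothetical helper for illustration; not taken from the paper.
    """

    def fit(self, questions, labels):
        self.class_counts = Counter(labels)       # questions per class
        self.word_counts = defaultdict(Counter)   # word counts per class
        for question, label in zip(questions, labels):
            self.word_counts[label].update(tokenize(question))
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def log_scores(self, question):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            score = math.log(doc_count / total_docs)  # log prior
            for word in tokenize(question):
                if word not in self.vocabulary:
                    continue  # skip words never seen during training
                # Add-one (Laplace) smoothed log likelihood of the word.
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            scores[label] = score
        return scores

    def predict(self, question):
        scores = self.log_scores(question)
        return max(scores, key=scores.get)


# Toy, hand-written examples (assumption: NOT the paper's data set).
train_questions = [
    "How do I reset my password?",
    "How can I change my email address?",
    "What payment methods do you accept?",
    "How do I cancel my subscription?",
    "Why did my cat knock over the lamp yesterday?",
    "Did anyone watch the game last night?",
    "What was that song playing at the party?",
    "Why is my neighbor so loud today?",
]
train_labels = ["faq"] * 4 + ["non_faq"] * 4

model = NaiveBayesQuestionClassifier().fit(train_questions, train_labels)
print(model.predict("How do I change my password?"))
```

With so few training questions the model only reflects word overlap, so this shows the mechanics of the approach rather than anything about the reported 80.3% accuracy.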