Context Free Frequently Asked Questions Detection Using Machine Learning Techniques

2016 IEEE/WIC/ACM International Conference on Web Intelligence (WI) · Pub Date: 2016-10-01 · DOI: 10.1109/WI.2016.0095

Fatemeh Razzaghi, Hamed Minaee, A. Ghorbani

Pages: 558-561

Citations: 5

Abstract

FAQs are lists of common questions and answers on particular topics. They appear on almost every website and can be an effective way to give users information. The questions in an FAQ are usually chosen by site administrators on the basis of what their users ask. Although such questions supply the required information about a service, topic, or particular subject, they cannot easily be distinguished from non-FAQ questions. This paper describes machine-learning-based parsing and question classification for FAQs. We demonstrate that FAQ questions can be distinguished from other types of questions. Identifying specific features is the key to obtaining an accurate FAQ classifier. We propose a simple yet effective feature set comprising bag-of-words, lexical, syntactic, and semantic features. To evaluate the proposed methods, we gathered a large data set of FAQs in three different contexts, drawn from real data and labeled by humans. We show that SVM and Naive Bayes reach an accuracy of 80.3%, which is an outstanding result for early-stage research on FAQ classification. The experimental results show that the proposed approach can be a practical tool for question answering systems. To further assess the classifier, we built a questionnaire and compared the classifier's question rankings with user ratings; the nearly 81% similarity between the two gives some confidence in the results.
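The abstract names bag-of-words features and a Naive Bayes classifier but gives no implementation details, so the following is only a minimal sketch of that one combination, not the authors' method. It assumes a multinomial Naive Bayes with add-one smoothing. The class name `NaiveBayesFAQ`, the tokenizer, and the eight toy questions are all hypothetical and invented for illustration. The lexical, syntactic, and semantic features and the SVM mentioned in the abstract are left out.

```python
import math
import re
from collections import Counter


def tokenize(question):
    """Lowercase a question and split it into word tokens (bag of words)."""
    return re.findall(r"[a-z0-9']+", question.lower())


class NaiveBayesFAQ:
    """Hypothetical multinomial Naive Bayes over bag-of-words counts."""

    def fit(self, questions, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)          # documents per class
        self.total_docs = len(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for question, label in zip(questions, labels):
            self.word_counts[label].update(tokenize(question))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def log_scores(self, question):
        """Return the unnormalized log posterior for each class."""
        # Words never seen in training carry no evidence, so skip them.
        tokens = [t for t in tokenize(question) if t in self.vocab]
        scores = {}
        for c in self.classes:
            total = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / self.total_docs)
            for t in tokens:
                # Add-one (Laplace) smoothing avoids log(0).
                score += math.log(
                    (self.word_counts[c][t] + 1) / (total + len(self.vocab))
                )
            scores[c] = score
        return scores

    def predict(self, question):
        scores = self.log_scores(question)
        return max(scores, key=scores.get)


# Toy training data, invented for illustration only (not from the paper).
TRAIN = [
    ("How do I reset my password?", "faq"),
    ("What payment methods do you accept?", "faq"),
    ("How can I track my order?", "faq"),
    ("What is your return policy?", "faq"),
    ("Why did my order number 4521 arrive late yesterday?", "non_faq"),
    ("Can you tell John I called about invoice 77?", "non_faq"),
    ("Is the blue one I saw last week still there?", "non_faq"),
    ("Why was my card charged twice on Monday?", "non_faq"),
]

model = NaiveBayesFAQ().fit([q for q, _ in TRAIN], [y for _, y in TRAIN])
print(model.predict("How do I change my password?"))
```

With eight training questions the predictions mean little beyond showing the mechanics. Any realistic evaluation would need a labeled data set of the kind the abstract describes.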
