Topical Authoritative Answerer Identification on Q&A Posts using Supervised Learning in CQA Sites

Proceedings of the 9th Annual ACM India Conference | Pub Date: 2016-10-21 | DOI: 10.1145/2998476.2998490

T. P. Sahu, N. K. Nagwani, Shrish Verma


Citations: 3


Topical Authoritative Answerer Identification on Q&A Posts using Supervised Learning in CQA Sites

A Community Question Answering (CQA) site is an online platform where collaborating users worldwide host information in question-answer form. CQA sites have two basic types of user: askers, who post their queries as questions, and answerers, who provide answers to those questions. The semi-structured nature and growing volume of content on CQA sites pose several challenges. Because there is no limit on the number of answers that can be posted to a question, a common challenge is to identify the authoritative answerers of a question so that answer quality can be evaluated and the best answer selected. In this paper, we use latent Dirichlet allocation (LDA), a statistical topic model, on the textual data and statistical computation on the metadata to identify features that reflect the topical authority of an answerer. These features are then represented as a vector for each answerer in the dataset under investigation and used to train a classifier model. Several baseline classifier models are used to identify topical authoritative answerers on Q&A posts from two real datasets extracted from StackOverflow and AskUbuntu. The correctness and effectiveness of the classifier models are evaluated using measures such as accuracy, precision, recall, and the kappa statistic. The experimental results show that the Random Forest classifier outperforms the other classification algorithms on every evaluation measure.
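The four evaluation measures named in the abstract can all be derived from a binary confusion matrix. The following is a minimal sketch, assuming a two-class setting in which label 1 marks an authoritative answerer and label 0 a non-authoritative one. The function name `binary_metrics` and the toy labels are hypothetical and chosen only for illustration; they are not taken from the paper, its datasets, or its results.

```python
from collections import Counter


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and Cohen's kappa for binary labels.

    Assumption: exactly two classes, with `positive` marking the class of
    interest (here, hypothetically, "authoritative answerer").
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    # Count each (true label, predicted label) pair once.
    pairs = Counter(zip(y_true, y_pred))
    tp = sum(c for (t, p), c in pairs.items() if t == positive and p == positive)
    fp = sum(c for (t, p), c in pairs.items() if t != positive and p == positive)
    fn = sum(c for (t, p), c in pairs.items() if t == positive and p != positive)
    tn = sum(c for (t, p), c in pairs.items() if t != positive and p != positive)
    n = tp + fp + fn + tn

    accuracy = (tp + tn) / n
    # Guard against empty denominators when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0

    # Cohen's kappa: observed agreement corrected for chance agreement p_e,
    # where p_e is computed from the marginal totals of the confusion matrix.
    p_pos = ((tp + fp) / n) * ((tp + fn) / n)
    p_neg = ((tn + fn) / n) * ((tn + fp) / n)
    p_e = p_pos + p_neg
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Toy labels for ten hypothetical answerers (illustrative only).
    truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
    for name, value in binary_metrics(truth, predicted).items():
        print(f"{name}: {value:.3f}")
```

Kappa is worth reporting next to accuracy because it discounts agreement that would occur by chance, which matters when one class (for example, non-authoritative answerers) dominates the data.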

