Topical Authoritative Answerer Identification on Q&A Posts using Supervised Learning in CQA Sites

T. P. Sahu, N. K. Nagwani, Shrish Verma
Published in: Proceedings of the 9th Annual ACM India Conference, 2016-10-21
DOI: 10.1145/2998476.2998490 (https://doi.org/10.1145/2998476.2998490)
Citations: 3

Abstract

A Community Question Answering (CQA) site is an online platform on which collaborating users worldwide host information in question-answer form. CQA sites have two basic types of user: askers, who post their queries as questions, and answerers, who provide answers to those questions. The semi-structured nature and growing size of the content on CQA sites pose several challenges. Because there is no restriction on the number of answers that can be posted to a question, a common challenge is to identify the authoritative answerers of a question in order to evaluate answer quality and select the best answer. In this paper, we apply latent Dirichlet allocation (LDA), a statistical topic model, to the textual data and statistical computation to the metadata to identify features that reflect the topical authority of an answerer. These features are then represented as a vector for each answerer in the dataset under investigation and used to train a classifier model. Various baseline classifier models are used to identify topical authoritative answerers on the Q&A posts of two real datasets extracted from StackOverflow and AskUbuntu. The correctness and effectiveness of the classifier models are evaluated using measures such as accuracy, precision, recall, and the kappa statistic. The experimental results show that the Random Forest classifier outperforms the other classification algorithms on every evaluation measure.
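
The evaluation measures named in the abstract (accuracy, precision, recall, and the kappa statistic) can all be derived from a binary confusion matrix. The following is a minimal Python sketch under stated assumptions, not the authors' evaluation code: it assumes binary labels where 1 marks an authoritative answerer, and the function name `binary_metrics` and the toy labels are hypothetical, chosen only for illustration.

```python
from collections import Counter


def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and Cohen's kappa for 0/1 labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    # Count each (actual, predicted) pair to build the confusion matrix.
    counts = Counter(zip(y_true, y_pred))
    tp = counts[(1, 1)]
    tn = counts[(0, 0)]
    fp = counts[(0, 1)]
    fn = counts[(1, 0)]
    n = tp + tn + fp + fn

    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0

    # Cohen's kappa: observed agreement corrected for chance agreement,
    # kappa = (p_observed - p_expected) / (1 - p_expected).
    p_observed = accuracy
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (p_observed - p_expected) / (1 - p_expected) if p_expected != 1 else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "kappa": kappa,
    }


# Hypothetical labels: 1 = authoritative answerer, 0 = not authoritative.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Kappa is useful alongside accuracy because it discounts the agreement a classifier would reach by chance, which matters when authoritative answerers are a small minority of the users in a dataset.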