Partially Labeled Supervised Topic Models for Retrieving Similar Questions in CQA Forums

Debasis Ganguly, G. Jones
Proceedings of the 2015 International Conference on The Theory of Information Retrieval, published 2015-09-27. DOI: https://doi.org/10.1145/2808194.2809460. Citation count: 5.

Abstract

Manual annotations of user-generated content, such as tags and links, play an important role in making content in community question answering (CQA) forums and social media searchable. While a newly posted question in a CQA forum is still active, a moderator or answerer often has to spend significant effort manually searching for related question threads (which we refer to as documents) that they may consider linking to the current question. This manual effort can be greatly reduced by an automated search process that suggests a list of candidate documents to link to the new document. We describe our investigation of link recommendation for this task. We approach the problem as an ad-hoc information retrieval (IR) task in which a new document (question) acts as the query and the goal is to retrieve a list of potentially relevant documents (previously asked questions in the forum), which could then be linked manually to the new one. In contrast to standard ad-hoc search, two pieces of additional human-annotated information, namely the tags of the documents and the known links between existing document pairs, can potentially be used to improve search quality for new questions. To utilize this additional information, we propose a generative model of tagged documents which jointly estimates the distribution of topics corresponding to each tag of a document along with the likelihood of a document being linked to another one. The model predictions are then incorporated into the query likelihood estimate of a standard language model (LM) of IR. Experiments conducted on three months of a crawled StackOverflow dataset show that utilizing the tag-specific topic distributions results in a significant improvement in retrieval of the candidate set of related documents.
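The abstract does not state how the model predictions enter the query likelihood estimate. As a rough illustration only, the sketch below assumes a simple linear interpolation of three term estimates: the document's own term frequencies, a topic-based term probability, and a collection-wide background model. The function names (`score`, `rank`), the weights `w_doc` and `w_topic`, and the per-document `topic_probs` dictionaries are hypothetical stand-ins, not the authors' formulation.

```python
import math
from collections import Counter


def score(query_terms, doc_terms, collection_tf, collection_len,
          topic_term_prob=None, w_doc=0.5, w_topic=0.3):
    """Return log P(query | doc) under a linearly interpolated language model.

    ASSUMPTION: w_doc and w_topic are illustrative weights. Whatever weight
    is left over goes to the collection model, which keeps the probability
    non-zero for any term that occurs somewhere in the collection.
    """
    doc_tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    if topic_term_prob is None:
        w_topic = 0.0  # no topic model available: plain smoothed LM
    w_coll = 1.0 - w_doc - w_topic

    log_p = 0.0
    for term in query_terms:
        p_doc = doc_tf[term] / doc_len if doc_len else 0.0
        # ASSUMPTION: topic_term_prob maps term -> P(term | topics of doc).
        p_topic = topic_term_prob.get(term, 0.0) if topic_term_prob else 0.0
        p_coll = collection_tf.get(term, 0) / collection_len
        p = w_doc * p_doc + w_topic * p_topic + w_coll * p_coll
        if p == 0.0:
            return float("-inf")  # term never seen in the collection
        log_p += math.log(p)
    return log_p


def rank(query_terms, docs, topic_probs=None):
    """Rank document ids in `docs` by descending query log-likelihood."""
    collection_tf = Counter(t for terms in docs.values() for t in terms)
    collection_len = sum(collection_tf.values())
    topic_probs = topic_probs or {}
    return sorted(
        docs,
        key=lambda d: score(query_terms, docs[d], collection_tf,
                            collection_len, topic_probs.get(d)),
        reverse=True,
    )


# Toy data: three previously asked questions and one new question (the query).
docs = {
    "d1": "how to sort a list in python".split(),
    "d2": "sorting arrays in java".split(),
    "d3": "python list comprehension syntax".split(),
}
# Hypothetical topic-based term probabilities, one dictionary per document.
topic_probs = {
    "d1": {"python": 0.3, "list": 0.3, "sort": 0.3},
    "d2": {"java": 0.4, "sort": 0.3},
    "d3": {"python": 0.4, "list": 0.4},
}
print(rank("python list sort".split(), docs, topic_probs))
```

In this toy setup the new question plays the role of the query and the ranked ids are the candidate documents a moderator might link. Passing `topic_probs=None` reduces the scorer to a plain smoothed query likelihood model, which gives a baseline for comparison.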