{"title":"Temporal Query Expansion Using a Continuous Hidden Markov Model","authors":"J. Rao, Jimmy J. Lin","doi":"10.1145/2970398.2970424","DOIUrl":null,"url":null,"abstract":"In standard formulations of pseudo-relevance feedback, document timestamps do not play a role in identifying expansion terms. Yet we know that when searching social media posts such as tweets, relevant documents are bursty and usually occur in temporal clusters. The main insight of our work is that term expansions should be biased to draw from documents that occur in bursty temporal clusters. This is formally captured by a continuous hidden Markov model (cHMM), for which we derive an EM algorithm for parameter estimation. Given a query, we estimate the parameters for a cHMM that best explains the observed distribution of an initial set of retrieved documents, and then use Viterbi decoding to compute the most likely state sequence. In identifying expansion terms, we only select documents from bursty states. Experiments on test collections from the TREC 2011 and 2012 Microblog tracks show that our approach is significantly more effective than the popular RM3 pseudo-relevance feedback model.","PeriodicalId":443715,"journal":{"name":"Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2970398.2970424","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 9
Abstract
In standard formulations of pseudo-relevance feedback, document timestamps do not play a role in identifying expansion terms. Yet we know that when searching social media posts such as tweets, relevant documents are bursty and usually occur in temporal clusters. The main insight of our work is that term expansions should be biased to draw from documents that occur in bursty temporal clusters. This is formally captured by a continuous hidden Markov model (cHMM), for which we derive an EM algorithm for parameter estimation. Given a query, we estimate the parameters for a cHMM that best explains the observed distribution of an initial set of retrieved documents, and then use Viterbi decoding to compute the most likely state sequence. In identifying expansion terms, we only select documents from bursty states. Experiments on test collections from the TREC 2011 and 2012 Microblog tracks show that our approach is significantly more effective than the popular RM3 pseudo-relevance feedback model.
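To make the pipeline described above concrete, here is a minimal, hypothetical sketch (not the authors' implementation): it fits a two-state continuous-emission HMM to the temporal signal of an initially retrieved document set, Viterbi-decodes the most likely state sequence, and keeps only the documents assigned to the bursty state as the feedback pool. The choice of observations (inter-arrival gaps rather than raw timestamps), the two-state topology, and the use of hmmlearn's GaussianHMM for EM estimation are all illustrative assumptions.

```python
# Hypothetical sketch: select feedback documents from bursty temporal clusters.
# Assumes hmmlearn and numpy are installed; the observation model (inter-arrival
# gaps) and the "smallest mean gap = bursty state" heuristic are assumptions,
# not details taken from the paper.

import numpy as np
from hmmlearn.hmm import GaussianHMM

def bursty_feedback_docs(doc_ids, timestamps, n_states=2, seed=0):
    """Return the doc_ids whose Viterbi-decoded state is the bursty one."""
    order = np.argsort(timestamps)                      # process documents in time order
    ts = np.asarray(timestamps, dtype=float)[order]
    gaps = np.diff(ts, prepend=ts[0])                   # inter-arrival gaps as observations
    X = gaps.reshape(-1, 1)

    # EM (Baum-Welch) parameter estimation for the continuous-emission HMM.
    model = GaussianHMM(n_components=n_states, covariance_type="diag",
                        n_iter=100, random_state=seed)
    model.fit(X)

    # Viterbi decoding: most likely hidden state for each document.
    states = model.predict(X)

    # Heuristic: the bursty state is the one whose mean inter-arrival gap is
    # smallest, i.e., documents arriving in rapid succession.
    bursty_state = int(np.argmin(model.means_.ravel()))
    return [doc_ids[order[i]] for i in range(len(ts)) if states[i] == bursty_state]

if __name__ == "__main__":
    ids = [f"tweet{i}" for i in range(8)]
    times = [0, 1, 2, 3, 50, 51, 52, 200]               # two tight clusters plus one outlier
    print(bursty_feedback_docs(ids, times))
```

In the paper's setting, expansion terms would then be estimated (e.g., with RM3-style weighting) only from the documents this step returns, rather than from the full pseudo-relevant set.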