Extracting Search Query Patterns via the Pairwise Coupled Topic Model

Proceedings of the Ninth ACM International Conference on Web Search and Data Mining. Pub Date: 2016-02-08. DOI: 10.1145/2835776.2835794

Takuya Konishi, Takuya Ohwa, Sumio Fujita, K. Ikeda, K. Hayashi


Citations: 7

Abstract

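A fundamental yet new challenge in information retrieval is the identification of patterns behind search queries. For example, the queries "NY restaurant" and "boston hotel" share the common pattern "LOCATION SERVICE". However, because of the diversity of real queries, existing approaches require data preprocessing by humans or specifying the target query domains, which hinders their applicability. We propose a probabilistic topic model that assumes that each term (e.g., "NY") has a topic (LOCATION). The key idea is that we consider topic co-occurrence in a query rather than a topic sequence, which significantly reduces computational cost yet enables us to acquire coherent topics without the preprocessing. Using two real query datasets, we demonstrate that the obtained topics are intelligible to humans, and are highly accurate in keyword prediction and query generation tasks.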
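To make the abstract's key idea concrete, the toy sketch below contrasts a query's topic *sequence* with its topic *co-occurrence* (topics treated as an unordered collection). It is not the authors' Pairwise Coupled Topic Model: in the paper each term's topic is a latent variable inferred probabilistically, whereas here the term-to-topic mapping `TERM_TOPIC` is a hypothetical hand-written dictionary, and `sequence_pattern`, `cooccurrence_pattern` and `pair_counts` are illustrative names only.

```python
from collections import Counter
from itertools import combinations

# Hypothetical hand-written term -> topic assignments (assumption for illustration).
# In the actual model, each term's topic is inferred rather than looked up.
TERM_TOPIC = {
    "ny": "LOCATION",
    "boston": "LOCATION",
    "restaurant": "SERVICE",
    "hotel": "SERVICE",
}


def topics_of(query: str) -> list[str]:
    """Map each whitespace-separated term of a query to its topic."""
    return [TERM_TOPIC[term] for term in query.lower().split()]


def sequence_pattern(query: str) -> tuple[str, ...]:
    """Order-sensitive pattern: topics in the order their terms appear."""
    return tuple(topics_of(query))


def cooccurrence_pattern(query: str) -> tuple[str, ...]:
    """Order-insensitive pattern: the sorted multiset of topics in the query."""
    return tuple(sorted(topics_of(query)))


def pair_counts(queries: list[str]) -> Counter:
    """Count unordered topic pairs that co-occur within the same query."""
    counts: Counter = Counter()
    for q in queries:
        for a, b in combinations(topics_of(q), 2):
            counts[tuple(sorted((a, b)))] += 1
    return counts


if __name__ == "__main__":
    queries = ["NY restaurant", "boston hotel", "hotel boston"]
    print({q: sequence_pattern(q) for q in queries})
    print({q: cooccurrence_pattern(q) for q in queries})
    print(pair_counts(queries))
```

Under the sequence view, "hotel boston" yields ('SERVICE', 'LOCATION'), a different pattern from "boston hotel"; under the co-occurrence view, all three queries share ('LOCATION', 'SERVICE'), i.e. the "LOCATION SERVICE" pattern from the abstract. The abstract credits this order-insensitive view with reducing computational cost; the toy does not attempt to model that cost.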


