Takuya Konishi, Takuya Ohwa, Sumio Fujita, K. Ikeda, K. Hayashi
Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, published 2016-02-08. DOI: 10.1145/2835776.2835794 (https://doi.org/10.1145/2835776.2835794)
Extracting Search Query Patterns via the Pairwise Coupled Topic Model
A fundamental yet new challenge in information retrieval is the identification of patterns behind search queries. For example, the queries "NY restaurant" and "boston hotel" share the common pattern "LOCATION SERVICE". However, because real queries are so diverse, existing approaches require either manual data preprocessing or a specification of the target query domains, which limits their applicability. We propose a probabilistic topic model that assumes that each term (e.g., "NY") has a topic (LOCATION). The key idea is that we consider topic co-occurrence within a query rather than a topic sequence, which significantly reduces computational cost yet still enables us to acquire coherent topics without such preprocessing. Using two real query datasets, we demonstrate that the obtained topics are intelligible to humans and are highly accurate in keyword prediction and query generation tasks.
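To make the distinction between topic co-occurrence and topic sequence concrete, here is a minimal Python sketch. It is only an illustration and not the paper's model: the term-to-topic table is written by hand (the model would infer it from data), and the names `TERM_TOPIC`, `topic_sequence`, and `topic_pairs` are hypothetical. It shows that two queries with different terms map to the same pattern, and that an unordered pairwise view ignores term order where a sequence view does not.

```python
from collections import Counter
from itertools import combinations

# Hypothetical, hand-written term-to-topic assignments (an assumption for
# illustration only; the paper's model learns topics from query data).
TERM_TOPIC = {
    "ny": "LOCATION",
    "boston": "LOCATION",
    "restaurant": "SERVICE",
    "hotel": "SERVICE",
}


def topic_sequence(query):
    """Ordered view: the topics of the terms in query order."""
    return tuple(TERM_TOPIC[term] for term in query.lower().split())


def topic_pairs(query):
    """Unordered view: the multiset of topic pairs co-occurring in the query."""
    topics = [TERM_TOPIC[term] for term in query.lower().split()]
    # Sorting each pair makes (LOCATION, SERVICE) and (SERVICE, LOCATION) equal.
    return Counter(tuple(sorted(pair)) for pair in combinations(topics, 2))


# Different terms, same pattern.
same_pattern = topic_pairs("NY restaurant") == topic_pairs("boston hotel")

# Reordering the terms changes the sequence view but not the pairwise view.
same_pairs = topic_pairs("restaurant NY") == topic_pairs("NY restaurant")
same_sequence = topic_sequence("restaurant NY") == topic_sequence("NY restaurant")
```

In this toy setting `same_pattern` and `same_pairs` are true while `same_sequence` is false. This shows only why a co-occurrence view is insensitive to term order; it does not reproduce the computational savings reported in the abstract.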