What Were People Searching For? A Query Log Analysis of An Academic Search Engine

2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL), Pub Date: 2021-09-01, DOI: 10.1109/JCDL52503.2021.00062

Shaurya Rohatgi, C. Lee Giles


Citations: 2

Abstract

Academic search engines have served the research community for years, yet there is little work done on understanding the taxonomy of query semantics. In this work, we present our findings of analyzing the query log of an academic search engine in the past four years. We study the distribution of query intents to understand the information requested by users. We classify query strings by topics using shallow and latent features captured using a customized word embedding model. To this end, we create a dataset that has scientific keywords and titles labeled with fields of study. This dataset is later used to train a classifier that discriminates query logs by topics. Our work will help to train better learning-based ranking functions that improve user experiences for an academic search engine. In addition, we anonymize our 14,759,852 query logs and make them available to the research community for further exploration.

