Predicting Web Search Hit Counts

2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology. Pub Date: 2010-08-31. DOI: 10.1109/WI-IAT.2010.227

Tian Tian, J. Geller, Soon Ae Chun


Citations: 7

Abstract

Keyword-based search engines often return an unexpected number of results. Zero hits are naturally undesirable, while too many hits are likely to be overwhelming and of low precision. We present an approach for predicting the number of hits for a given set of query terms. Using word frequencies derived from a large corpus, we construct random samples of combinations of these words as search terms. We then derive a regression function relating the computed probabilities of these search terms to their observed hit counts. This regression function is used to predict the hit counts for a user's new searches, with the intention of avoiding information overload. We report the results of experiments with Google, Yahoo! and Bing to validate our methodology. We further investigate the monotonicity of search results for negative search terms on those three search engines.
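The pipeline described in the abstract (corpus word frequencies, computed query probabilities, a regression against observed hit counts, then prediction) can be illustrated with a minimal Python sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the term-independence model for the query probability, the log-log linear form of the regression, the function names, the toy word frequencies, and the "observed" hit counts, which are synthetic values generated from a known power law so that the fit can be checked.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy corpus statistics (not real data).
WORD_FREQ: Dict[str, int] = {
    "web": 5000, "search": 2000, "engine": 1000, "hit": 500, "count": 800,
}
CORPUS_SIZE = 1_000_000


def query_probability(terms: Sequence[str]) -> float:
    """Probability that all terms co-occur, ASSUMING term independence."""
    p = 1.0
    for term in terms:
        p *= WORD_FREQ[term] / CORPUS_SIZE
    return p


def fit_log_log(probs: List[float], hits: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for log10(hits) = a + b * log10(prob)."""
    xs = [math.log10(p) for p in probs]
    ys = [math.log10(h) for h in hits]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict_hits(terms: Sequence[str], a: float, b: float) -> float:
    """Predicted hit count for a new query from the fitted coefficients."""
    return 10 ** (a + b * math.log10(query_probability(terms)))


# Sampled training queries (combinations of corpus words).
train_queries = [
    ("web", "search"), ("search", "engine"), ("hit", "count"),
    ("web",), ("engine", "hit"),
]
train_probs = [query_probability(q) for q in train_queries]

# SYNTHETIC stand-in for observed hit counts: hits = 1e9 * p ** 0.8.
# A real experiment would query a search engine here instead.
train_hits = [1e9 * p ** 0.8 for p in train_probs]

a, b = fit_log_log(train_probs, train_hits)
predicted = predict_hits(("web", "count"), a, b)
```

Because the synthetic counts follow an exact power law, the fit recovers the generating coefficients. With real search-engine counts the fit would be noisy, and the independence assumption would likely need to be revisited.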
