Predicting Web Search Hit Counts
Tian Tian, J. Geller, Soon Ae Chun
2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, 2010-08-31
DOI: 10.1109/WI-IAT.2010.227 (https://doi.org/10.1109/WI-IAT.2010.227)

Keyword-based search engines often return an unexpected number of results. Zero hits are naturally undesirable, while too many hits are likely to be overwhelming and of low precision. We present an approach for predicting the number of hits for a given set of query terms. Using word frequencies derived from a large corpus, we construct random samples of combinations of these words as search terms. We then derive a regression function relating the computed probabilities of these search terms to their observed hit counts. This function is used to predict the hit counts for a user’s new searches, with the intention of avoiding information overload. We report the results of experiments with Google, Yahoo! and Bing to validate our methodology. We further investigate the monotonicity of the search results that these three search engines return for negative search terms.
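
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how term probabilities are combined or what form the regression takes, so the sketch assumes that terms occur independently and that hit counts follow a linear relation in log-log space. All function names and every number in the demo are hypothetical and serve only to show the flow from corpus frequencies to a predicted hit count. The last helper assumes that "monotonicity" means each added negative (excluded) term should not increase the reported hit count.

```python
import math


def query_probability(terms, word_freq, corpus_size):
    """Estimate the probability of a multi-term query.

    Assumption (not from the paper): terms occur independently, so the
    joint probability is the product of the relative frequencies.
    """
    p = 1.0
    for term in terms:
        p *= word_freq[term] / corpus_size
    return p


def fit_log_linear(probs, hits):
    """Least-squares fit of log10(hits) = a * log10(prob) + b.

    Assumption: a log-log linear model. All inputs must be positive,
    so zero-hit samples have to be filtered out by the caller.
    """
    xs = [math.log10(p) for p in probs]
    ys = [math.log10(h) for h in hits]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def predict_hits(prob, a, b):
    """Predict a hit count for a new query from the fitted coefficients."""
    return 10 ** (a * math.log10(prob) + b)


def is_monotone_non_increasing(counts):
    """Check that hit counts never grow as negative terms are added.

    counts[i] is the hit count reported after excluding i terms.
    """
    return all(later <= earlier for earlier, later in zip(counts, counts[1:]))


if __name__ == "__main__":
    # Hypothetical corpus statistics and observed hit counts.
    word_freq = {"search": 5000, "engine": 2000, "regression": 300}
    corpus_size = 1_000_000
    samples = [["search"], ["search", "engine"], ["engine", "regression"]]
    observed_hits = [9.0e7, 4.0e5, 2.5e4]  # made-up numbers

    probs = [query_probability(q, word_freq, corpus_size) for q in samples]
    a, b = fit_log_linear(probs, observed_hits)

    new_query = ["search", "regression"]
    p_new = query_probability(new_query, word_freq, corpus_size)
    print(f"slope={a:.3f} intercept={b:.3f}")
    print(f"predicted hits for {new_query}: {predict_hits(p_new, a, b):.0f}")

    # Made-up counts for a query with 0, 1 and 2 excluded terms.
    print(is_monotone_non_increasing([1000, 800, 950]))
```

In practice the observed hit counts would come from querying each search engine with the sampled term combinations; a predicted count far above what a user can reasonably review could then prompt the user to add terms before the query is submitted.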