Mining Related Queries from Query Logs Based on Linear Regression
Haijun Zhai, Jin Zhang, Xiaolei Wang, Gang Zhang
2008 International Seminar on Future Information Technology and Management Engineering, 2008-11-20
DOI: 10.1109/FITME.2008.59 (https://doi.org/10.1109/FITME.2008.59)
Citations: 3
Abstract
This paper proposes a novel linear regression model for mining related queries from query logs. The model identifies and leverages three types of association between queries: query-session co-occurrence, shared clicked URLs, and text similarity. Previous work applies only some of these relations directly, which leaves it vulnerable to noise in query logs, such as sparse click-through data, query-session segmentation errors, and noisy clicks. We instead use linear regression analysis to identify effective features, which mitigates the noise problem. Experiments demonstrate that the features identified by linear regression analysis are effective and that the proposed model outperforms existing methods.
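The abstract does not give the feature definitions or the fitting procedure, so the following is only a minimal sketch of the general idea: score a query pair by a linear combination of the three association signals. Everything specific here is an assumption made for illustration, not the authors' method: the Jaccard overlap used for each feature, the helper names `jaccard`, `pair_features`, `fit_least_squares`, and `predict`, the toy log, and the hand-made training labels.

```python
def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_features(q1, q2, sessions, clicks):
    """Return [session co-occurrence, shared clicked URLs, text similarity].

    Assumption: each signal is measured as a Jaccard overlap. The paper
    may define these features differently.
    """
    s1 = {i for i, s in enumerate(sessions) if q1 in s}
    s2 = {i for i, s in enumerate(sessions) if q2 in s}
    return [
        jaccard(s1, s2),
        jaccard(clicks.get(q1, set()), clicks.get(q2, set())),
        jaccard(set(q1.split()), set(q2.split())),
    ]


def fit_least_squares(X, y):
    """Ordinary least squares with an intercept.

    Solves the normal equations (X^T X) w = X^T y by Gauss-Jordan
    elimination with partial pivoting. Returns [intercept, w1, w2, ...].
    """
    rows = [[1.0] + list(x) for x in X]
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(n)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("features are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(n):
            if r != col:
                factor = A[r][col]
                A[r] = [v - factor * p for v, p in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]


def predict(w, x):
    """Relatedness score for one feature vector."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


# Toy query log (hypothetical data).
sessions = [
    {"cheap flights", "flight deals"},
    {"cheap flights", "hotel booking"},
    {"flight deals", "cheap flights"},
]
clicks = {
    "cheap flights": {"a.example", "b.example"},
    "flight deals": {"b.example", "c.example"},
}

# Hypothetical labeled training pairs: feature vectors and relatedness labels.
train_X = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
train_y = [0.1, 0.6, 0.4, 0.3, 1.1]

weights = fit_least_squares(train_X, train_y)
x = pair_features("cheap flights", "flight deals", sessions, clicks)
score = predict(weights, x)
```

In a sketch like this, the fitted weights give a rough indication of how much each signal contributes to the relatedness score, which is one plausible reading of using regression analysis to identify effective features. A real system would need labeled query pairs and would likely use a numerical library rather than a hand-written solver.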