{"title":"Online learning for recency search ranking using real-time user feedback","authors":"Taesup Moon, Lihong Li, Wei Chu, Ciya Liao, Zhaohui Zheng, Yi Chang","doi":"10.1145/1871437.1871657","DOIUrl":"https://doi.org/10.1145/1871437.1871657","url":null,"abstract":"Traditional machine-learned ranking algorithms for web search are trained in batch mode, which assume static relevance of documents for a given query. Although such a batch-learning framework has been tremendously successful in commercial search engines, in scenarios where relevance of documents to a query changes over time, such as ranking recent documents for a breaking news query, the batch-learned ranking functions do have limitations. Users' real-time click feedback becomes a better and timely proxy for the varying relevance of documents rather than the editorial judgments provided by human editors. In this paper, we propose an online learning algorithm that can quickly learn the best re-ranking of the top portion of the original ranked list based on real-time users' click feedback. In order to devise our algorithm and evaluate it accurately, we collected exploration bucket data that removes positional biases on clicks on the documents for recency-classified queries. 
Our initial experimental result shows that our scheme is more capable of quickly adjusting the ranking to track the varying relevance of documents reflected in the click feedback, compared to batch-trained ranking functions.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"800 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117041798","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Partial drift detection using a rule induction framework","authors":"Damon Sotoudeh, Aijun An","doi":"10.1145/1871437.1871536","DOIUrl":"https://doi.org/10.1145/1871437.1871536","url":null,"abstract":"The major challenge in mining data streams is the issue of concept drift, the tendency of the underlying data generation process to change over time. In this paper, we propose a general rule learning framework that can efficiently handle concept-drifting data streams and maintain a highly accurate classification model. The main idea is to focus on partial drifts by allowing individual rules to monitor the stream and detect if there is a drift in the regions they cover. A rule quality measure then decides whether the affected rules are inconsistent with the concept drift. The model is accordingly updated to only include rules that are consistent with the newly arrived concept. A dynamically maintained set of instances deemed relevant to the most recent concept is also kept at memory. Learning a new concept from a larger set of instances reduces the variance of data distribution and allows for a more accurate, stable classification model. 
Our experiments show that this approach not only handles the drift efficiently, but it also can provide higher classification accuracy compared to other competitive approaches on a variety of real and synthetic data sets.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128755041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Top-Eye: top-k evolving trajectory outlier detection","authors":"Yong Ge, Hui Xiong, Zhi-Hua Zhou, H. Ozdemir, Jannite Yu, Kuo Chu Lee","doi":"10.1145/1871437.1871716","DOIUrl":"https://doi.org/10.1145/1871437.1871716","url":null,"abstract":"The increasing availability of large-scale location traces creates unprecedented opportunities to change the paradigm for identifying abnormal moving activities. Indeed, various aspects of abnormality of moving patterns have recently been exploited, such as wrong direction and wandering. However, there is no recognized way of combining different aspects into a unified evolving abnormality score which has the ability to capture the evolving nature of abnormal moving trajectories. To that end, in this paper, we provide an evolving trajectory outlier detection method, named TOP-EYE, which continuously computes the outlying score for each trajectory in an accumulating way. Specifically, in TOP-EYE, we introduce a decay function to mitigate the influence of the past trajectories on the evolving outlying score, which is defined based on the evolving moving direction and density of trajectories. This decay function enables the evolving computation of accumulated outlying scores along the trajectories. An advantage of TOP-EYE is to identify evolving outliers at a very early stage with a relatively low false alarm rate. 
Finally, experimental results on real-world location traces show that TOP-EYE can effectively capture evolving abnormal trajectories.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128259609","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning a user-thread alignment manifold for thread recommendation in online forum","authors":"Jun Zhao, Jiajun Bu, Chun Chen, Ziyu Guan, C. Wang, Cheng Zhang","doi":"10.1145/1871437.1871511","DOIUrl":"https://doi.org/10.1145/1871437.1871511","url":null,"abstract":"People are more and more willing to participate in online forums to share their knowledge and experience. However, it may not be easy for them to find their desired threads in online forums due to the information overload problem. Traditional recommendation approaches cannot be directly applied to online forums for two reasons. First, unlike the traditional movie or music recommendation problem, there is no rating information in online forums. Second, the sparsity problem is more severe since the users may only read threads but take no actions. To address these limitations, in this paper we propose to make use of the reply relationships among users, as well as thread contents. A learning algorithm is introduced to infer a user-thread alignment manifold in which both users and thread contents can be well represented. Thus, the relatedness between users and threads can be measured on this alignment manifold, and the closest threads which can best meet the corresponding user's information needs are recommended. 
Experiments on a dataset crawled from digg.com have demonstrated the superiority of our algorithm over traditional recommendation algorithms.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129467961","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"OpinionIt: a text mining system for cross-lingual opinion analysis","authors":"Honglei Guo, Huijia Zhu, Zhili Guo, Xiaoxun Zhang, Zhong Su","doi":"10.1145/1871437.1871589","DOIUrl":"https://doi.org/10.1145/1871437.1871589","url":null,"abstract":"Opinion mining focuses on extracting customers' opinions from the reviews and predicting their sentiment orientation. Reviewers usually praise a product in some aspects and bemoan it in other aspects. With the business globalization, it is very important for enterprises to extract the opinions toward different aspects and find out cross-lingual/cross-culture difference in opinions. Cross-lingual opinion mining is a very challenging task as amounts of opinions are written in different languages, and not well structured. Since people usually use different words to describe the same aspect in the reviews, product-feature (PF) categorization becomes very critical in cross-lingual opinion mining. Manual cross-lingual PF categorization is time consuming, and practically infeasible for the massive amount of data written in different languages. In order to effectively find out cross-lingual difference in opinions, we present an aspect-oriented opinion mining method with Cross-lingual Latent Semantic Association (CLaSA). We first construct CLaSA model to learn the cross-lingual latent semantic association among all the PFs from multi-dimension semantic clues in the review corpus. Then we employ CLaSA model to categorize all the multilingual PFs into semantic aspects, and summarize cross-lingual difference in opinions towards different aspects. Experimental results show that our method achieves better performance compared with the existing approaches. 
With CLaSA model, our text mining system OpinionIt can effectively discover cross-lingual difference in opinions.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130589433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Probabilistic first pass retrieval for search advertising: from theory to practice","authors":"Hema Raghavan, R. Iyer","doi":"10.1145/1871437.1871567","DOIUrl":"https://doi.org/10.1145/1871437.1871567","url":null,"abstract":"Information retrieval in search advertising, as in other ad-hoc retrieval tasks, aims to find the most appropriate ranking of the ad documents of a corpus for a given query. In addition to ranking the ad documents, we also need to filter or threshold irrelevant ads from participating in the auction to be displayed alongside search results. In this work, we describe our experience in implementing a successful ad retrieval system for a commercial search engine based on the Language Modeling (LM) framework for retrieval. The LM demonstrates significant performance improvements over the baseline vector space model (TF-IDF) system that was in production at the time. From a modeling perspective, we propose a novel approach to incorporate query segmentation and phrases in the LM framework, discuss impact of score normalization for relevance filtering, and present preliminary results of incorporating query expansions using query rewriting techniques. From an implementation perspective, we also discuss real-time latency constraints of a production search engine and how we overcome them by adapting the WAND algorithm to work with language models. In sum, our LM formulation is considerably better in terms of accuracy metrics such as Precision-Recall (10% improvement in AUC) and nDCG (8% improvement in nDCG@5) on editorial data and also demonstrates significant improvements in clicks in live user tests (0.787% improvement in Click Yield, with 8% coverage increase). 
Finally, we hope that this paper provides the reader with adequate insights into the challenges of building a system that serves millions of users every day.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121317118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Opinion digger: an unsupervised opinion miner from unstructured product reviews","authors":"Samaneh Moghaddam, M. Ester","doi":"10.1145/1871437.1871739","DOIUrl":"https://doi.org/10.1145/1871437.1871739","url":null,"abstract":"Mining customer reviews (opinion mining) has emerged as an interesting new research direction. Most of the reviewing websites such as Epinions.com provide some additional information on top of the review text and overall rating, including a set of predefined aspects and their ratings, and a rating guideline which shows the intended interpretation of the numerical ratings. However, the existing methods have ignored this additional information. We claim that using this information, which is freely available, along with the review text can effectively improve the accuracy of opinion mining. We propose an unsupervised method, called Opinion Digger, which extracts important aspects of a product and determines the overall consumer's satisfaction for each, by estimating a rating in the range from 1 to 5. We demonstrate the improved effectiveness of our methods on a real life dataset that we crawled from Epinions.com.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114116589","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MI-WDIS: web data integration system for market intelligence","authors":"Zhongmin Yan, Qingzhong Li, Shidong Zhang, Zhaohui Peng, Yongquan Dong, Yanhui Ding, Yongxin Zhang, Xiuxing Xu","doi":"10.1145/1871437.1871783","DOIUrl":"https://doi.org/10.1145/1871437.1871783","url":null,"abstract":"As an important supporting technology of Market Intelligence (MI), Web data integration is facing new challenges, such as the integrity of data acquisition, the quality of data extraction and data consolidation. To solve such problems, we propose an MI-oriented web data integration system (MI-WDIS), which achieves excellent performances in integrating Surface Web and Deep Web data with much less manual work. Based on MI-WDIS, we have developed a platform for intelligent analysis of job data. The platform collects tens of thousands of job data daily and provides personalized services for job seekers through diversified channels. Besides, it provides other advanced services, including intelligence analysis, automatic monitoring and alerting, for various organizations, such as enterprises, training institutions and recruitment agencies.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121670531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning naïve bayes transfer classifier through class-wise test distribution estimation","authors":"J. Son, Seong-Bae Park, Hyun-Je Song","doi":"10.1145/1871437.1871715","DOIUrl":"https://doi.org/10.1145/1871437.1871715","url":null,"abstract":"Text classification is a well-known problem for various applications. For the last decades, it has been believed that a large corpus is one of the most important aspects for better classification. However, even though a great number of documents is available for training a classifier, it is practically impossible to achieve an ideal performance, since the distributions of labeled and unlabeled documents are often different. To overcome this problem, this paper describes a novel Naïve Bayes classifier for text classification under distribution difference between training and test data. The proposed method approximates the test distribution by weighting labeled documents to cope with the distribution difference. Unlike other transfer learning methods that estimate the weights of labeled documents, the proposed method considers both the documents and their estimated class labels. Therefore, the proposed method naturally combines the advantages of semi-supervised learning with those of transfer learning.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126209437","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Selective data acquisition for probabilistic K-NN query","authors":"Yu-Chieh Lin, De-Nian Yang, Ming-Syan Chen","doi":"10.1145/1871437.1871620","DOIUrl":"https://doi.org/10.1145/1871437.1871620","url":null,"abstract":"Recently, management of uncertain data draws lots of attention to consider the granularity of devices and noises in collection and delivery of data. Previous works directly model and handle uncertain data to find the required results. However, when data uncertainty is not small or limited, users are not able to obtain useful insights and thereby tend to provide more resources to improve the solution, by reducing the uncertainty of data. In light of this issue, this paper formulates a new problem of choosing a given number of uncertain data objects for acquiring their attribute values to improve the solutions of Probabilistic k-Nearest-Neighbor (k-PNN) query. We prove that solutions must be better after data acquisition, and we devise algorithms to maximize expected improvement. Our experiment results demonstrate that the probability can be significantly improved with only a small number of data acquisitions.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125677689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}