{"title":"PicAlert!: a system for privacy-aware image classification and retrieval","authors":"Sergej Zerr, Stefan Siersdorfer, Jonathon S. Hare","doi":"10.1145/2396761.2398735","DOIUrl":"https://doi.org/10.1145/2396761.2398735","url":null,"abstract":"Photo publishing in Social Networks and other Web2.0 applications has become very popular due to the pervasive availability of cheap digital cameras, powerful batch upload tools and a huge amount of storage space. A portion of uploaded images are of a highly sensitive nature, disclosing many details of the users' private life. We have developed a web service which can detect private images within a user's photo stream and provide support in making privacy decisions in the sharing context. In addition, we present a privacy-oriented image search application which automatically identifies potentially sensitive images in the result set and separates them from the remaining pictures.","PeriodicalId":313414,"journal":{"name":"Proceedings of the 21st ACM international conference on Information and knowledge management","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116588958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Incorporating variability in user behavior into systems based evaluation","authors":"Ben Carterette, E. Kanoulas, Emine Yilmaz","doi":"10.1145/2396761.2396782","DOIUrl":"https://doi.org/10.1145/2396761.2396782","url":null,"abstract":"Click logs present a wealth of evidence about how users interact with a search system. This evidence has been used for many things: learning rankings, personalizing, evaluating effectiveness, and more. But it is almost always distilled into point estimates of feature or parameter values, ignoring what may be the most salient feature of users---their variability. No two users interact with a system in exactly the same way, and even a single user may interact with results for the same query differently depending on information need, mood, time of day, and a host of other factors. We present a Bayesian approach to using logs to compute posterior distributions for probabilistic models of user interactions. Since they are distributions rather than point estimates, they naturally capture variability in the population. We show how to cluster posterior distributions to discover patterns of user interactions in logs, and discuss how to use the clusters to evaluate search engines according to a user model. 
Because the approach is Bayesian, our methods can be applied to very large logs (such as those possessed by Web search engines) as well as very small (such as those found in almost any other setting).","PeriodicalId":313414,"journal":{"name":"Proceedings of the 21st ACM international conference on Information and knowledge management","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128463578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Discovering logical knowledge for deep question answering","authors":"Zhao Liu, Xipeng Qiu, L. Cao, Xuanjing Huang","doi":"10.1145/2396761.2398544","DOIUrl":"https://doi.org/10.1145/2396761.2398544","url":null,"abstract":"Most open-domain question answering systems achieve better performance with large corpora, such as the Web, by taking advantage of information redundancy. However, explicit answers are not always mentioned in the corpus; many answers are implicitly contained and can only be deduced by inference. In this paper, we propose an approach to discover logical knowledge for deep question answering, which automatically extracts knowledge in an unsupervised, domain-independent manner from background texts and reasons out implicit answers for the questions. Firstly, we use semantic role labeling to transform natural language expressions to predicates in first-order logic. Then we use association analysis to uncover the implicit relations among these predicates and build propositions for inference. Since our knowledge is drawn from different sources, we use Markov logic to merge multiple knowledge bases without resolving their inconsistencies. Our experiments show that these propositions can improve the performance of question answering significantly.","PeriodicalId":313414,"journal":{"name":"Proceedings of the 21st ACM international conference on Information and knowledge management","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128378195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multiview hierarchical bayesian regression model and application to online advertising","authors":"Tianbing Xu, Ruofei Zhang, Zhen Guo","doi":"10.1145/2396761.2396825","DOIUrl":"https://doi.org/10.1145/2396761.2396825","url":null,"abstract":"With the development of Web applications, large scale data are popular; and they are not only getting richer, but also ubiquitously interconnected with users and other objects in various ways, which brings about multi-view data with implicit structure. In this paper, we propose a novel hierarchical Bayesian mixture regression model, which discovers and then exploits the relationships among multiple views of the data to perform various machine learning tasks. A stochastic EM inference and learning algorithm is derived; and a parallel implementation in Hadoop MapReduce [9] paradigm is developed to scale up the learning. We apply the developed model and algorithm on click-through-rate (CTR) prediction and campaign targeting recommendation in online advertising to measure its effectiveness. The experiments on both synthetic data and large scale ads serving data from a real world online advertising exchange demonstrate the superior CTR prediction accuracy of our method compared to existing state-of-the-art methods. 
The results also show that our model can recommend high performance targeting features for online advertising campaigns.","PeriodicalId":313414,"journal":{"name":"Proceedings of the 21st ACM international conference on Information and knowledge management","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129322555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MEET: a generalized framework for reciprocal recommender systems","authors":"Lei Li, Tao Li","doi":"10.1145/2396761.2396770","DOIUrl":"https://doi.org/10.1145/2396761.2396770","url":null,"abstract":"Reciprocal recommender systems refer to systems from which users can obtain recommendations of other individuals by satisfying preferences of both parties being involved. Different from the traditional user-item recommendation, reciprocal recommenders focus on the preferences of both parties simultaneously, as well as some special properties in terms of \"reciprocal\". In this paper, we propose MEET -- a generalized framework for reciprocal recommendation, in which we model the correlations of users as a bipartite graph that maintains both local and global \"reciprocal\" utilities. The local utility captures users' mutual preferences, whereas the global utility manages the overall quality of the entire reciprocal network. Extensive empirical evaluation on two real-world data sets (online dating and online recruiting) demonstrates the effectiveness of our proposed framework compared with existing recommendation algorithms. Our analysis also provides deep insights into the special aspects of reciprocal recommenders that differentiate them from user-item recommender systems.","PeriodicalId":313414,"journal":{"name":"Proceedings of the 21st ACM international conference on Information and knowledge management","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129651140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Importance weighted passive learning","authors":"Shuaiqiang Wang, Xiaoming Xi, Yilong Yin","doi":"10.1145/2396761.2398611","DOIUrl":"https://doi.org/10.1145/2396761.2398611","url":null,"abstract":"Importance weighted active learning (IWAL) introduces a weighting scheme to measure the importance of each instance for correcting the sampling bias of the probability distributions between training and test datasets. However, the weighting scheme of IWAL involves the distribution of the test data, which can be straightforwardly estimated in active learning by interactively querying users for labels of selected test instances, but is difficult to estimate in conventional learning where there are no interactions with users, referred to as passive learning. In this paper, we investigate the insufficient sampling bias problem, i.e., bias occurs only because of insufficient samples, but the sampling process is unbiased. In doing this, we present two assumptions on the sampling bias, based on which we propose a practical weighting scheme for the empirical loss function in conventional passive learning, and present IWPL, an importance weighted passive learning framework. Furthermore, we provide IWSVM, an importance weighted SVM for validation. Extensive experiments demonstrate significant advantages of IWSVM on benchmarks and synthetic datasets.","PeriodicalId":313414,"journal":{"name":"Proceedings of the 21st ACM international conference on Information and knowledge management","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130504567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SHB 2012: international workshop on smart health and wellbeing","authors":"Christopher C. Yang, Hsinchun Chen, H. Wactlar, Combi Carlo, Xuning Tang","doi":"10.1145/2396761.2398756","DOIUrl":"https://doi.org/10.1145/2396761.2398756","url":null,"abstract":"The Smart Health and Wellbeing workshop is organized to develop a platform for authors to discuss fundamental principles, algorithms or applications of intelligent data acquisition, processing and analysis of healthcare data. We are particularly interested in information and knowledge management papers, in which the approaches are accompanied by an in-depth experimental evaluation with real world data. This paper provides an overview of the workshop and the accepted contributions.","PeriodicalId":313414,"journal":{"name":"Proceedings of the 21st ACM international conference on Information and knowledge management","volume":"157 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126919444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Full-text citation analysis: enhancing bibliometric and scientific publication ranking","authors":"Xiaozhong Liu, Jinsong Zhang, Chun Guo","doi":"10.1145/2396761.2398555","DOIUrl":"https://doi.org/10.1145/2396761.2398555","url":null,"abstract":"The goal of this paper is to use innovative text and graph mining algorithms along with full-text citation analysis and topic modeling to enhance classical bibliometric analysis and publication ranking. By utilizing citation contexts extracted from a large number of full-text publications, each citation or publication is represented by a probability distribution over a set of predefined topics, where each topic is labeled by an author contributed keyword. We then used publication/citation topic distribution to generate a citation graph with vertex prior and edge transitioning probability distributions. The publication importance score for each given topic is calculated by PageRank with edge and vertex prior distributions. Based on 104 topics (labeled with keywords) and their review papers, the cited publications of each review paper are assumed as \"important publications\" for ranking evaluation. 
The result shows that full text citation and publication content prior topic distribution along with the PageRank algorithm can significantly enhance bibliometric analysis and scientific publication ranking performance for academic IR system.","PeriodicalId":313414,"journal":{"name":"Proceedings of the 21st ACM international conference on Information and knowledge management","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129037555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting CTR of new ads via click prediction","authors":"Alexander Kolesnikov, Yury Logachev, V. A. Topinskiy","doi":"10.1145/2396761.2398688","DOIUrl":"https://doi.org/10.1145/2396761.2398688","url":null,"abstract":"Predicting CTR of ads on the search result page is an urgent topic. The reason for this is that choosing the right advertisement greatly affects revenue of the search engine and advertisers and user's satisfaction. For ads with a large click history it is quite clear how to predict CTR by utilizing statistical data. But for new ads with a poor click history such an approach is not robust and reliable. We suggest a model for predicting CTR of such new ads. Contrary to the previous models of predicting CTR of new ads, our model uses events - clicks and skips instead of the observed CTR. In addition we have implemented several novel features that increased the performance of our model. Offline and online experiments on the real search engine system demonstrated that our model outperforms the baseline and the approaches suggested in previous papers.","PeriodicalId":313414,"journal":{"name":"Proceedings of the 21st ACM international conference on Information and knowledge management","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130611700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Weighted linear kernel with tree transformed features for malware detection","authors":"Prakash Mandayam Comar, Lei Liu, Sabyasachi Saha, A. Nucci, P. Tan","doi":"10.1145/2396761.2398622","DOIUrl":"https://doi.org/10.1145/2396761.2398622","url":null,"abstract":"Malware detection from network traffic flows is a challenging problem due to data irregularity issues such as imbalanced class distribution, noise, missing values, and heterogeneous types of features. To address these challenges, this paper presents a two-stage classification approach for malware detection. The framework initially employs random forest as a macro-level classifier to separate the malicious from non-malicious network flows, followed by a collection of one-class support vector machine classifiers to identify the specific type of malware. A novel tree-based feature construction approach is proposed to deal with data imperfection issues. As the performance of the support vector machine classifier often depends on the kernel function used to compute the similarity between every pair of data points, designing an appropriate kernel is essential for accurate identification of malware classes. 
We present a simple algorithm to construct a weighted linear kernel on the tree transformed features and demonstrate its effectiveness in detecting malware from real network traffic data.","PeriodicalId":313414,"journal":{"name":"Proceedings of the 21st ACM international conference on Information and knowledge management","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130637947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}