{"title":"Mining high utility itemsets without candidate generation","authors":"Mengchi Liu, Jun-Feng Qu","doi":"10.1145/2396761.2396773","DOIUrl":"https://doi.org/10.1145/2396761.2396773","url":null,"abstract":"High utility itemsets refer to the sets of items with high utility like profit in a database, and efficient mining of high utility itemsets plays a crucial role in many real-life applications and is an important research issue in data mining area. To identify high utility itemsets, most existing algorithms first generate candidate itemsets by overestimating their utilities, and subsequently compute the exact utilities of these candidates. These algorithms incur the problem that a very large number of candidates are generated, but most of the candidates are found out to be not high utility after their exact utilities are computed. In this paper, we propose an algorithm, called HUI-Miner (High Utility Itemset Miner), for high utility itemset mining. HUI-Miner uses a novel structure, called utility-list, to store both the utility information about an itemset and the heuristic information for pruning the search space of HUI-Miner. By avoiding the costly generation and utility computation of numerous candidate itemsets, HUI-Miner can efficiently mine high utility itemsets from the utility-lists constructed from a mined database. We compared HUI-Miner with the state-of-the-art algorithms on various databases, and experimental results show that HUI-Miner outperforms these algorithms in terms of both running time and memory consumption.","PeriodicalId":313414,"journal":{"name":"Proceedings of the 21st ACM international conference on Information and knowledge management","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131561423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
B. Koopman, G. Zuccon, P. Bruza, Laurianne Sitbon, Michael Lawley
{"title":"An evaluation of corpus-driven measures of medical concept similarity for information retrieval","authors":"B. Koopman, G. Zuccon, P. Bruza, Laurianne Sitbon, Michael Lawley","doi":"10.1145/2396761.2398661","DOIUrl":"https://doi.org/10.1145/2396761.2398661","url":null,"abstract":"Measures of semantic similarity between medical concepts are central to a number of techniques in medical informatics, including query expansion in medical information retrieval. Previous work has mainly considered thesaurus-based path measures of semantic similarity and has not compared different corpus-driven approaches in depth. We evaluate the effectiveness of eight common corpus-driven measures in capturing semantic relatedness and compare these against human judged concept pairs assessed by medical professionals. Our results show that certain corpus-driven measures correlate strongly (approx 0.8) with human judgements. An important finding is that performance was significantly affected by the choice of corpus used in priming the measure, i.e., used as evidence from which corpus-driven similarities are drawn. This paper provides guidelines for the implementation of semantic similarity measures for medical informatics and concludes with implications for medical information retrieval.","PeriodicalId":313414,"journal":{"name":"Proceedings of the 21st ACM international conference on Information and knowledge management","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127021180","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Information-complete and redundancy-free keyword search over large data graphs","authors":"Byron J. Gao, Zhumin Chen, Qi Kang","doi":"10.1145/2396761.2398712","DOIUrl":"https://doi.org/10.1145/2396761.2398712","url":null,"abstract":"Keyword search over graphs has a wide array of applications in querying structured, semi-structured and unstructured data. Existing models typically use minimal trees or bounded subgraphs as query answers. While such models emphasize relevancy, they would suffer from incompleteness of information and redundancy among answers, making it difficult for users to effectively explore query answers. To overcome these drawbacks, we propose a novel cluster-based model, where query answers are relevancy-connected clusters. A cluster is a subgraph induced from a maximal set of relevancy-connected nodes. Such clusters are coherent and relevant, yet complete and redundancy free. They can be of arbitrary shape in contrast to the sphere-shaped bounded subgraphs in existing models. We also propose an efficient search algorithm and a corresponding graph index for large, disk-resident data graphs.","PeriodicalId":313414,"journal":{"name":"Proceedings of the 21st ACM international conference on Information and knowledge management","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127673301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Finding top k most influential spatial facilities over uncertain objects","authors":"Liming Zhan, Ying Zhang, W. Zhang, Xuemin Lin","doi":"10.1145/2396761.2396878","DOIUrl":"https://doi.org/10.1145/2396761.2396878","url":null,"abstract":"Uncertainty is inherent in many important applications, such as location-based services (LBS), sensor monitoring and radio-frequency identification (RFID). Recently, considerable research efforts have been put into the field of uncertainty-aware spatial query processing. In this paper, we study the problem of finding top k most influential facilities over a set of uncertain objects, which is an important spatial query in the above applications. Based on the maximal utility principle, we propose a new ranking model to identify the top k most influential facilities, which carefully captures influence of facilities on the uncertain objects. By utilizing two uncertain object indexing techniques, R-tree and U-Quadtree, effective and efficient algorithms are proposed following the filtering and verification paradigm, which significantly improves the performance of the algorithms in terms of CPU and I/O costs. Comprehensive experiments on real datasets demonstrate the effectiveness and efficiency of our techniques.","PeriodicalId":313414,"journal":{"name":"Proceedings of the 21st ACM international conference on Information and knowledge management","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127692478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
P. Campos, Alejandro Bellogín, F. Díez, Iván Cantador
{"title":"Time feature selection for identifying active household members","authors":"P. Campos, Alejandro Bellogín, F. Díez, Iván Cantador","doi":"10.1145/2396761.2398628","DOIUrl":"https://doi.org/10.1145/2396761.2398628","url":null,"abstract":"Popular online rental services such as Netflix and MoviePilot often manage household accounts. A household account is usually shared by various users who live in the same house, but in general does not provide a mechanism by which current active users are identified, and thus leads to considerable difficulties for making effective personalized recommendations. The identification of the active household members, defined as the discrimination of the users from a given household who are interacting with a system (e.g. an on-demand video service), is thus an interesting challenge for the recommender systems research community. In this paper, we formulate the above task as a classification problem, and address it by means of global and local feature selection methods and classifiers that only exploit time features from past item consumption records. The results obtained from a series of experiments on a real dataset show that some of the proposed methods are able to select relevant time features, which allow simple classifiers to accurately identify active members of household accounts.","PeriodicalId":313414,"journal":{"name":"Proceedings of the 21st ACM international conference on Information and knowledge management","volume":"144 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132720408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mike Symonds, P. Bruza, Laurianne Sitbon, I. Turner
{"title":"A tensor encoding model for semantic processing","authors":"Mike Symonds, P. Bruza, Laurianne Sitbon, I. Turner","doi":"10.1145/2396761.2398617","DOIUrl":"https://doi.org/10.1145/2396761.2398617","url":null,"abstract":"This paper develops and evaluates an enhanced corpus based approach for semantic processing. Corpus based models that build representations of words directly from text do not require pre-existing linguistic knowledge, and have demonstrated psychologically relevant performance on a number of cognitive tasks. However, they have been criticised in the past for not incorporating sufficient structural information. Using ideas underpinning recent attempts to overcome this weakness, we develop an enhanced tensor encoding model to build representations of word meaning for semantic processing. Our enhanced model demonstrates superior performance when compared to a robust baseline model on a number of semantic processing tasks.","PeriodicalId":313414,"journal":{"name":"Proceedings of the 21st ACM international conference on Information and knowledge management","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132728743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accelerating locality preserving nonnegative matrix factorization","authors":"Guanhong Yao, Deng Cai","doi":"10.1145/2396761.2398618","DOIUrl":"https://doi.org/10.1145/2396761.2398618","url":null,"abstract":"Matrix factorization techniques have been frequently applied in information retrieval, computer vision and pattern recognition. Among them, Non-negative Matrix Factorization (NMF) has received considerable attention due to its psychological and physiological interpretation of naturally occurring data whose representation may be parts-based in the human brain. Locality Preserving Non-negative Matrix Factorization (LPNMF) is a recently proposed graph-based NMF extension which tries to preserves the intrinsic geometric structure of the data. Compared with the original NMF, LPNMF has more discriminating power on data representa- tion thanks to its geometrical interpretation and outstanding ability to discover the hidden topics. However, the computa- tional complexity of LPNMF is O(n3), where n is the number of samples. In this paper, we propose a novel approach called Accelerated LPNMF (A-LPNMF) to solve the com- putational issue of LPNMF. Specifically, A-LPNMF selects p (p j n) landmark points from the data and represents all the samples as the sparse linear combination of these landmarks. The non-negative factors which incorporates the geometric structure can then be efficiently computed. Experimental results on the real data sets demonstrate the effectiveness and efficiency of our proposed method.","PeriodicalId":313414,"journal":{"name":"Proceedings of the 21st ACM international conference on Information and knowledge management","volume":"160 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132754557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PLEAD 2012: politics, elections and data","authors":"Ingmar Weber, A. Popescu, M. Pennacchiotti","doi":"10.1145/2396761.2398759","DOIUrl":"https://doi.org/10.1145/2396761.2398759","url":null,"abstract":"What is the role of the internet in politics general and during campaigns in particular? And what is the role of large amounts of user data in all of this? In the 2008 U.S. presidential campaign the Democrats were far more successful than the Republicans in utilizing online media for mobilization, co-ordination and fundraising. For the first time, social media and the Internet played a fundamental role in political campaigns. However, technical research in this area has been surprisingly limited and fragmented. The goal of this workshop is to bring together, for the first time, researchers working at the intersection of social network analysis, computational social science and political science, to share and discuss their ideas in a common forum; and to inspire further developments in this growing, fascinating field. The workshop has Filippo Menczer as keynote speaker, it includes technical presentations of accepted papers and concludes with a panel discussion where scientists and media experts from different fields can interact and share views.","PeriodicalId":313414,"journal":{"name":"Proceedings of the 21st ACM international conference on Information and knowledge management","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133415280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sort-based query-adaptive loading of R-trees","authors":"Daniar Achakeev, B. Seeger, P. Widmayer","doi":"10.1145/2396761.2398577","DOIUrl":"https://doi.org/10.1145/2396761.2398577","url":null,"abstract":"Bulk-loading of R-trees has been an important problem in academia and industry for more than twenty years. Current algorithms create R-trees without any information about the expected query profile. However, query profiles are extremely useful for the design of efficient indexes. In this paper, we address this deficiency and present query-adaptive algorithms for building R-trees optimally designed for a given query profile. Since optimal R-tree loading is NP-hard (even without tuning the structure to a query profile), we provide efficient, easy to implement heuristics. Our sort-based algorithms for query-adaptive loading consist of two steps: First, sorting orders are identified resulting in better R-trees than those obtained from standard space-filling curves. Second, for a given sorting order, we propose a dynamic programming algorithm for generating R-trees in linear runtime. Our experimental results confirm that our algorithms generally create significantly better R-trees than the ones obtained from standard sort-based loading algorithms, even when the query profile is unknown.","PeriodicalId":313414,"journal":{"name":"Proceedings of the 21st ACM international conference on Information and knowledge management","volume":"422 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133910577","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring simultaneous keyword and key sentence extraction: improve graph-based ranking using wikipedia","authors":"Xun Wang, Lei Wang, Jiwei Li, Sujian Li","doi":"10.1145/2396761.2398706","DOIUrl":"https://doi.org/10.1145/2396761.2398706","url":null,"abstract":"Summarization and Keyword Selection are two important tasks in NLP community. Although both aim to summarize the source articles, they are usually treated separately by using sentences or words. In this paper, we propose a two-level graph based ranking algorithm to generate summarization and extract keywords at the same time. Previous works have reached a consensus that important sentence is composed by important keywords. In this paper, we further study the mutual impact between them through context analysis. We use Wikipedia to build a two-level concept-based graph, instead of traditional term-based graph, to express their homogenous relationship and heterogeneous relationship. We run PageRank and HITS rank on the graph to adjust both homogenous and heterogeneous relationships. A more reasonable relatedness value will be got for key sentence selection and keyword selection. We evaluate our algorithm on TAC 2011 data set. Traditional term-based approach achieves a score of 0.255 in ROUGE-1 and a score of 0.037 and ROUGE-2 and our approach can improve them to 0.323 and 0.048 separately.","PeriodicalId":313414,"journal":{"name":"Proceedings of the 21st ACM international conference on Information and knowledge management","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132274391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}