Proceedings of the 21st ACM international conference on Information and knowledge management最新文献

筛选
英文 中文
Incorporating occupancy into frequent pattern mining for high quality pattern recommendation 将占用率纳入频繁的模式挖掘,以获得高质量的模式推荐
Linpeng Tang, Lei Zhang, Ping Luo, Min Wang
{"title":"Incorporating occupancy into frequent pattern mining for high quality pattern recommendation","authors":"Linpeng Tang, Lei Zhang, Ping Luo, Min Wang","doi":"10.1145/2396761.2396775","DOIUrl":"https://doi.org/10.1145/2396761.2396775","url":null,"abstract":"Mining interesting patterns from transaction databases has attracted a lot of research interest for more than a decade. Most of those studies use frequency, the number of times a pattern appears in a transaction database, as the key measure for pattern interestingness. In this paper, we introduce a new measure of pattern interestingness, occupancy. The measure of occupancy is motivated by some real-world pattern recommendation applications which require that any interesting pattern X should occupy a large portion of the transactions it appears in. Namely, for any supporting transaction t of pattern X, the number of items in X should be close to the total number of items in t. In these pattern recommendation applications, patterns with higher occupancy may lead to higher recall while patterns with higher frequency lead to higher precision. With the definition of occupancy we call a pattern dominant if its occupancy is above a user-specified threshold. Then, our task is to identify the qualified patterns which are both frequent and dominant. Additionally, we also formulate the problem of mining top-k qualified patterns: finding the qualified patterns with the top-k values of any function (e.g. weighted sum of both occupancy and support). The challenge to these tasks is that the monotone or anti-monotone property does not hold on occupancy. In other words, the value of occupancy does not increase or decrease monotonically when we add more items to a given itemset. Thus, we propose an algorithm called DOFIA (DOminant and Frequent Itemset mining Algorithm), which explores the upper bound properties on occupancy to reduce the search process. The tradeoff between bound tightness and computational complexity is also systematically addressed. Finally, we show the effectiveness of DOFIA in a real-world application on print-area recommendation for Web pages, and also demonstrate the efficiency of DOFIA on several large synthetic data sets.","PeriodicalId":313414,"journal":{"name":"Proceedings of the 21st ACM international conference on Information and knowledge management","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122307766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 43
Learning to discover complex mappings from web forms to ontologies 学习发现从web表单到本体的复杂映射
Yuan An, Xiaohua Hu, I. Song
{"title":"Learning to discover complex mappings from web forms to ontologies","authors":"Yuan An, Xiaohua Hu, I. Song","doi":"10.1145/2396761.2398427","DOIUrl":"https://doi.org/10.1145/2396761.2398427","url":null,"abstract":"In order to realize the Semantic Web, various structures on the Web including Web forms need to be annotated with and mapped to domain ontologies. We present a machine learning-based automatic approach for discovering complex mappings from Web forms to ontologies. A complex mapping associates a set of semantically related elements on a form to a set of semantically related elements in an ontology. Existing schema mapping solutions mainly rely on integrity constraints to infer complex schema mappings. However, it is difficult to extract rich integrity constraints from forms. We show how machine learning techniques can be used to automatically discover complex mappings between Web forms and ontologies. The challenge is how to capture and learn the complicated knowledge encoded in existing complex mappings. We develop an initial solution that takes a naive Bayesian approach. We evaluated the performance of the solution on various domains. Our experimental results show that the solution returns the expected mappings as the top-1 results usually among several hundreds candidate mappings for more than 80% of the test cases. Furthermore, the expected mappings are always returned as the top-k results with k<4. The experiments have demonstrated that the approach is effective and has the potential to save significant human efforts.","PeriodicalId":313414,"journal":{"name":"Proceedings of the 21st ACM international conference on Information and knowledge management","volume":"93 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122313047","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 10
A new tool for multi-level partitioning in teradata 在teradata中用于多级分区的新工具
Young-Kyoon Suh, A. Ghazal, A. Crolotte, Pekka Kostamaa
{"title":"A new tool for multi-level partitioning in teradata","authors":"Young-Kyoon Suh, A. Ghazal, A. Crolotte, Pekka Kostamaa","doi":"10.1145/2396761.2398604","DOIUrl":"https://doi.org/10.1145/2396761.2398604","url":null,"abstract":"This paper introduces a new tool that recommends an optimized partitioning solution called Multi-Level Partitioned Primary Index (MLPPI) for a fact table based on the queries in the workload. The tool implements a new technique using a greedy algorithm for search space enumeration. The space is driven by predicates in the queries. This technique fits very well the Teradata MLPPI scheme, as it is based on a general framework using general expressions, ranges and case expressions for partition definitions. The cost model implemented in the tool is based on the Teradata optimizer, and it is used to prune the search space for reaching a final solution. The tool resides completely on the client, and interfaces the database through APIs as opposed to previous work that requires optimizer code extension. The APIs are used to simplify the workload queries, and to capture fact table predicates and costs necessary to make the recommendation. The predicate-driven method implemented by the tool is general, and it can be applied to any clustering or partitioning scheme based on simple field expressions or complex SQL predicates. Experimental results given a particular workload will show that the recommendation from the tool outperforms a human expert. The experiments also show that the solution is scalable both with the workload complexity and the size of the fact table.","PeriodicalId":313414,"journal":{"name":"Proceedings of the 21st ACM international conference on Information and knowledge management","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126055669","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 5
A decentralized recommender system for effective web credibility assessment 一个分散的推荐系统,用于有效的网络可信度评估
Thanasis G. Papaioannou, Jean-Eudes Ranvier, Alexandra Olteanu, K. Aberer
{"title":"A decentralized recommender system for effective web credibility assessment","authors":"Thanasis G. Papaioannou, Jean-Eudes Ranvier, Alexandra Olteanu, K. Aberer","doi":"10.1145/2396761.2396851","DOIUrl":"https://doi.org/10.1145/2396761.2396851","url":null,"abstract":"An overwhelming and growing amount of data is available online. The problem of untrustworthy online information is augmented by its high economic potential and its dynamic nature, e.g. transient domain names, dynamic content, etc. In this paper, we address the problem of assessing the credibility of web pages by a decentralized social recommender system. Specifically, we concurrently employ i) item-based collaborative filtering (CF) based on specific web page features, ii) user-based CF based on friend ratings and iii) the ranking of the page in search results. These factors are appropriately combined into a single assessment based on adaptive weights that depend on their effectiveness for different topics and different fractions of malicious ratings. Simulation experiments with real traces of web page credibility evaluations suggest that our hybrid approach outperforms both its constituent components and classical content-based classification approaches.","PeriodicalId":313414,"journal":{"name":"Proceedings of the 21st ACM international conference on Information and knowledge management","volume":"149 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126923812","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 21
G-SPARQL: a hybrid engine for querying large attributed graphs G-SPARQL:用于查询大型属性图的混合引擎
S. Sakr, S. Elnikety, Yuxiong He
{"title":"G-SPARQL: a hybrid engine for querying large attributed graphs","authors":"S. Sakr, S. Elnikety, Yuxiong He","doi":"10.1145/2396761.2396806","DOIUrl":"https://doi.org/10.1145/2396761.2396806","url":null,"abstract":"We propose a SPARQL-like language, G-SPARQL, for querying attributed graphs. The language expresses types of queries which of large interest for applications which model their data as large graphs such as: pattern matching, reachability and shortest path queries. Each query can combine both of structural predicates and value-based predicates (on the attributes of the graph nodes and edges). We describe an algebraic compilation mechanism for our proposed query language which is extended from the relational algebra and based on the basic construct of building SPARQL queries, the Triple Pattern. We describe a hybrid Memory/Disk representation of large attributed graphs where only the topology of the graph is maintained in memory while the data of the graph is stored in a relational database. The execution engine of our proposed query language splits parts of the query plan to be pushed inside the relational database while the execution of other parts of the query plan are processed using memory-based algorithms, as necessary. Experimental results on real datasets demonstrate the efficiency and the scalability of our approach and show that our approach outperforms native graph databases by several factors.","PeriodicalId":313414,"journal":{"name":"Proceedings of the 21st ACM international conference on Information and knowledge management","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121368521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 67
Towards measuring the visualness of a concept 测量一个概念的可视性
Jin-Woo Jeong, Xin-Jing Wang, Dong-Ho Lee
{"title":"Towards measuring the visualness of a concept","authors":"Jin-Woo Jeong, Xin-Jing Wang, Dong-Ho Lee","doi":"10.1145/2396761.2398655","DOIUrl":"https://doi.org/10.1145/2396761.2398655","url":null,"abstract":"In this paper, we propose a new method to measure the visualness of a concept. The visualness of a concept is generally defined as what extent a concept has visual characteristics. Even though the visualness of a concept is important and useful for various image search tasks, it has not received much spotlight yet. In this work, we especially focus on how to measure the visualness of a complex concept such as \"round table\", \"dry bed\" rather than a simple concept like \"ball\", \"apple\". To measure the visualness, we first collect sample images of a complex concept using web image search engines, and then group the images based on the visual features. Finally, we compute visual purity and weighted entropy of the clusters, which will act as a visualness score for the concept. Through various experiments, we show and discuss interesting results about the visualness of a concept.","PeriodicalId":313414,"journal":{"name":"Proceedings of the 21st ACM international conference on Information and knowledge management","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114256800","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 12
On using category experts for improving the performance and accuracy in recommender systems 利用品类专家提高推荐系统的性能和准确性
Won-Seok Hwang, Ho-Jong Lee, Sang-Wook Kim, Minsoo Lee
{"title":"On using category experts for improving the performance and accuracy in recommender systems","authors":"Won-Seok Hwang, Ho-Jong Lee, Sang-Wook Kim, Minsoo Lee","doi":"10.1145/2396761.2398639","DOIUrl":"https://doi.org/10.1145/2396761.2398639","url":null,"abstract":"A variety of recommendation methods have been proposed to satisfy the performance and accuracy; however, it is fairly difficult to satisfy both of them because there is a trade-off between them. In this paper, we introduce the notion of category experts and propose the recommendation method by exploiting the ratings of category experts instead of those of the users similar to a target user. We also extend the method that uses both the category preference of a target user and his/her similarity to category experts. We show that our method significantly outperforms the existing methods in terms of performance and accuracy through extensive experiments with real-world data.","PeriodicalId":313414,"journal":{"name":"Proceedings of the 21st ACM international conference on Information and knowledge management","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125320816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 16
Fast multi-task learning for query spelling correction 快速多任务学习查询拼写纠正
Xuan Sun, Anshumali Shrivastava, Ping Li
{"title":"Fast multi-task learning for query spelling correction","authors":"Xuan Sun, Anshumali Shrivastava, Ping Li","doi":"10.1145/2396761.2396800","DOIUrl":"https://doi.org/10.1145/2396761.2396800","url":null,"abstract":"In this paper, we explore the use of a novel online multi-task learning framework for the task of search query spelling correction. In our procedure, correction candidates are initially generated by a ranker-based system and then re-ranked by our multi-task learning algorithm. With the proposed multi-task learning method, we are able to effectively transfer information from different and highly biased training datasets, for improving spelling correction on all datasets. Our experiments are conducted on three query spelling correction datasets including the well-known TREC benchmark dataset. The experimental results demonstrate that our proposed method considerably outperforms the existing baseline systems in terms of accuracy. Importantly, the proposed method is about one order of magnitude faster than baseline systems in terms of training speed. Compared to the commonly used online learning methods which typically require more than (e.g.,) 60 training passes, our proposed method is able to closely reach the empirical optimum in about 5 passes.","PeriodicalId":313414,"journal":{"name":"Proceedings of the 21st ACM international conference on Information and knowledge management","volume":"134 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122908276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 7
TALMUD: transfer learning for multiple domains TALMUD:多领域的迁移学习
Orly Moreno, Bracha Shapira, L. Rokach, Guy Shani
{"title":"TALMUD: transfer learning for multiple domains","authors":"Orly Moreno, Bracha Shapira, L. Rokach, Guy Shani","doi":"10.1145/2396761.2396817","DOIUrl":"https://doi.org/10.1145/2396761.2396817","url":null,"abstract":"Most collaborative Recommender Systems (RS) operate in a single domain (such as movies, books, etc.) and are capable of providing recommendations based on historical usage data which is collected in the specific domain only. Cross-domain recommenders address the sparsity problem by using Machine Learning (ML) techniques to transfer knowledge from a dense domain into a sparse target domain. In this paper we propose a transfer learning technique that extracts knowledge from multiple domains containing rich data (e.g., movies and music) and generates recommendations for a sparse target domain (e.g., games). Our method learns the relatedness between the different source domains and the target domain, without requiring overlapping users between domains. The model integrates the appropriate amount of knowledge from each domain in order to enrich the target domain data. Experiments with several datasets reveal that, using multiple sources and the relatedness between domains improves accuracy of results.","PeriodicalId":313414,"journal":{"name":"Proceedings of the 21st ACM international conference on Information and knowledge management","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123023422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 93
Automated feature weighting in naive bayes for high-dimensional data classification 用于高维数据分类的朴素贝叶斯特征自动加权
Lifei Chen, Shengrui Wang
{"title":"Automated feature weighting in naive bayes for high-dimensional data classification","authors":"Lifei Chen, Shengrui Wang","doi":"10.1145/2396761.2398426","DOIUrl":"https://doi.org/10.1145/2396761.2398426","url":null,"abstract":"Naive Bayes (NB for short) is one of the popular methods for supervised classification in a knowledge management system. Currently, in many real-world applications, high-dimensional data pose a major challenge to conventional NB classifiers, due to noisy or redundant features and local relevance of these features to classes. In this paper, an automated feature weighting solution is proposed to result in a NB method effective in dealing with high-dimensional data. We first propose a locally weighted probability model, for Bayesian modeling in high-dimensional spaces, to implement a soft feature selection scheme. Then we propose an optimization algorithm to find the weights in linear time complexity, based on the Logitnormal priori distribution and the Maximum a Posteriori principle. Experimental studies show the effectiveness and suitability of the proposed model for high-dimensional data classification.","PeriodicalId":313414,"journal":{"name":"Proceedings of the 21st ACM international conference on Information and knowledge management","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131228776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 29
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信