Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining最新文献

筛选
英文 中文
The new iris data: modular data generators 新的虹膜数据:模块化数据生成器
Iris Adä, M. Berthold
{"title":"The new iris data: modular data generators","authors":"Iris Adä, M. Berthold","doi":"10.1145/1835804.1835858","DOIUrl":"https://doi.org/10.1145/1835804.1835858","url":null,"abstract":"In this paper we introduce a modular, highly flexible, open-source environment for data generation. Using an existing graphical data flow tool, the user can combine various types of modules for numeric and categorical data generators. Additional functionality is added via the data processing framework in which the generator modules are embedded. The resulting data flows can be used to document, deploy, and reuse the resulting data generators. We describe the overall environment and individual modules and demonstrate how they can be used for the generation of a sample, complex customer/product database with corresponding shopping basket data, including various artifacts and outliers.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73848275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 11
Session details: Research track 8: graph algorithms 会议细节:研究专题8:图算法
Tina Eliassi-Rad
{"title":"Session details: Research track 8: graph algorithms","authors":"Tina Eliassi-Rad","doi":"10.1145/3248788","DOIUrl":"https://doi.org/10.1145/3248788","url":null,"abstract":"","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75517488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Mixture models for learning low-dimensional roles in high-dimensional data 用于在高维数据中学习低维角色的混合模型
Manas Somaiya, C. Jermaine, S. Ranka
{"title":"Mixture models for learning low-dimensional roles in high-dimensional data","authors":"Manas Somaiya, C. Jermaine, S. Ranka","doi":"10.1145/1835804.1835919","DOIUrl":"https://doi.org/10.1145/1835804.1835919","url":null,"abstract":"Archived data often describe entities that participate in multiple roles. Each of these roles may influence various aspects of the data. For example, a register transaction collected at a retail store may have been initiated by a person who is a woman, a mother, an avid reader, and an action movie fan. Each of these roles can influence various aspects of the customer's purchase: the fact that the customer is a mother may greatly influence the purchase of a toddler-sized pair of pants, but have no influence on the purchase of an action-adventure novel. The fact that the customer is an action move fan and an avid reader may influence the purchase of the novel, but will have no effect on the purchase of a shirt. In this paper, we present a generic, Bayesian framework for capturing exactly this situation. In our framework, it is assumed that multiple roles exist, and each data point corresponds to an entity (such as a retail customer, or an email, or a news article) that selects various roles which compete to influence the various attributes associated with the data point. We develop robust, MCMC algorithms for learning the models under the framework.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76468037","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 10
MineFleet®: an overview of a widely adopted distributed vehicle performance data mining system MineFleet®:概述了广泛采用的分布式车辆性能数据挖掘系统
H. Kargupta, Kakali Sarkar, M. Gilligan
{"title":"MineFleet®: an overview of a widely adopted distributed vehicle performance data mining system","authors":"H. Kargupta, Kakali Sarkar, M. Gilligan","doi":"10.1145/1835804.1835812","DOIUrl":"https://doi.org/10.1145/1835804.1835812","url":null,"abstract":"This paper describes the MineFleet distributed vehicle performance data mining system designed for commercial fleets. MineFleet analyzes high throughput data streams onboard the vehicle, generates the analytics, sends those to the remote server over the wide-area wireless networks and offers them to the fleet managers using stand-alone and web-based user-interface. The paper describes the overall architecture of the system, business needs, and shares experience from successful large-scale commercial deployments. MineFleet is probably one of the first commercially successful distributed data stream mining systems. This patented technology has been adopted, productized, and commercially offered by many large companies in the mobile resource management and GPS fleet tracking industry. This paper offers an overview of the system and offers a detailed analysis of what made it work.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80212882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 48
Session details: IG track 4: systems and infrastructure, medical 会议细节:IG专场4:系统和基础设施,医疗
T. Senator
{"title":"Session details: IG track 4: systems and infrastructure, medical","authors":"T. Senator","doi":"10.1145/3248780","DOIUrl":"https://doi.org/10.1145/3248780","url":null,"abstract":"","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81703734","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Combining predictions for accurate recommender systems 结合预测准确的推荐系统
Michael Jahrer, Andreas Töscher, R. Legenstein
{"title":"Combining predictions for accurate recommender systems","authors":"Michael Jahrer, Andreas Töscher, R. Legenstein","doi":"10.1145/1835804.1835893","DOIUrl":"https://doi.org/10.1145/1835804.1835893","url":null,"abstract":"We analyze the application of ensemble learning to recommender systems on the Netflix Prize dataset. For our analysis we use a set of diverse state-of-the-art collaborative filtering (CF) algorithms, which include: SVD, Neighborhood Based Approaches, Restricted Boltzmann Machine, Asymmetric Factor Model and Global Effects. We show that linearly combining (blending) a set of CF algorithms increases the accuracy and outperforms any single CF algorithm. Furthermore, we show how to use ensemble methods for blending predictors in order to outperform a single blending algorithm. The dataset and the source code for the ensemble blending are available online.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84512648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 285
Flexible constrained spectral clustering 柔性约束谱聚类
Xiang Wang, I. Davidson
{"title":"Flexible constrained spectral clustering","authors":"Xiang Wang, I. Davidson","doi":"10.1145/1835804.1835877","DOIUrl":"https://doi.org/10.1145/1835804.1835877","url":null,"abstract":"Constrained clustering has been well-studied for algorithms like K-means and hierarchical agglomerative clustering. However, how to encode constraints into spectral clustering remains a developing area. In this paper, we propose a flexible and generalized framework for constrained spectral clustering. In contrast to some previous efforts that implicitly encode Must-Link and Cannot-Link constraints by modifying the graph Laplacian or the resultant eigenspace, we present a more natural and principled formulation, which preserves the original graph Laplacian and explicitly encodes the constraints. Our method offers several practical advantages: it can encode the degree of belief (weight) in Must-Link and Cannot-Link constraints; it guarantees to lower-bound how well the given constraints are satisfied using a user-specified threshold; and it can be solved deterministically in polynomial time through generalized eigendecomposition. Furthermore, by inheriting the objective function from spectral clustering and explicitly encoding the constraints, much of the existing analysis of spectral clustering techniques is still valid. Consequently our work can be posed as a natural extension to unconstrained spectral clustering and be interpreted as finding the normalized min-cut of a labeled graph. We validate the effectiveness of our approach by empirical results on real-world data sets, with applications to constrained image segmentation and clustering benchmark data sets with both binary and degree-of-belief constraints.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78560089","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 214
Compressed fisher linear discriminant analysis: classification of randomly projected data 压缩fisher线性判别分析:随机投影数据的分类
R. Durrant, A. Kabán
{"title":"Compressed fisher linear discriminant analysis: classification of randomly projected data","authors":"R. Durrant, A. Kabán","doi":"10.1145/1835804.1835945","DOIUrl":"https://doi.org/10.1145/1835804.1835945","url":null,"abstract":"We consider random projections in conjunction with classification, specifically the analysis of Fisher's Linear Discriminant (FLD) classifier in randomly projected data spaces. Unlike previous analyses of other classifiers in this setting, we avoid the unnatural effects that arise when one insists that all pairwise distances are approximately preserved under projection. We impose no sparsity or underlying low-dimensional structure constraints on the data; we instead take advantage of the class structure inherent in the problem. We obtain a reasonably tight upper bound on the estimated misclassification error on average over the random choice of the projection, which, in contrast to early distance preserving approaches, tightens in a natural way as the number of training examples increases. It follows that, for good generalisation of FLD, the required projection dimension grows logarithmically with the number of classes. We also show that the error contribution of a covariance misspecification is always no worse in the low-dimensional space than in the initial high-dimensional space. We contrast our findings to previous related work, and discuss our insights.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77779766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 56
Mining advisor-advisee relationships from research publication networks 从研究出版网络中挖掘顾问与被顾问的关系
Chi Wang, Jiawei Han, Yuntao Jia, Jie Tang, Duo Zhang, Yintao Yu, Jingyi Guo
{"title":"Mining advisor-advisee relationships from research publication networks","authors":"Chi Wang, Jiawei Han, Yuntao Jia, Jie Tang, Duo Zhang, Yintao Yu, Jingyi Guo","doi":"10.1145/1835804.1835833","DOIUrl":"https://doi.org/10.1145/1835804.1835833","url":null,"abstract":"Information network contains abundant knowledge about relationships among people or entities. Unfortunately, such kind of knowledge is often hidden in a network where different kinds of relationships are not explicitly categorized. For example, in a research publication network, the advisor-advisee relationships among researchers are hidden in the coauthor network. Discovery of those relationships can benefit many interesting applications such as expert finding and research community analysis. In this paper, we take a computer science bibliographic network as an example, to analyze the roles of authors and to discover the likely advisor-advisee relationships. In particular, we propose a time-constrained probabilistic factor graph model (TPFG), which takes a research publication network as input and models the advisor-advisee relationship mining problem using a jointly likelihood objective function. We further design an efficient learning algorithm to optimize the objective function. Based on that our model suggests and ranks probable advisors for every author. Experimental results show that the proposed approach infer advisor-advisee relationships efficiently and achieves a state-of-the-art accuracy (80-90%). We also apply the discovered advisor-advisee relationships to bole search, a specific expert finding task and empirical study shows that the search performance can be effectively improved (+4.09% by NDCG@5).","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72800479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 200
The community-search problem and how to plan a successful cocktail party 社区搜索问题以及如何策划一场成功的鸡尾酒会
Mauro Sozio, A. Gionis
{"title":"The community-search problem and how to plan a successful cocktail party","authors":"Mauro Sozio, A. Gionis","doi":"10.1145/1835804.1835923","DOIUrl":"https://doi.org/10.1145/1835804.1835923","url":null,"abstract":"A lot of research in graph mining has been devoted in the discovery of communities. Most of the work has focused in the scenario where communities need to be discovered with only reference to the input graph. However, for many interesting applications one is interested in finding the community formed by a given set of nodes. In this paper we study a query-dependent variant of the community-detection problem, which we call the community-search problem: given a graph G, and a set of query nodes in the graph, we seek to find a subgraph of G that contains the query nodes and it is densely connected. We motivate a measure of density based on minimum degree and distance constraints, and we develop an optimum greedy algorithm for this measure. We proceed by characterizing a class of monotone constraints and we generalize our algorithm to compute optimum solutions satisfying any set of monotone constraints. Finally we modify the greedy algorithm and we present two heuristic algorithms that find communities of size no greater than a specified upper bound. Our experimental evaluation on real datasets demonstrates the efficiency of the proposed algorithms and the quality of the solutions we obtain.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87642481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 451
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信