Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining最新文献_第2页

Data Science from the Lab to the Field to the Enterprise 从实验室到现场再到企业的数据科学

Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Pub Date : 2015-08-10 DOI: 10.1145/2783258.2790461

Christopher M. White

引用次数: 2

A Machine Learning Framework to Identify Students at Risk of Adverse Academic Outcomes 一个机器学习框架来识别有不良学业成果风险的学生

Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Pub Date : 2015-08-10 DOI: 10.1145/2783258.2788620

Himabindu Lakkaraju, Everaldo Aguiar, Carl Shan, David I Miller, Nasir Bhanpuri, R. Ghani, Kecia L. Addison

引用次数: 135

Personalizing LinkedIn Feed 个性化LinkedIn Feed

Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Pub Date : 2015-08-10 DOI: 10.1145/2783258.2788614

D. Agarwal, Bee-Chung Chen, Qi He, Zhenhao Hua, Guy Lebanon, Yiming Ma, Pannagadatta K. Shivaswamy, Hsiao-Ping Tseng, Jaewon Yang, L. Zhang

引用次数: 39

Collective Opinion Spam Detection: Bridging Review Networks and Metadata 集体意见垃圾邮件检测:桥接评论网络和元数据

Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Pub Date : 2015-08-10 DOI: 10.1145/2783258.2783370

Shebuti Rayana, L. Akoglu

{"title":"Collective Opinion Spam Detection: Bridging Review Networks and Metadata","authors":"Shebuti Rayana, L. Akoglu","doi":"10.1145/2783258.2783370","DOIUrl":"https://doi.org/10.1145/2783258.2783370","url":null,"abstract":"Online reviews capture the testimonials of \"real\" people and help shape the decisions of other consumers. Due to the financial gains associated with positive reviews, however, opinion spam has become a widespread problem, with often paid spam reviewers writing fake reviews to unjustly promote or demote certain products or businesses. Existing approaches to opinion spam have successfully but separately utilized linguistic clues of deception, behavioral footprints, or relational ties between agents in a review system. In this work, we propose a new holistic approach called SPEAGLE that utilizes clues from all metadata (text, timestamp, rating) as well as relational data (network), and harness them collectively under a unified framework to spot suspicious users and reviews, as well as products targeted by spam. Moreover, our method can efficiently and seamlessly integrate semi-supervision, i.e., a (small) set of labels if available, without requiring any training or changes in its underlying algorithm. We demonstrate the effectiveness and scalability of SPEAGLE on three real-world review datasets from Yelp.com with filtered (spam) and recommended (non-spam) reviews, where it significantly outperforms several baselines and state-of-the-art methods. To the best of our knowledge, this is the largest scale quantitative evaluation performed to date for the opinion spam problem.","PeriodicalId":243428,"journal":{"name":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117094712","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 473

MOOCS: What Have We Learned? mooc:我们学到了什么?

Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Pub Date : 2015-08-10 DOI: 10.1145/2783258.2785465

D. Koller

引用次数: 5

Dimensionality Reduction Via Graph Structure Learning 基于图结构学习的降维方法

Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Pub Date : 2015-08-10 DOI: 10.1145/2783258.2783309

Qi Mao, Li Wang, S. Goodison, Yijun Sun

{"title":"Dimensionality Reduction Via Graph Structure Learning","authors":"Qi Mao, Li Wang, S. Goodison, Yijun Sun","doi":"10.1145/2783258.2783309","DOIUrl":"https://doi.org/10.1145/2783258.2783309","url":null,"abstract":"We present a new dimensionality reduction setting for a large family of real-world problems. Unlike traditional methods, the new setting aims to explicitly represent and learn an intrinsic structure from data in a high-dimensional space, which can greatly facilitate data visualization and scientific discovery in downstream analysis. We propose a new dimensionality-reduction framework that involves the learning of a mapping function that projects data points in the original high-dimensional space to latent points in a low-dimensional space that are then used directly to construct a graph. Local geometric information of the projected data is naturally captured by the constructed graph. As a showcase, we develop a new method to obtain a discriminative and compact feature representation for clustering problems. In contrast to assumptions used in traditional clustering methods, we assume that centers of clusters should be close to each other if they are connected in a learned graph, and other cluster centers should be distant. Extensive experiments are performed that demonstrate that the proposed method is able to obtain discriminative feature representations yielding superior clustering performance, and correctly recover the intrinsic structures of various real-world datasets including curves, hierarchies and a cancer progression path.","PeriodicalId":243428,"journal":{"name":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129670773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 85

Predictive Modeling for Public Health: Preventing Childhood Lead Poisoning 公共卫生预测模型:预防儿童铅中毒

Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Pub Date : 2015-08-10 DOI: 10.1145/2783258.2788629

E. Potash, J. Brew, Alexander Loewi, S. Majumdar, Andrew G. Reece, Joe Walsh, Eric Rozier, Emile Jorgenson, Raed Mansour, R. Ghani

{"title":"Predictive Modeling for Public Health: Preventing Childhood Lead Poisoning","authors":"E. Potash, J. Brew, Alexander Loewi, S. Majumdar, Andrew G. Reece, Joe Walsh, Eric Rozier, Emile Jorgenson, Raed Mansour, R. Ghani","doi":"10.1145/2783258.2788629","DOIUrl":"https://doi.org/10.1145/2783258.2788629","url":null,"abstract":"Lead poisoning is a major public health problem that affects hundreds of thousands of children in the United States every year. A common approach to identifying lead hazards is to test all children for elevated blood lead levels and then investigate and remediate the homes of children with elevated tests. This can prevent exposure to lead of future residents, but only after a child has been poisoned. This paper describes joint work with the Chicago Department of Public Health (CDPH) in which we build a model that predicts the risk of a child to being poisoned so that an intervention can take place before that happens. Using two decades of blood lead level tests, home lead inspections, property value assessments, and census data, our model allows inspectors to prioritize houses on an intractably long list of potential hazards and identify children who are at the highest risk. This work has been described by CDPH as pioneering in the use of machine learning and predictive analytics in public health and has the potential to have a significant impact on both health and economic outcomes for communities across the US.","PeriodicalId":243428,"journal":{"name":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130500456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 43

Estimating Local Intrinsic Dimensionality 局部固有维数的估计

Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Pub Date : 2015-08-10 DOI: 10.1145/2783258.2783405

L. Amsaleg, Oussama Chelly, T. Furon, S. Girard, M. Houle, K. Kawarabayashi, Michael Nett

引用次数: 116

MASCOT: Memory-efficient and Accurate Sampling for Counting Local Triangles in Graph Streams 在图形流中计算局部三角形的内存效率和精确采样

Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Pub Date : 2015-08-10 DOI: 10.1145/2783258.2783285

Yongsub Lim, U. Kang

{"title":"MASCOT: Memory-efficient and Accurate Sampling for Counting Local Triangles in Graph Streams","authors":"Yongsub Lim, U. Kang","doi":"10.1145/2783258.2783285","DOIUrl":"https://doi.org/10.1145/2783258.2783285","url":null,"abstract":"How can we estimate local triangle counts accurately in a graph stream without storing the whole graph? The local triangle counting which counts triangles for each node in a graph is a very important problem with wide applications in social network analysis, anomaly detection, web mining, etc. In this paper, we propose MASCOT, a memory-efficient and accurate method for local triangle estimation in a graph stream based on edge sampling. To develop MASCOT, we first present two naive local triangle counting algorithms in a graph stream: MASCOT-C and MASCOT-A. MASCOT-C is based on constant edge sampling, and MASCOT-A improves its accuracy by utilizing more memory spaces. MASCOT achieves both accuracy and memory-efficiency of the two algorithms by an unconditional triangle counting for a new edge, regardless of whether it is sampled or not. In contrast to the existing algorithm which requires prior knowledge on the target graph and appropriately set parameters, MASCOT requires only one simple parameter, the edge sampling probability. Through extensive experiments, we show that for the same number of edges sampled, MASCOT provides the best accuracy compared to the existing algorithm as well as MASCOT-C and MASCOT-A. Thanks to MASCOT, we also discover interesting anomalous patterns in real graphs, like core-peripheries in the web and ambiguous author names in DBLP.","PeriodicalId":243428,"journal":{"name":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127700187","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 95

More Constraints, Smaller Coresets: Constrained Matrix Approximation of Sparse Big Data 更多约束，更小核心集:稀疏大数据的约束矩阵逼近

Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Pub Date : 2015-08-10 DOI: 10.1145/2783258.2783312

Dan Feldman, Tamir Tassa

{"title":"More Constraints, Smaller Coresets: Constrained Matrix Approximation of Sparse Big Data","authors":"Dan Feldman, Tamir Tassa","doi":"10.1145/2783258.2783312","DOIUrl":"https://doi.org/10.1145/2783258.2783312","url":null,"abstract":"We suggest a generic data reduction technique with provable guarantees for computing the low rank approximation of a matrix under some $ellz error, and constrained factorizations, such as the Non-negative Matrix Factorization (NMF). Our main algorithm reduces a given n x d matrix into a small, ε-dependent, weighted subset C of its rows (known as a coreset), whose size is independent of both n and d. We then prove that applying existing algorithms on the resulting coreset can be turned into (1+ε)-approximations for the original (large) input matrix. In particular, we provide the first linear time approximation scheme (LTAS) for the rank-one NMF. The coreset C can be computed in parallel and using only one pass over a possibly unbounded stream of row vectors. In this sense we improve the result in [4] (Best paper of STOC 2013). Moreover, since C is a subset of these rows, its construction time, as well as its sparsity (number of non-zeroes entries) and the sparsity of the resulting low rank approximation depend on the maximum sparsity of an input row, and not on the actual dimension d. In this sense, we improve the result of Libery [21](Best paper of KDD 2013) and answer affirmably, and in a more general setting, his open question of computing such a coreset. Source code is provided for reproducing the experiments and integration with existing and future algorithms.","PeriodicalId":243428,"journal":{"name":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131394927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 22