Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining最新文献

筛选
英文 中文
Multi-task copula by sparse graph regression 稀疏图回归的多任务耦合
Tianyi Zhou, D. Tao
{"title":"Multi-task copula by sparse graph regression","authors":"Tianyi Zhou, D. Tao","doi":"10.1145/2623330.2623697","DOIUrl":"https://doi.org/10.1145/2623330.2623697","url":null,"abstract":"This paper proposes multi-task copula (MTC) that can handle a much wider class of tasks than mean regression with Gaussian noise in most former multi-task learning (MTL). While former MTL emphasizes shared structure among models, MTC aims at joint prediction to exploit inter-output correlation. Given input, the outputs of MTC are allowed to follow arbitrary joint continuous distribution. MTC captures the joint likelihood of multi-output by learning the marginal of each output firstly and then a sparse and smooth output dependency graph function. While the former can be achieved by classical MTL, learning graphs dynamically varying with input is quite a challenge. We address this issue by developing sparse graph regression (SpaGraphR), a non-parametric estimator incorporating kernel smoothing, maximum likelihood, and sparse graph structure to gain fast learning algorithm. It starts from a few seed graphs on a few input points, and then updates the graphs on other input points by a fast operator via coarse-to-fine propagation. Due to the power of copula in modeling semi-parametric distributions, SpaGraphR can model a rich class of dynamic non-Gaussian correlations. We show that MTC can address more flexible and difficult tasks that do not fit the assumptions of former MTL nicely, and can fully exploit their relatedness. Experiments on robotic control and stock price prediction justify its appealing performance in challenging MTL problems.","PeriodicalId":20536,"journal":{"name":"Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2014-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74828531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
Sampling for big data: a tutorial 大数据抽样:教程
Graham Cormode, N. Duffield
{"title":"Sampling for big data: a tutorial","authors":"Graham Cormode, N. Duffield","doi":"10.1145/2623330.2630811","DOIUrl":"https://doi.org/10.1145/2623330.2630811","url":null,"abstract":"One response to the proliferation of large datasets has been to develop ingenious ways to throw resources at the problem, using massive fault tolerant storage architectures, parallel and graphical computation models such as MapReduce, Pregel and Giraph. However, not all environments can support this scale of resources, and not all queries need an exact response. This motivates the use of sampling to generate summary datasets that support rapid queries, and prolong the useful life of the data in storage. To be effective, sampling must mediate the tensions between resource constraints, data characteristics, and the required query accuracy. The state-of-the-art in sampling goes far beyond simple uniform selection of elements, to maximize the usefulness of the resulting sample. This tutorial reviews progress in sample design for large datasets, including streaming and graph-structured data. Applications are discussed to sampling network traffic and social networks.","PeriodicalId":20536,"journal":{"name":"Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2014-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78218032","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 51
A case study: privacy preserving release of spatio-temporal density in paris 以巴黎为例:保护隐私的时空密度释放
G. Ács, C. Castelluccia
{"title":"A case study: privacy preserving release of spatio-temporal density in paris","authors":"G. Ács, C. Castelluccia","doi":"10.1145/2623330.2623361","DOIUrl":"https://doi.org/10.1145/2623330.2623361","url":null,"abstract":"With billions of handsets in use worldwide, the quantity of mobility data is gigantic. When aggregated they can help understand complex processes, such as the spread viruses, and built better transportation systems, prevent traffic congestion. While the benefits provided by these datasets are indisputable, they unfortunately pose a considerable threat to location privacy. In this paper, we present a new anonymization scheme to release the spatio-temporal density of Paris, in France, i.e., the number of individuals in 989 different areas of the city released every hour over a whole week. The density is computed from a call-data-record (CDR) dataset, provided by the French Telecom operator Orange, containing the CDR of roughly 2 million users over one week. Our scheme is differential private, and hence, provides provable privacy guarantee to each individual in the dataset. Our main goal with this case study is to show that, even with large dimensional sensitive data, differential privacy can provide practical utility with meaningful privacy guarantee, if the anonymization scheme is carefully designed. This work is part of the national project XData (http://xdata.fr) that aims at combining large (anonymized) datasets provided by different service providers (telecom, electricity, water management, postal service, etc.).","PeriodicalId":20536,"journal":{"name":"Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2014-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77736582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 108
Predicting student risks through longitudinal analysis 通过纵向分析预测学生风险
Ashay Tamhane, S. Ikbal, Bikram Sengupta, Mayuri Duggirala, James J. Appleton
{"title":"Predicting student risks through longitudinal analysis","authors":"Ashay Tamhane, S. Ikbal, Bikram Sengupta, Mayuri Duggirala, James J. Appleton","doi":"10.1145/2623330.2623355","DOIUrl":"https://doi.org/10.1145/2623330.2623355","url":null,"abstract":"Poor academic performance in K-12 is often a precursor to unsatisfactory educational outcomes such as dropout, which are associated with significant personal and social costs. Hence, it is important to be able to predict students at risk of poor performance, so that the right personalized intervention plans can be initiated. In this paper, we report on a large-scale study to identify students at risk of not meeting acceptable levels of performance in one state-level and one national standardized assessment in Grade 8 of a major US school district. An important highlight of our study is its scale - both in terms of the number of students included, the number of years and the number of features, which provide a very solid grounding to the research. We report on our experience with handling the scale and complexity of data, and on the relative performance of various machine learning techniques we used for building predictive models. Our results demonstrate that it is possible to predict students at-risk of poor assessment performance with a high degree of accuracy, and to do so well in advance. These insights can be used to pro-actively initiate personalized intervention programs and improve the chances of student success.","PeriodicalId":20536,"journal":{"name":"Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2014-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81275150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 46
Large scale predictive modeling for micro-simulation of 3G air interface load 3G空口载荷微观仿真的大规模预测建模
D. Radosavljevik, P. V. D. Putten
{"title":"Large scale predictive modeling for micro-simulation of 3G air interface load","authors":"D. Radosavljevik, P. V. D. Putten","doi":"10.1145/2623330.2623339","DOIUrl":"https://doi.org/10.1145/2623330.2623339","url":null,"abstract":"This paper outlines the approach developed together with the Radio Network Strategy & Design Department of a large European telecom operator in order to forecast the Air-Interface load in their 3G network, which is used for planning network upgrades and budgeting purposes. It is based on large scale intelligent data analysis and modeling at the level of thousands of individual radio cells resulting in 30,000 models per day. It has been embedded into a scenario simulation framework that is used by end users not experienced in data mining for studying and simulating the behavior of this complex networked system, as an example of a systematic approach to the deployment step in the KDD process. This system is already in use for two years in the country where it was developed and it is a part of a standard business process. In the last six months this national operator became a competence center for predictive modeling for micro-simulation of 3G air interface load for four other operators of the same parent company.","PeriodicalId":20536,"journal":{"name":"Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2014-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79539441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
Does social good justify risking personal privacy? 社会利益是否证明冒个人隐私的风险是正当的?
R. Ramakrishnan, Geoffrey I. Webb
{"title":"Does social good justify risking personal privacy?","authors":"R. Ramakrishnan, Geoffrey I. Webb","doi":"10.1145/2623330.2630814","DOIUrl":"https://doi.org/10.1145/2623330.2630814","url":null,"abstract":"When data-driven improvements involve personally identifiable data, or even data that can be used to infer sensitive information about individuals, we face the dilemma that we potentially risk compromising privacy. As we see increased emphasis on using data mining to effect improvements in a range of socially beneficial activities, from improving matching of talented students to opportunities for higher education, or improving allocation of funds across competing school programs, or reducing hospitalization time following surgery, the dilemma can often be especially acute. The data involved often is personally identifiable or revealing and sensitive, and many of the institutions that must be involved in gathering and maintaining custody of the data are not equipped to adequately secure the data, raising the risk of privacy breaches. How should we approach this trade-off? Can we assess the risks? Can we control or mitigate them? Can we develop guidelines for when the risk is or is not worthwhile, and for how best to handle data in different common scenarios? Chairs Raghu Ramakrishnan and Geoffrey I. Webb bring this panel of leading data miners and privacy experts together to address these critical issues.","PeriodicalId":20536,"journal":{"name":"Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2014-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84037180","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
FEMA: flexible evolutionary multi-faceted analysis for dynamic behavioral pattern discovery 动态行为模式发现的灵活进化多面分析
Meng Jiang, Peng Cui, Fei Wang, Xinran Xu, Wenwu Zhu, Shiqiang Yang
{"title":"FEMA: flexible evolutionary multi-faceted analysis for dynamic behavioral pattern discovery","authors":"Meng Jiang, Peng Cui, Fei Wang, Xinran Xu, Wenwu Zhu, Shiqiang Yang","doi":"10.1145/2623330.2623644","DOIUrl":"https://doi.org/10.1145/2623330.2623644","url":null,"abstract":"Behavioral pattern discovery is increasingly being studied to understand human behavior and the discovered patterns can be used in many real world applications such as web search, recommender system and advertisement targeting. Traditional methods usually consider the behaviors as simple user and item connections, or represent them with a static model. In real world, however, human behaviors are actually complex and dynamic: they include correlations between user and multiple types of objects and also continuously evolve along time. These characteristics cause severe data sparsity and computational complexity problem, which pose great challenge to human behavioral analysis and prediction. In this paper, we propose a Flexible Evolutionary Multi-faceted Analysis (FEMA) framework for both behavior prediction and pattern mining. FEMA utilizes a flexible and dynamic factorization scheme for analyzing human behavioral data sequences, which can incorporate various knowledge embedded in different object domains to alleviate the sparsity problem. We give approximation algorithms for efficiency, where the bound of approximation loss is theoretically proved. We extensively evaluate the proposed method in two real datasets. For the prediction of human behaviors, the proposed FEMA significantly outperforms other state-of-the-art baseline methods by 17.4%. Moreover, FEMA is able to discover quite a number of interesting multi-faceted temporal patterns on human behaviors with good interpretability. More importantly, it can reduce the run time from hours to minutes, which is significant for industry to serve real-time applications.","PeriodicalId":20536,"journal":{"name":"Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2014-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78367183","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 53
FBLG: a simple and effective approach for temporal dependence discovery from time series data FBLG:从时间序列数据中发现时间相关性的一种简单有效的方法
Dehua Cheng, M. T. Bahadori, Yan Liu
{"title":"FBLG: a simple and effective approach for temporal dependence discovery from time series data","authors":"Dehua Cheng, M. T. Bahadori, Yan Liu","doi":"10.1145/2623330.2623709","DOIUrl":"https://doi.org/10.1145/2623330.2623709","url":null,"abstract":"Discovering temporal dependence structure from multivariate time series has established its importance in many applications. We observe that when we look in reversed order of time, the temporal dependence structure of the time series is usually preserved after switching the roles of cause and effect. Inspired by this observation, we create a new time series by reversing the time stamps of original time series and combine both time series to improve the performance of temporal dependence recovery. We also provide theoretical justification for the proposed algorithm for several existing time series models. We test our approach on both synthetic and real world datasets. The experimental results confirm that this surprisingly simple approach is indeed effective under various circumstances.","PeriodicalId":20536,"journal":{"name":"Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2014-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87496703","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 35
Dynamics of news events and social media reaction 动态的新闻事件和社会媒体的反应
Mikalai Tsytsarau, Themis Palpanas, M. Castellanos
{"title":"Dynamics of news events and social media reaction","authors":"Mikalai Tsytsarau, Themis Palpanas, M. Castellanos","doi":"10.1145/2623330.2623670","DOIUrl":"https://doi.org/10.1145/2623330.2623670","url":null,"abstract":"The analysis of social sentiment expressed on the Web is becoming increasingly relevant to a variety of applications, and it is important to understand the underlying mechanisms which drive the evolution of sentiments in one way or another, in order to be able to predict these changes in the future. In this paper, we study the dynamics of news events and their relation to changes of sentiment expressed on relevant topics. We propose a novel framework, which models the behavior of news and social media in response to events as a convolution between event's importance and media response function, specific to media and event type. This framework is suitable for detecting time and duration of events, as well as their impact and dynamics, from time series of publication volume. These data can greatly enhance events analysis; for instance, they can help distinguish important events from unimportant, or predict sentiment and stock market shifts. As an example of such application, we extracted news events for a variety of topics and then correlated this data with the corresponding sentiment time series, revealing the connection between sentiment shifts and event dynamics.","PeriodicalId":20536,"journal":{"name":"Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2014-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82898967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 68
Community detection in graphs through correlation 通过相关性在图中进行社区检测
Lian Duan, W. Street, Yanchi Liu, Haibing Lu
{"title":"Community detection in graphs through correlation","authors":"Lian Duan, W. Street, Yanchi Liu, Haibing Lu","doi":"10.1145/2623330.2623629","DOIUrl":"https://doi.org/10.1145/2623330.2623629","url":null,"abstract":"Community detection is an important task for social networks, which helps us understand the functional modules on the whole network. Among different community detection methods based on graph structures, modularity-based methods are very popular recently, but suffer a well-known resolution limit problem. This paper connects modularity-based methods with correlation analysis by subtly reformatting their math formulas and investigates how to fully make use of correlation analysis to change the objective function of modularity-based methods, which provides a more natural and effective way to solve the resolution limit problem. In addition, a novel theoretical analysis on the upper bound of different objective functions helps us understand their bias to different community sizes, and experiments are conducted on both real life and simulated data to validate our findings.","PeriodicalId":20536,"journal":{"name":"Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2014-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89057154","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 61
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信