Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining: Latest Papers

Probabilistic Graphical Models of Dyslexia
Yair Lakretz, Gal Chechik, N. Friedmann, M. Rosen-Zvi
DOI: 10.1145/2783258.2788604 (https://doi.org/10.1145/2783258.2788604)
Published: 2015-08-10
Abstract: Reading is a complex cognitive process, errors in which may assume diverse forms. In this study, introducing a novel approach, we use two families of probabilistic graphical models to analyze patterns of reading errors made by dyslexic people: an LDA-based model and two Naïve Bayes models which differ in their assumptions about the generation process of reading errors. The models are trained on a large corpus of reading errors. Results show that a Naïve Bayes model achieves the highest accuracy compared to labels given by clinicians (AUC = 0.801 ± 0.05), thus providing the first automated and objective diagnosis tool for dyslexia that is based solely on reading-error data. Results also show that the LDA-based model best captures patterns of reading errors and could therefore contribute to the understanding of dyslexia and to future improvement of the diagnostic procedure. Finally, we draw on our results to shed light on a theoretical debate about the definition and heterogeneity of dyslexia. Our results support a model assuming multiple dyslexia subtypes, that is, a heterogeneous view of dyslexia.
Citations: 10
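A Naïve Bayes classifier of the kind mentioned in the entry above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' model: it assumes a multinomial Naïve Bayes over counts of reading-error types, and the error categories, subtype labels and toy counts are all invented for the example.

```python
import math

# Hypothetical error categories; the paper's actual error taxonomy is not reproduced here.
ERROR_TYPES = ["letter_migration", "letter_substitution", "omission", "visual_confusion"]


def train_nb(samples, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with add-alpha (Laplace) smoothing.

    samples: list of count vectors, one count per error type
    labels:  list of class labels, one per sample
    Returns (log_priors, log_likelihoods) keyed by class label.
    """
    classes = sorted(set(labels))
    k = len(samples[0])
    log_priors, log_likelihoods = {}, {}
    for c in classes:
        rows = [s for s, y in zip(samples, labels) if y == c]
        log_priors[c] = math.log(len(rows) / len(samples))
        # Smoothed total count of each error type within class c
        totals = [sum(r[j] for r in rows) + alpha for j in range(k)]
        denom = sum(totals)
        log_likelihoods[c] = [math.log(t / denom) for t in totals]
    return log_priors, log_likelihoods


def predict_nb(model, counts):
    """Return the class with the highest posterior log-score for one count vector."""
    log_priors, log_likelihoods = model
    scores = {
        c: log_priors[c] + sum(n * lp for n, lp in zip(counts, log_likelihoods[c]))
        for c in log_priors
    }
    return max(scores, key=scores.get)


# Toy usage with invented counts (columns follow ERROR_TYPES) and hypothetical labels
X = [[5, 0, 1, 0], [4, 1, 0, 0], [0, 1, 4, 3], [1, 0, 5, 2]]
y = ["subtype_A", "subtype_A", "subtype_B", "subtype_B"]
model = train_nb(X, y)
```

A reader dominated by letter-migration errors, such as `[6, 0, 0, 1]`, is assigned to `subtype_A` in this toy setup. Smoothing keeps every log-likelihood finite even when an error type never occurs in a class.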
The Effectiveness of Marketing Strategies in Social Media: Evidence from Promotional Events
Panagiotis Adamopoulos, Vilma Todri
DOI: 10.1145/2783258.2788597 (https://doi.org/10.1145/2783258.2788597)
Published: 2015-08-10
Abstract: This paper studies a novel social media venture and seeks to understand the effectiveness of marketing strategies in social media platforms by evaluating their impact on participating brands and organizations. We use a real-world data set and employ a promising research approach that combines econometric and predictive modeling techniques in a causal estimation framework that allows for more accurate counterfactuals. Based on the results of the presented analysis and focusing on the long-term business value of marketing strategies in social media, we find that promotional events leveraging implicit or explicit advocacy in social media platforms result in significant abnormal returns for the participating brand, in terms of expanding the social media fan base of the firm. The effect is also economically significant, as it corresponds to an increase of several thousand additional new followers per day for an average-size brand. We also precisely quantify the impact of various promotion characteristics and demonstrate which types of promotions are more effective and for which brands, while suggesting specific tactical strategies. For instance, despite the competition for consumers' attention, brands and marketers should broadcast marketing messages on social networks during the time of peak usage in order to maximize their returns.
Overall, we provide actionable insights with major implications for firms and social media platforms, and contribute to the related literature with new, rich findings enabled by the employed causal estimation framework.
Citations: 26
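The idea of "abnormal returns" measured against a counterfactual, as in the entry above, can be illustrated with a simple event-study calculation. This is a hypothetical simplification, not the authors' causal estimation framework: the counterfactual here is just a least-squares linear trend fitted to daily new followers before the event, and the function name, window length and data are assumptions.

```python
import numpy as np


def cumulative_abnormal_growth(daily_new_followers, event_day, estimation_window=30):
    """Sum of (actual - counterfactual) daily new followers from event_day onward.

    The counterfactual is a linear trend fitted by least squares to the
    `estimation_window` days immediately before the event (a simplifying assumption).
    """
    y = np.asarray(daily_new_followers, dtype=float)
    start = event_day - estimation_window
    if start < 0:
        raise ValueError("not enough pre-event data for the estimation window")
    pre_days = np.arange(start, event_day)
    slope, intercept = np.polyfit(pre_days, y[start:event_day], 1)
    post_days = np.arange(event_day, len(y))
    expected = intercept + slope * post_days
    return float(np.sum(y[event_day:] - expected))


# Hypothetical series: steady growth of 100 + 2t new followers/day, plus a 500/day boost for 5 days
days = np.arange(60)
series = 100.0 + 2.0 * days
series[40:45] += 500.0
car = cumulative_abnormal_growth(series, event_day=40)
```

Because the pre-event trend is exactly linear in this toy series, the abnormal growth equals the injected boost of 5 × 500 followers. Real data would call for a richer counterfactual model.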
An Efficient Semi-Supervised Clustering Algorithm with Sequential Constraints
Jinfeng Yi, Lijun Zhang, Tianbao Yang, W. Liu, Jun Wang
DOI: 10.1145/2783258.2783389 (https://doi.org/10.1145/2783258.2783389)
Published: 2015-08-10
Abstract: Semi-supervised clustering leverages side information such as pairwise constraints to guide clustering procedures. Despite promising progress, existing semi-supervised clustering approaches overlook the setting in which side information is generated sequentially, which arises naturally in numerous real-world applications such as social network and e-commerce system analysis. When new constraints arrive, classical semi-supervised clustering algorithms need to re-optimize their objectives over all available data samples and constraints, which prevents them from efficiently updating the obtained data partitions. To address this challenge, we propose an efficient dynamic semi-supervised clustering framework that casts the clustering problem into a search problem over a feasible convex set, i.e., a convex hull whose extreme points are an ensemble of m data partitions. According to the principle of ensemble clustering, the optimal partition lies in the convex hull and can thus be uniquely represented by an m-dimensional probability simplex vector. As such, the dynamic semi-supervised clustering problem is simplified to the problem of updating a probability simplex vector subject to the newly received pairwise constraints. We then develop a computationally efficient procedure that updates the probability simplex vector in O(m^2) time, irrespective of the data size n.
Our empirical studies on several real-world benchmark datasets show that the proposed algorithm outperforms state-of-the-art semi-supervised clustering algorithms with a visible performance gain and significantly reduced running time.
Citations: 26
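The simplex representation described in the entry above can be sketched as follows, assuming the m ensemble partitions are given as label arrays. The update shown is a plain projected-gradient step using the sort-based simplex projection (Duchi et al., 2008); it is not the authors' O(m^2) procedure, and the step size `eta` and all function names are hypothetical. It does share the key property that handling one new constraint touches only m labels per point, so its cost does not grow with n.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based method)."""
    v = np.asarray(v, dtype=float)
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * idx > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)


def update_weights(w, partitions, a, b, must_link, eta=0.5):
    """One projected-gradient step after a new pairwise constraint on points a and b.

    partitions: (m, n) integer array; row i holds the cluster labels from ensemble member i.
    Weight moves toward members that satisfy the constraint. Cost is O(m log m),
    independent of the number of points n.
    """
    same = (partitions[:, a] == partitions[:, b]).astype(float)
    agree = same if must_link else 1.0 - same
    return project_to_simplex(w + eta * agree)


def consensus_similarity(w, partitions):
    """Weighted co-association matrix S[p, q] = sum_i w_i * [label_i(p) == label_i(q)]."""
    same = (partitions[:, :, None] == partitions[:, None, :]).astype(float)
    return np.tensordot(w, same, axes=1)


# Hypothetical ensemble of m = 3 partitions of n = 4 points
partitions = np.array([[0, 0, 1, 1], [0, 1, 1, 0], [0, 0, 0, 1]])
w = np.full(3, 1.0 / 3.0)
w = update_weights(w, partitions, 0, 1, must_link=True)
S = consensus_similarity(w, partitions)
```

After the must-link constraint on points 0 and 1, all weight moves to the two members that already place them together, so `S[0, 1]` becomes 1. The final partition could then be read off `S`, for example with any similarity-based clustering step.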
Machine Learning and Causal Inference for Policy Evaluation
S. Athey
DOI: 10.1145/2783258.2785466 (https://doi.org/10.1145/2783258.2785466)
Published: 2015-08-10
Abstract: A large literature on causal inference in statistics, econometrics, biostatistics, and epidemiology (see, e.g., Imbens and Rubin [2015] for a recent survey) has focused on methods for statistical estimation and inference in a setting where the researcher wishes to answer a question about the (counterfactual) impact of a change in a policy, or "treatment" in the terminology of the literature. The policy change has not necessarily been observed before, or may have been observed only for a subset of the population; examples include a change in minimum wage law or a change in a firm's price. The goal is then to estimate the impact of a small set of "treatments" using data from randomized experiments or, more commonly, "observational" studies (that is, non-experimental data). The literature identifies a variety of assumptions that, when satisfied, allow the researcher to draw the same types of conclusions that would be available from a randomized experiment. To estimate causal effects given non-random assignment of individuals to alternative policies in observational studies, popular techniques include propensity score weighting, matching, and regression analysis; all of these methods adjust for differences in observed attributes of individuals. Another strand of literature in econometrics, referred to as "structural modeling," fully specifies the preferences of actors as well as a behavioral model, and estimates those parameters from data (for applications to auction-based electronic commerce, see Athey and Haile [2007] and Athey and Nekipelov [2012]). In both cases, parameter estimates are interpreted as "causal," and they are used to make predictions about the effect of policy changes.
In contrast, the supervised machine learning literature has traditionally focused on prediction, providing data-driven approaches to building rich models and relying on cross-validation as a powerful tool for model selection. These methods have been highly successful in practice. This talk will review several recent papers that attempt to bring the tools of supervised machine learning to bear on the problem of policy evaluation, where the papers are connected by three themes. The first theme is that it is important for both estimation and inference to distinguish between parts of the model that relate to the causal question of interest and "attributes," that is, features or variables that describe attributes of individual units that are held fixed when policies change. Specifically, we propose to divide the features of a model into causal features, whose values may be manipulated in a counterfactual policy environment, and attributes. A second theme is that, relative to conventional tools from the policy evaluation literature, tools from supervised machine learning can be particularly effective at modeling the association of outcomes with attributes, as well as at modeling how causal effects vary with attributes. A final theme is that modifications of existing methods may …
Citations: 97
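Propensity score weighting, one of the techniques listed in the talk abstract above, can be sketched as an inverse-propensity-weighted (IPW) estimate of the average treatment effect. This minimal sketch assumes unconfoundedness, overlap (0 < e < 1) and known propensity scores; in practice e would itself be estimated, for example with a supervised learning model. The function name and the simulated data are hypothetical.

```python
import numpy as np


def ipw_ate(y, t, e):
    """Inverse-propensity-weighted estimate of the average treatment effect.

    y: outcomes; t: 0/1 treatment indicators; e: propensity scores P(T = 1 | attributes).
    Assumes unconfoundedness and overlap (0 < e < 1); e is treated as known here.
    """
    y, t, e = (np.asarray(a, dtype=float) for a in (y, t, e))
    return float(np.mean(t * y / e - (1.0 - t) * y / (1.0 - e)))


# Hypothetical observational data: attribute x raises both treatment probability and outcome
rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)
e = 1.0 / (1.0 + np.exp(-x))           # true propensity score
t = (rng.random(n) < e).astype(float)  # non-random treatment assignment
y = 2.0 * t + x + rng.normal(size=n)   # true treatment effect is 2.0

naive = float(y[t == 1].mean() - y[t == 0].mean())
ate = ipw_ate(y, t, e)
```

In this simulation the naive difference in means overstates the effect because treated units tend to have larger x, while the weighted estimate lands close to the true value of 2.0.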
Discovery of Meaningful Rules in Time Series
Mohammad Shokoohi-Yekta, Yanping Chen, Bilson J. L. Campana, Bing Hu, J. Zakaria, Eamonn J. Keogh
DOI: 10.1145/2783258.2783306 (https://doi.org/10.1145/2783258.2783306)
Published: 2015-08-10
Abstract: The ability to make predictions about future events is at the heart of much of science; so, it is not surprising that prediction has been a topic of great interest in the data mining community for the last decade. Most of the previous work has attempted to predict the future based on the current value of a stream. However, for many problems the actual values are irrelevant, whereas the shape of the current time series pattern may foretell the future. The handful of research efforts that consider this variant of the problem have met with limited success. In particular, it is now understood that most of these efforts allow the discovery of spurious rules. We believe the reason why rule discovery in real-valued time series has failed thus far is that most efforts have more or less indiscriminately applied the ideas of symbolic stream rule discovery to real-valued rule discovery. In this work, we show why these ideas are not directly suitable for rule discovery in time series.
Beyond our novel definitions/representations, which allow for meaningful and extendable specifications of rules, we further show novel algorithms that allow us to quickly discover high-quality rules in very large datasets that accurately predict the occurrence of future events.
Citations: 85
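The point in the entry above that the shape of a pattern, rather than its raw values, can foretell the future is commonly operationalized with z-normalized Euclidean distance. The sketch below is a hypothetical illustration under that assumption, not the authors' rule definition or search algorithm: a rule "antecedent shape, then consequent shape within `max_gap` points" is scored by the fraction of antecedent matches that are followed by a consequent match. All names, the threshold and the toy series are invented.

```python
import numpy as np


def znorm(x, eps=1e-8):
    """Z-normalize a window so that only its shape, not its offset or scale, matters."""
    x = np.asarray(x, dtype=float)
    sd = x.std()
    return (x - x.mean()) / sd if sd > eps else np.zeros_like(x)


def find_matches(series, template, threshold):
    """Start indices of windows whose z-normalized Euclidean distance to template <= threshold."""
    series = np.asarray(series, dtype=float)
    m = len(template)
    target = znorm(template)
    return [
        i for i in range(len(series) - m + 1)
        if np.linalg.norm(znorm(series[i:i + m]) - target) <= threshold
    ]


def rule_confidence(series, antecedent, consequent, threshold, max_gap):
    """Fraction of antecedent matches followed by a consequent match that starts
    within max_gap points after the antecedent ends (a hypothetical rule score)."""
    ants = find_matches(series, antecedent, threshold)
    if not ants:
        return 0.0
    cons = find_matches(series, consequent, threshold)
    fired = 0
    for a in ants:
        end = a + len(antecedent)
        if any(end <= c <= end + max_gap for c in cons):
            fired += 1
    return fired / len(ants)


# Hypothetical series: a scaled and shifted "peak" followed by a "dip"
peak = np.array([0, 1, 2, 3, 2, 1, 0], dtype=float)
dip = np.array([3, 2, 1, 0, 1, 2, 3], dtype=float)
series = np.concatenate([np.zeros(5), 10 * peak + 5, np.zeros(2), 4 * dip, np.zeros(5)])
```

The peak is found at index 5 even though it appears at ten times the template's scale with an offset, which is the behavior z-normalization is meant to provide. This brute-force scan is quadratic in practice and is only meant to show the matching criterion.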
Data Driven Science: SIGKDD Panel
K. Morik, H. Durrant-Whyte, Gary Hill, Dietmar Müller, T. Berger-Wolf
DOI: 10.1145/2783258.2788703 (https://doi.org/10.1145/2783258.2788703)
Published: 2015-08-10
Abstract: The panel session 'Data Driven Science' discusses the application and use of knowledge discovery, machine learning and data analytics in science disciplines; in natural, physical, medical and social science; from physics to geology, and from neuroscience to population health. Knowledge discovery methods are finding broad application in all areas of scientific endeavor, to explore experimental data, to discover new models, and to propose new scientific theories and ideas. In addition, the availability of ever larger scientific data sets is driving a new data-driven paradigm for modeling of complex phenomena in physical, natural and social sciences. The purpose of this panel is to bring together users of knowledge discovery, machine learning and data analytics methods across the science disciplines, to understand what tools and methods are proving effective in areas such as data exploration and modeling, to uncover common problems that can be addressed in the KDD community, and to explore the emerging data-driven paradigm in science.
Citations: 2
Predicting Voice Elicited Emotions
Y. Li, Jose D. Contreras, Luis J. Salazar
DOI: 10.1145/2783258.2788619 (https://doi.org/10.1145/2783258.2788619)
Published: 2015-08-10
Abstract: We present the research, product development and deployment of "Voice Analyzer" by Jobaline Inc. This is a patent-pending technology that analyzes voice data and predicts human emotions elicited by the paralinguistic elements of a voice. Human voice characteristics, such as tone, complement the verbal communication. In several contexts of communication, "how" things are said is just as important as "what" is being said. This paper provides an overview of our deployed system, the raw data, the data processing steps, and the prediction algorithms we experimented with. A case study is included where, given a voice clip, our model predicts the degree to which a listener will find the voice "engaging". Our prediction results were verified through independent market research, with 75% agreement on how an average listener would feel. One application of Jobaline Voice Analyzer technology is assisting companies in hiring workers in the service industry, where customers' emotional response to workers' voices may affect the service outcome. Jobaline Voice Analyzer is deployed in production as a product offered to our clients to help them identify workers who will better engage with their customers.
We will also share some discoveries and lessons learned.
Citations: 2
VC-Dimension and Rademacher Averages: From Statistical Learning Theory to Sampling Algorithms
Matteo Riondato, E. Upfal
DOI: 10.1145/2783258.2789984 (https://doi.org/10.1145/2783258.2789984)
Published: 2015-08-10
Abstract: Rademacher averages and the Vapnik-Chervonenkis dimension are fundamental concepts from statistical learning theory. They make it possible to study simultaneous bounds on the deviation of empirical averages from their expectations for classes of functions, by considering properties of the functions, of their domain (the dataset), and of the sampling process. In this tutorial, we survey the use of Rademacher averages and the VC-dimension in sampling-based algorithms for graph analysis and pattern mining. We start from their theoretical foundations at the core of machine learning, then show a generic recipe for formulating data mining problems in a way that allows these concepts to be used in efficient randomized algorithms for those problems. Finally, we show examples of the application of the recipe to graph problems (connectivity, shortest paths, betweenness centrality) and pattern mining. Our goal is to expose the usefulness of these techniques for the data mining researcher, and to encourage research in the area.
Citations: 4
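For a finite family of functions evaluated on a sample, the empirical Rademacher average discussed in the tutorial above can be estimated by Monte Carlo sampling of the random signs and compared with Massart's finite-class lemma. This is a generic sketch, not code from the tutorial; the function names, the number of trials and the representation of the family as a k-by-n matrix of function values are assumptions.

```python
import numpy as np


def empirical_rademacher(values, n_trials=5000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher average of a finite function family.

    values: (k, n) array; row j holds f_j(x_1), ..., f_j(x_n) on the sample.
    Estimates E_sigma[ max_j (1/n) * sum_i sigma_i * f_j(x_i) ] with i.i.d. uniform signs sigma_i.
    """
    values = np.asarray(values, dtype=float)
    _, n = values.shape
    rng = np.random.default_rng(seed)
    sigma = rng.choice([-1.0, 1.0], size=(n_trials, n))
    corr = sigma @ values.T / n          # (n_trials, k): correlation of each sign vector with each f_j
    return float(corr.max(axis=1).mean())


def massart_bound(values):
    """Massart's finite-class lemma: R <= max_j ||f_j||_2 * sqrt(2 ln k) / n."""
    values = np.asarray(values, dtype=float)
    k, n = values.shape
    return float(np.sqrt((values ** 2).sum(axis=1)).max() * np.sqrt(2.0 * np.log(k)) / n)


# Hypothetical family: 20 indicator functions (e.g. "itemset contained in transaction") on 100 points
rng = np.random.default_rng(1)
indicators = (rng.random((20, 100)) < 0.5).astype(float)
estimate = empirical_rademacher(indicators)
bound = massart_bound(indicators)
```

The closed-form bound needs no sampling, while the Monte Carlo estimate is usually tighter; in sampling-based mining algorithms either quantity can be plugged into a deviation bound to decide how large a sample is enough.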
Structured Hedging for Resource Allocations with Leverage
Nicholas Johnson, A. Banerjee
DOI: 10.1145/2783258.2783378 (https://doi.org/10.1145/2783258.2783378)
Published: 2015-08-10
Abstract: Data mining algorithms for computing solutions to online resource allocation (ORA) problems have focused on budgeting resources currently in possession, e.g., investing in the stock market with cash on hand or assigning current employees to projects. In several settings, one can leverage borrowed resources with which tasks can be accomplished more efficiently and cheaply. Additionally, a variety of opposing allocation types or positions may be available with which one can hedge the allocation to alleviate risk from external changes. In this paper, we present a formulation for hedging online resource allocations with leverage and propose an efficient data mining algorithm (SHERAL). We pose the problem as a constrained online convex optimization problem. The key novel components of our formulation are (1) a loss function for general leveraging and opposing allocation positions and (2) a penalty function which hedges between structurally dependent allocation positions to control risk.
We instantiate the problem in the context of portfolio selection and evaluate the effectiveness of the formulation through extensive experiments on five datasets in comparison with existing algorithms and several variants.
Citations: 3
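As a point of reference for the online portfolio selection setting in which SHERAL is evaluated, the classical exponentiated-gradient (EG) algorithm of Helmbold et al. (1998) can be sketched as below. It is long-only and uses neither leverage nor hedging, so it is a standard baseline for this setting, not SHERAL itself; the step size `eta`, the function name and the toy price relatives are assumptions.

```python
import numpy as np


def eg_portfolio(price_relatives, eta=0.05):
    """Exponentiated-gradient online portfolio selection (long-only, no leverage).

    price_relatives: (T, d) array, x[t, i] = price_i(t) / price_i(t - 1).
    Returns the weight vector used in each period and the final wealth (starting from 1).
    """
    x = np.asarray(price_relatives, dtype=float)
    T, d = x.shape
    w = np.full(d, 1.0 / d)              # start from the uniform portfolio
    wealth = 1.0
    history = []
    for t in range(T):
        history.append(w.copy())
        period_return = w @ x[t]
        wealth *= period_return
        # Gradient of log(w . x_t) with respect to w is x_t / (w . x_t)
        w = w * np.exp(eta * x[t] / period_return)
        w /= w.sum()                     # renormalize back onto the simplex
    return np.array(history), wealth


# Hypothetical market: asset 0 gains 10% every period, asset 1 loses 10%
rel = np.tile([1.1, 0.9], (50, 1))
history, wealth = eg_portfolio(rel)
```

The multiplicative update keeps weights positive and on the simplex without an explicit projection, and here it shifts weight steadily toward the rising asset. Adding leverage or opposing (short) positions, as the entry describes, would require a different feasible set and loss.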
Probabilistic Community and Role Model for Social Networks
Yu Han, Jie Tang
DOI: 10.1145/2783258.2783274 (https://doi.org/10.1145/2783258.2783274)
Published: 2015-08-10
Abstract: Numerous models have been proposed for modeling social networks to explore their structure or to address application problems, such as community detection and behavior prediction. However, the results are still far from satisfactory. One of the biggest challenges is how to capture all the information of a social network in a unified manner, such as links, communities, user attributes, roles and behaviors. In this paper, we propose a unified probabilistic framework, the Community Role Model (CRM), to model a social network. CRM incorporates all the information of nodes and edges that form a social network. We propose methods based on Gibbs sampling and an EM algorithm to estimate the model's parameters and fit our model to real social networks. Real data experiments show that CRM can be used not only to represent a social network, but also to handle various application problems with better performance than a baseline model, without any modification to the model, showing its great advantages.
Citations: 38