{"title":"Accelerating Dynamic Time Warping Clustering with a Novel Admissible Pruning Strategy","authors":"Nurjahan Begum, Liudmila Ulanova, Jun Wang, Eamonn J. Keogh","doi":"10.1145/2783258.2783286","DOIUrl":"https://doi.org/10.1145/2783258.2783286","url":null,"abstract":"Clustering time series is a useful operation in its own right, and an important subroutine in many higher-level data mining analyses, including data editing for classifiers, summarization, and outlier detection. While it has been noted that the general superiority of Dynamic Time Warping (DTW) over Euclidean Distance for similarity search diminishes as we consider ever larger datasets, as we shall show, the same is not true for clustering. Thus, clustering time series under DTW remains a computationally challenging task. In this work, we address this lethargy in two ways. We propose a novel pruning strategy that exploits both upper and lower bounds to prune off a large fraction of the expensive distance calculations. This pruning strategy is admissible, giving us provably identical results to the brute-force algorithm while being at least an order of magnitude faster. For datasets where even this level of speedup is inadequate, we show that we can use a simple heuristic to order the unavoidable calculations in a most-useful-first ordering, thus casting the clustering as an anytime algorithm. 
We demonstrate the utility of our ideas with both single and multidimensional case studies in the domains of astronomy, speech physiology, medicine and entomology.","PeriodicalId":243428,"journal":{"name":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128889635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
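The admissible pruning described above rests on cheap lower bounds for DTW. As a minimal illustration (not the authors' full clustering algorithm, which also exploits upper bounds), here is a sketch of nearest-neighbor search that skips the expensive full DTW computation whenever the classic LB_Keogh lower bound already exceeds the best exact distance seen so far; all function names are our own:

```python
from math import inf

def lb_keogh(q, c, r):
    """LB_Keogh: lower bound on band-constrained DTW(q, c) with radius r."""
    total = 0.0
    for i, qi in enumerate(q):
        window = c[max(0, i - r): i + r + 1]
        upper, lower = max(window), min(window)
        if qi > upper:
            total += (qi - upper) ** 2
        elif qi < lower:
            total += (qi - lower) ** 2
    return total

def dtw(q, c, r):
    """Squared-distance DTW with a Sakoe-Chiba band of radius r."""
    n, m = len(q), len(c)
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - r), min(m, i + r) + 1):
            cost = (q[i - 1] - c[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def nn_index(query, candidates, r=3):
    """Index of the DTW nearest neighbor, pruning with the lower bound.

    Pruning is admissible: a candidate is skipped only when its lower
    bound already exceeds the best exact distance found so far, so the
    answer is identical to brute force.
    """
    best_idx, best_dist = -1, inf
    # Cheapest-lower-bound-first ordering tightens best_dist sooner.
    order = sorted(range(len(candidates)),
                   key=lambda i: lb_keogh(query, candidates[i], r))
    for i in order:
        if lb_keogh(query, candidates[i], r) >= best_dist:
            continue  # admissible prune: DTW cannot beat best_dist
        d = dtw(query, candidates[i], r)
        if d < best_dist:
            best_idx, best_dist = i, d
    return best_idx
```

The most-useful-first ordering in the abstract plays an analogous role at the clustering level: spend the unavoidable DTW computations where they are most likely to change the result first.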
{"title":"Gas Concentration Reconstruction for Coal-Fired Boilers Using Gaussian Process","authors":"Chao Yuan, Matthias Behmann, B. Meerbeck","doi":"10.1145/2783258.2788617","DOIUrl":"https://doi.org/10.1145/2783258.2788617","url":null,"abstract":"The goal of combustion optimization of a coal-fired boiler is to improve its operating efficiency while reducing emissions at the same time. Being able to take measurements for key combustion ingredients, such as O2, CO, and H2O, is crucial for the feedback loop needed by this task. One state-of-the-art laser technique, namely, Tunable Diode Laser Absorption Spectroscopy (TDLAS), is able to measure the average value of gas concentration along a laser beam path. An active research direction in TDLAS is how to reconstruct gas concentration images based on these path averages. However, in reality the number of such paths is usually very limited, leading to an extremely under-constrained estimation problem. Another overlooked aspect of the problem is how we can arrange paths so that the reconstructed image is more accurate. We propose a Bayesian approach based on a Gaussian process (GP) to address both the image reconstruction and path arrangement problems simultaneously. Specifically, we use the GP posterior mean as the reconstructed image, and average posterior pixel variance as our objective function to optimize the path arrangement. 
Our algorithms have been integrated into the Siemens SPPA-P3000 control system, which provides real-time combustion optimization of boilers around the world.","PeriodicalId":243428,"journal":{"name":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129037506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
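The estimator described in this abstract has a compact closed form: with a GP prior f ~ N(0, K) over pixel values and path measurements y = A f + noise, where each row of A averages the pixels a beam crosses, the posterior mean K Aᵀ(A K Aᵀ + σ²I)⁻¹y is the reconstructed image, and the mean posterior pixel variance scores a candidate path layout before any measurement is taken. A small numpy sketch under these assumptions (the naming is ours, not from the Siemens system):

```python
import numpy as np

def rbf_kernel(coords, length_scale=2.0):
    """Squared-exponential prior covariance between grid pixels."""
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)

def gp_reconstruct(A, y, K, noise_var=1e-4):
    """Posterior mean and per-pixel variance of f given y = A f + noise."""
    S = A @ K @ A.T + noise_var * np.eye(len(y))
    mean = K @ A.T @ np.linalg.solve(S, y)
    cov = K - K @ A.T @ np.linalg.solve(S, A @ K)
    return mean, np.diag(cov)

def layout_score(A, K, noise_var=1e-4):
    """Average posterior pixel variance of a path layout (lower is better).

    Depends only on A and K, so candidate beam arrangements can be
    compared before any gas concentration is actually measured.
    """
    S = A @ K @ A.T + noise_var * np.eye(A.shape[0])
    cov = K - K @ A.T @ np.linalg.solve(S, A @ K)
    return float(np.mean(np.diag(cov)))
```

Because `layout_score` needs no data, the path-arrangement question reduces to minimizing it over feasible beam geometries.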
{"title":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","authors":"Longbing Cao, Chengqi Zhang, T. Joachims, G. Webb, D. Margineantu, Graham J. Williams","doi":"10.1145/2783258","DOIUrl":"https://doi.org/10.1145/2783258","url":null,"abstract":"It is our great pleasure to welcome you to the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD). The annual ACM SIGKDD conference is the premier international forum for data science, data mining, knowledge discovery and big data. It brings together researchers and practitioners from academia, industry, and government to share their ideas, research results and experiences.\n\nKDD-2015 features 4 plenary keynote presentations, 12 invited talks, 228 paper presentations, a discussion panel, a poster session, 14 workshops, 12 tutorials, 27 exhibition booths, the KDD Cup competition, and a banquet at the Dockside Pavilion at the Sydney Darling Harbour. As always, KDD-2015 attracted presenters and delegates from around the world. It is with great pleasure that we bring this international conference for the first time to the southern hemisphere.\n\nThis year we again had a strong set of submissions. There were 819 submissions to the Research Track, of which 160 papers were accepted. There were 189 submissions to the Industry & Government Track, of which 68 papers were accepted. All papers submitted to the Research Track and to the Industry & Government Track were subjected to a rigorous review process. They were initially screened by the Chairs of the respective tracks, and a small number of papers that did not comply with the formatting requirements or which violated the dual submission policy were summarily rejected. At least three reviewers and a meta-reviewer were assigned to all remaining papers based on the results of a bidding process. The authors were able to read the reviews and provide a response. 
The meta-reviewers then had an opportunity to consider all reviews and author responses. This then initiated a discussion during which all reviewers of a paper had the opportunity to read each other's reviews and the author responses and to update their reviews as appropriate. In a few cases, the meta-reviewers added another reviewer at this stage to gain expert opinion on specific issues. The meta-reviewers then made recommendations on acceptance or rejection to the track chairs. The track chairs then assessed the meta-reviews, reviews, author responses and discussions to make a final decision. In a few cases, they also solicited further expert reviews and meta-reviews to resolve specific questions. Thus, all papers were assessed by at least four and up to seven discipline experts. All accepted papers were presented both as a 20-minute talk and as a poster.\n\nThe Industry & Government Invited Talk Track features 12 talks from world-renowned experts who have played a significant role in developing and deploying large-scale data mining applications and systems in their respective fields with clearly measurable and meaningful impact. 
We trust that this opportunity for the KDD community to hear directly from senior leaders in industry and government will inspire new advances and broader interdisciplinary collaboration between resea","PeriodicalId":243428,"journal":{"name":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129278832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamically Modeling Patient's Health State from Electronic Medical Records: A Time Series Approach","authors":"Karla L. Caballero Barajas, R. Akella","doi":"10.1145/2783258.2783289","DOIUrl":"https://doi.org/10.1145/2783258.2783289","url":null,"abstract":"In this paper, we present a method to dynamically estimate the probability of mortality inside the Intensive Care Unit (ICU) by combining heterogeneous data. We propose a method based on Generalized Linear Dynamic Models that models the probability of mortality as a latent state that evolves over time. This framework allows us to combine different types of features (lab results, vital signs readings, doctor and nurse notes, etc.) into a single state, which is updated each time new patient data is observed. In addition, we include the use of text features, based on medical noun phrase extraction and Statistical Topic Models. These features provide context about the patient that cannot be captured when only numerical features are used. We fill in the missing values using a Regularized Expectation Maximization based method that assumes temporally structured data. We test our proposed approach using 15,000 Electronic Medical Records (EMRs) obtained from the MIMIC II public dataset. Experimental results show that the proposed model allows us to detect an increase in the probability of mortality before it occurs. We report an AUC of 0.8657. 
Our proposed model clearly outperforms other methods in the literature, with a sensitivity of 0.7885 compared to 0.6559 for Naive Bayes, and an F-score of 0.5929 compared to 0.4662 for the APACHE III score after 24 hours.","PeriodicalId":243428,"journal":{"name":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125361861","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"From Infrastructure to Culture: A/B Testing Challenges in Large Scale Social Networks","authors":"Ya Xu, Nanyu Chen, A. Fernandez, Omar Sinno, Anmol Bhasin","doi":"10.1145/2783258.2788602","DOIUrl":"https://doi.org/10.1145/2783258.2788602","url":null,"abstract":"A/B testing, also known as bucket testing, split testing, or controlled experiment, is a standard way to evaluate user engagement or satisfaction from a new service, feature, or product. It is widely used among online websites, including social network sites such as Facebook, LinkedIn, and Twitter to make data-driven decisions. At LinkedIn, we have seen tremendous growth of controlled experiments over time, with now over 400 concurrent experiments running per day. General A/B testing frameworks and methodologies, including challenges and pitfalls, have been discussed extensively in several previous KDD work [7, 8, 9, 10]. In this paper, we describe in depth the experimentation platform we have built at LinkedIn and the challenges that arise particularly when running A/B tests at large scale in a social network setting. We start with an introduction of the experimentation platform and how it is built to handle each step of the A/B testing process at LinkedIn, from designing and deploying experiments to analyzing them. It is then followed by discussions on several more sophisticated A/B testing scenarios, such as running offline experiments and addressing the network effect, where one user's action can influence that of another. 
Lastly, we talk about features and processes that are crucial for building a strong experimentation culture.","PeriodicalId":243428,"journal":{"name":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121535198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
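At its statistical core, deciding such an experiment typically reduces to a two-sample test on the engagement metric; the platform challenges the paper describes sit on top of this. As a deliberately minimal sketch (LinkedIn's system does far more, e.g. variance reduction and network-effect corrections), a pooled two-proportion z-test on conversion counts:

```python
import math

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """z statistic for the difference in conversion rate between two variants.

    conv_a/conv_b are conversion counts, n_a/n_b the users assigned to
    control and treatment. |z| > 1.96 is significant at the 5% level
    under the usual large-sample normal approximation.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se
```

The network effect discussed in the abstract is precisely what breaks the independence assumption behind this test, which is why cluster-based randomization and similar corrections are needed in a social network setting.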
{"title":"Clouded Intelligence","authors":"Joseph Sirosh","doi":"10.1145/2783258.2790454","DOIUrl":"https://doi.org/10.1145/2783258.2790454","url":null,"abstract":"Several exciting trends are driving the birth of the intelligent cloud. The vast majority of the world's data is now connected data resident in the cloud. The majority of the world's new software is now connected software, also resident in or using the cloud. New cloud-based Machine Learning as a Service platforms help transform data into intelligence and build cloud-hosted intelligent APIs for connected software applications. Face analysis, computer vision, text analysis, speech recognition, and more traditional analytics such as churn prediction, recommendations, anomaly detection, forecasting, and clustering are all available now as cloud APIs, and far more are being created at a rapid pace. Cloud-hosted marketplaces for crowdsourcing intelligent APIs have been launched. In this talk I will review what these trends mean for the future of data science and show examples of revolutionary applications that you can build using cloud platforms.","PeriodicalId":243428,"journal":{"name":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126400716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FaitCrowd: Fine Grained Truth Discovery for Crowdsourced Data Aggregation","authors":"Fenglong Ma, Yaliang Li, Qi Li, Minghui Qiu, Jing Gao, Shi Zhi, Lu Su, Bo Zhao, Heng Ji, Jiawei Han","doi":"10.1145/2783258.2783314","DOIUrl":"https://doi.org/10.1145/2783258.2783314","url":null,"abstract":"In crowdsourced data aggregation tasks, there exist conflicts in the answers provided by large numbers of sources on the same set of questions. The most important challenge in this setting is to estimate source reliability and select answers that are provided by high-quality sources. Existing work solves this problem by simultaneously estimating sources' reliability and inferring questions' true answers (i.e., the truths). However, these methods assume that a source has the same reliability degree on all questions, ignoring the fact that sources' reliability may vary significantly across topics. To capture various expertise levels on different topics, we propose FaitCrowd, a fine-grained truth discovery model for the task of aggregating conflicting data collected from multiple users/sources. FaitCrowd jointly models the process of generating question content and sources' provided answers in a probabilistic model to estimate both topical expertise and true answers simultaneously. This leads to a more precise estimation of source reliability. Therefore, FaitCrowd demonstrates a better ability to obtain true answers for the questions compared with existing approaches. 
Experimental results on two real-world datasets show that FaitCrowd can significantly reduce the error rate of aggregation compared with the state-of-the-art multi-source aggregation approaches due to its ability to learn topical expertise from question content and collected answers.","PeriodicalId":243428,"journal":{"name":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116589808","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
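FaitCrowd's joint model of question content and answers is more involved than can be sketched here, but the basic truth-discovery loop it refines is simple: alternate between weighted voting for each question's answer and re-weighting each source by its agreement with the current answers. A toy version with a single (not topic-specific) reliability per source, with our own naming throughout:

```python
from collections import defaultdict

def truth_discovery(answers, iters=10):
    """Iterative truth discovery. answers: {(source, question): answer}.

    Returns (truths, weights): the inferred answer per question and a
    smoothed accuracy estimate per source. Weighted voting and source
    re-scoring are alternated until iters rounds have run.
    """
    sources = {s for s, _ in answers}
    questions = {q for _, q in answers}
    weights = {s: 1.0 for s in sources}
    truths = {}
    for _ in range(iters):
        # Step 1: weighted vote per question.
        for q in questions:
            votes = defaultdict(float)
            for (s, qq), a in answers.items():
                if qq == q:
                    votes[a] += weights[s]
            truths[q] = max(votes, key=votes.get)
        # Step 2: re-score each source by agreement with current truths.
        for s in sources:
            hits = [truths[q] == a for (ss, q), a in answers.items() if ss == s]
            weights[s] = (sum(hits) + 1) / (len(hits) + 2)  # Laplace-smoothed
    return truths, weights
```

FaitCrowd replaces the single weight per source with topic-level expertise learned jointly from the question text, which is what enables the finer-grained reliability estimates the abstract describes.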
{"title":"A Deep Hybrid Model for Weather Forecasting","authors":"Aditya Grover, Ashish Kapoor, E. Horvitz","doi":"10.1145/2783258.2783275","DOIUrl":"https://doi.org/10.1145/2783258.2783275","url":null,"abstract":"Weather forecasting is a canonical predictive challenge that has depended primarily on model-based methods. We explore new directions with forecasting weather as a data-intensive challenge that involves inferences across space and time. We study specifically the power of making predictions via a hybrid approach that combines discriminatively trained predictive models with a deep neural network that models the joint statistics of a set of weather-related variables. We show how the base model can be enhanced with spatial interpolation that uses learned long-range spatial dependencies. We also derive an efficient learning and inference procedure that allows for large scale optimization of the model parameters. We evaluate the methods with experiments on real-world meteorological data that highlight the promise of the approach.","PeriodicalId":243428,"journal":{"name":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114301873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real-Time Bid Prediction using Thompson Sampling-Based Expert Selection","authors":"E. Ikonomovska, Sina Jafarpour, Ali Dasdan","doi":"10.1145/2783258.2788586","DOIUrl":"https://doi.org/10.1145/2783258.2788586","url":null,"abstract":"We study online meta-learners for real-time bid prediction that predict by selecting a single best predictor among several subordinate prediction algorithms, here called \"experts\". These predictors belong to the family of context-dependent past performance estimators that make a prediction only when the instance to be predicted falls within their areas of expertise. Within the advertising ecosystem, it is very common for the contextual information to be incomplete, hence, it is natural for some of the experts to abstain from making predictions on some of the instances. Experts' areas of expertise can overlap, which makes their predictions less suitable for merging; as such, they lend themselves better to the problem of best expert selection. In addition, their performance varies over time, which gives the expert selection problem a non-stochastic, adversarial flavor. In this paper we propose to use probability sampling (via Thompson Sampling) as a meta-learning algorithm that samples from the pool of experts for the purpose of bid prediction. 
We show performance results from the comparison of our approach to multiple state-of-the-art algorithms using exploration scavenging on a log file of over 300 million ad impressions, as well as comparison to a baseline rule-based model using production traffic from a leading DSP platform.","PeriodicalId":243428,"journal":{"name":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"98 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122802750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
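The selection step at the heart of this approach is compact: keep a Beta posterior over each expert's success rate and, per impression, play the eligible expert with the largest posterior draw. A Beta-Bernoulli sketch (the paper's experts carry context-dependent performance estimates and may abstain; ours are deliberately bare, and the class name is our own):

```python
import random

class ThompsonExpertSelector:
    """Pick one expert per round by Thompson Sampling over Beta posteriors.

    Experts may abstain on a given instance; only the eligible experts
    passed to select() compete for that round.
    """

    def __init__(self, n_experts):
        self.wins = [1] * n_experts    # Beta(1, 1) uniform prior
        self.losses = [1] * n_experts

    def select(self, eligible):
        # Draw a plausible success rate for each eligible expert, then
        # play the expert with the highest draw. Uncertain experts get
        # occasional exploration "for free" via their wide posteriors.
        samples = {i: random.betavariate(self.wins[i], self.losses[i])
                   for i in eligible}
        return max(samples, key=samples.get)

    def update(self, expert, success):
        # Binary feedback, e.g. whether the expert's bid prediction
        # was acceptably accurate for this impression.
        if success:
            self.wins[expert] += 1
        else:
            self.losses[expert] += 1
```

Because the posteriors are updated online, the selector also tracks the drifting, adversarial-flavored expert performance the abstract mentions, although a discounting or sliding-window variant would follow drift faster.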
{"title":"Whither Social Networks for Web Search?","authors":"R. Agrawal, Behzad Golshan, E. Papalexakis","doi":"10.1145/2783258.2788571","DOIUrl":"https://doi.org/10.1145/2783258.2788571","url":null,"abstract":"Access to diverse perspectives nurtures an informed citizenry. Google and Bing have emerged as the duopoly that largely arbitrates which English language documents are seen by web searchers. A recent study shows that there is now a large overlap in the top organic search results produced by them. Thus, citizens may no longer be able to gain different perspectives by using different search engines. We present the results of our empirical study that indicates that by mining Twitter data one can obtain search results that are quite distinct from those produced by Google and Bing. Additionally, our user study found that these results were quite informative. The gauntlet is now on search engines to test whether our findings hold in their infrastructure for different social networks and whether enabling diversity has sufficient business imperative for them.","PeriodicalId":243428,"journal":{"name":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122864631","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}