Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining: Latest Publications, Page 3

TFX: A TensorFlow-Based Production-Scale Machine Learning Platform

Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Pub Date : 2017-08-13 DOI: 10.1145/3097983.3098021

Denis Baylor, Eric Breck, Heng-Tze Cheng, Noah Fiedel, Chuan Yu Foo, Zakaria Haque, Salem Haykal, M. Ispir, Vihan Jain, L. Koc, C. Koo, Lukasz Lew, Clemens Mewald, A. Modi, N. Polyzotis, Sukriti Ramesh, Sudip Roy, Steven Euijong Whang, M. Wicke, Jarek Wilkiewicz, Xin Zhang, Martin A. Zinkevich

Abstract: Creating and maintaining a platform for reliably producing and deploying machine learning models requires careful orchestration of many components: a learner for generating models based on training data, modules for analyzing and validating both data as well as models, and finally infrastructure for serving models in production. This becomes particularly challenging when data changes over time and fresh models need to be produced continuously. Unfortunately, such orchestration is often done ad hoc using glue code and custom scripts developed by individual teams for specific use cases, leading to duplicated effort and fragile systems with high technical debt. We present TensorFlow Extended (TFX), a TensorFlow-based general-purpose machine learning platform implemented at Google. By integrating the aforementioned components into one platform, we were able to standardize the components, simplify the platform configuration, and reduce the time to production from the order of months to weeks, while providing platform stability that minimizes disruptions. We present the case study of one deployment of TFX in the Google Play app store, where the machine learning models are refreshed continuously as new data arrive. Deploying TFX led to reduced custom code, faster experiment cycles, and a 2% increase in app installs resulting from improved data and model analysis.

Citations: 355

A Taxi Order Dispatch Model based On Combinatorial Optimization

Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Pub Date : 2017-08-13 DOI: 10.1145/3097983.3098138

Lingyu Zhang, Tao Hu, Yue Min, Guobin Wu, Junying Zhang, Pengcheng Feng, Pinghua Gong, Jieping Ye

Abstract: Taxi-booking apps have become very popular all over the world because they offer users conveniences such as fast response times. The key component of a taxi-booking app is the dispatch system, which aims to provide optimal matches between drivers and riders. Traditional dispatch systems sequentially dispatch taxis to riders and aim to maximize the driver acceptance rate for each individual order. However, such systems may lead to a low global success rate, which degrades the rider experience when using the app. In this paper, we propose a novel system that attempts to optimally dispatch taxis to serve multiple bookings. The proposed system aims to maximize the global success rate and thereby optimizes the overall travel efficiency, leading to an enhanced user experience. To further enhance the user experience, we also propose a method to predict a user's destination once the taxi-booking app is started. The proposed method employs a Bayesian framework to model the distribution of a user's destination based on his or her travel history. We use rigorous A/B tests to compare our new taxi dispatch method with state-of-the-art models using data collected in Beijing. Experimental results show that the proposed method is significantly better than other state-of-the-art models in terms of global success rate (increased from 80% to 84%). Moreover, we have also achieved significant improvements on other metrics such as user waiting time and pick-up distance. For our destination prediction algorithm, we show that our proposed model is superior to the baseline model, improving the top-3 accuracy from 89% to 93%. The proposed taxi dispatch and destination prediction algorithms are both deployed in our online systems and serve tens of millions of users every day.

Citations: 180
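The contrast drawn in the taxi dispatch abstract above, between dispatching orders one at a time and optimizing across several bookings at once, can be illustrated with a toy example. This is only a sketch under stated assumptions, not the paper's model: the acceptance-probability matrix `p`, the one-driver-per-order restriction, the brute-force search, and all function names are hypothetical choices made for illustration.

```python
from itertools import permutations


def sequential_dispatch(p):
    """Greedy baseline: handle orders in sequence, giving each order the
    still-free driver with the highest acceptance probability."""
    free = set(range(len(p[0])))
    assignment = []
    for row in p:
        best = max(free, key=lambda d: row[d])
        assignment.append(best)
        free.remove(best)
    return assignment


def global_dispatch(p):
    """Exhaustive search over one-to-one assignments, maximizing the expected
    number of accepted orders. Assumes at least as many drivers as orders;
    only feasible for tiny instances."""
    n_orders, n_drivers = len(p), len(p[0])
    best = max(
        permutations(range(n_drivers), n_orders),
        key=lambda perm: sum(p[o][d] for o, d in enumerate(perm)),
    )
    return list(best)


def expected_success_rate(p, assignment):
    """Average acceptance probability over all orders for a given assignment."""
    return sum(p[o][d] for o, d in enumerate(assignment)) / len(p)


# Hypothetical data: p[order][driver] is the chance that driver accepts order.
p = [[0.9, 0.8],
     [0.8, 0.1]]

greedy = sequential_dispatch(p)
optimal = global_dispatch(p)
print(greedy, expected_success_rate(p, greedy))
print(optimal, expected_success_rate(p, optimal))
```

In this toy instance the greedy baseline gives the first order its best driver and leaves the second order with a driver who is unlikely to accept, whereas the joint search gives up a little on the first order to raise the overall rate.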

FLAP: An End-to-End Event Log Analysis Platform for System Management

Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Pub Date : 2017-08-13 DOI: 10.1145/3097983.3098022

Tao Li, Yexi Jiang, Chunqiu Zeng, Bin Xia, Zheng Liu, Wubai Zhou, Xiaolong Zhu, Wentao Wang, L. Zhang, Junying Wu, Li Xue, Dewei Bao

Abstract: Many systems, such as distributed operating systems, complex networks, and high-throughput web-based applications, continuously generate large volumes of event logs. These logs contain useful information that helps system administrators understand the running status of the system and pinpoint system failures. Generally, due to the scale and complexity of modern systems, the generated logs are beyond the analytic power of human beings. Therefore, it is imperative to develop a comprehensive log analysis system to support effective system management. Although a number of log mining techniques have been proposed to address specific log analysis use cases, few research or industrial efforts have been devoted to providing integrated systems with an end-to-end solution that facilitates log analysis routines. In this paper, we design and implement an integrated system, called FIU Log Analysis Platform (a.k.a. FLAP), that aims to facilitate data analytics for system event logs. FLAP provides an end-to-end solution that utilizes advanced data mining techniques to help log analysts conduct event log knowledge discovery, system status investigation, and system failure diagnosis conveniently, promptly, and accurately. Specifically, in FLAP, state-of-the-art template learning techniques are used to extract useful information from unstructured raw logs; advanced data transformation techniques are proposed and leveraged for event transformation and storage; effective event pattern mining, event summarization, event querying, and failure prediction techniques are designed and integrated for log analytics; and user-friendly interfaces are utilized to present the informative analysis results intuitively and vividly. Since 2016, FLAP has been used by Huawei Technologies Co. Ltd for internal event log analysis, and has provided effective support in its system operation and workflow optimization.

Citations: 42

Inferring the Strength of Social Ties: A Community-Driven Approach

Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Pub Date : 2017-08-13 DOI: 10.1145/3097983.3098199

Polina Rozenshtein, Nikolaj Tatti, A. Gionis

Abstract: Online social networks are growing and becoming denser. The social connections of a given person may have very high variability: from close friends and relatives to acquaintances to people they hardly know. Inferring the strength of social ties is an important ingredient for modeling the interaction of users in a network and understanding their behavior. Furthermore, the problem has applications in computational social science, viral marketing, and people recommendation. In this paper we study the problem of inferring the strength of social ties in a given network. Our work is motivated by a recent approach by Sintos et al. [24], which leverages the Strong Triadic Closure (STC) principle, a hypothesis rooted in social psychology. To guide our inference process, in addition to the network structure, we also consider as input a collection of tight communities. Those are sets of vertices that we expect to be connected via strong ties. Such communities appear in different situations, e.g., when being part of a community implies a strong connection to one of the existing members. We consider two related problem formalizations that reflect the assumptions of our setting: a small number of STC violations and strong-tie connectivity in the input communities. We show that both problem formulations are NP-hard. We also show that one problem formulation is hard to approximate, while for the second we develop an algorithm with an approximation guarantee. We validate the proposed method on real-world datasets by comparing with baselines that optimize STC violations and community connectivity separately.

Citations: 18
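The social-ties abstract above counts "STC violations" as one of its objectives. Under the usual statement of the Strong Triadic Closure principle (if a node has strong ties to two neighbors, those two neighbors should themselves be connected), a violation is an open triangle whose two edges are both labeled strong. The sketch below counts such violations for a given labeling; the function name, the edge-list representation, and the toy graph are assumptions made for illustration and are not taken from the paper.

```python
from itertools import combinations


def stc_violations(edges, strong):
    """Count open triangles (a - center - b with no a-b edge) in which both
    edges incident to the center are labeled strong."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    strong_set = {frozenset(e) for e in strong}

    count = 0
    for center, nbrs in adj.items():
        for a, b in combinations(sorted(nbrs), 2):
            if b in adj[a]:
                continue  # the triangle is closed, so STC is satisfied
            if (frozenset((center, a)) in strong_set
                    and frozenset((center, b)) in strong_set):
                count += 1
    return count


# Hypothetical toy graph: a closed triangle a-b-c plus a pendant edge c-d.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")]

print(stc_violations(edges, edges))                                   # every edge strong
print(stc_violations(edges, [("a", "b"), ("b", "c"), ("a", "c")]))    # c-d left weak
```

Labeling the pendant edge c-d as weak removes the open triangles centered at c from the count, which is the kind of trade-off a labeling has to balance against keeping each input community connected by strong ties.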

Not All Passes Are Created Equal: Objectively Measuring the Risk and Reward of Passes in Soccer from Tracking Data

Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Pub Date : 2017-08-13 DOI: 10.1145/3097983.3098051

P. Power, Héctor Ruiz, Xinyu Wei, P. Lucey

Citations: 82

A Practical Exploration System for Search Advertising

Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Pub Date : 2017-08-13 DOI: 10.1145/3097983.3098041

P. Shah, Ming Yang, Sachidanand Alle, A. Ratnaparkhi, B. Shahshahani, Rohit Chandra

Citations: 11

Multi-view Learning over Retinal Thickness and Visual Sensitivity on Glaucomatous Eyes

Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Pub Date : 2017-08-13 DOI: 10.1145/3097983.3098194

Toshimitsu Uesaka, K. Morino, Hiroki Sugiura, Taichi Kiwaki, Hiroshi Murata, R. Asaoka, K. Yamanishi

Abstract: Dense measurement of the visual field, which is necessary to detect glaucoma, is known to be very costly and labor intensive. Measurement of retinal thickness can now be less costly than measurement of the visual field. It is therefore highly desirable to transform retinal-thickness data into visual-sensitivity data. In this paper, we propose two novel methods to estimate the sensitivity of the visual field at SITA-Standard mode 10-2 resolution using retinal-thickness data measured with optical coherence tomography (OCT). The first method, called Affine-Structured Non-negative Matrix Factorization (ASNMF), is able to cope with both the estimation of the visual field and the discovery of deep glaucoma knowledge. The second is based on Convolutional Neural Networks (CNNs) and demonstrates very high estimation performance. Both are multi-view learning methods because they utilize visual-field and retinal-thickness data simultaneously. We experimentally tested the performance of our methods from several perspectives. We found that ASNMF worked better for relatively small data sizes, while CNNs worked better for relatively large data sizes. In addition, some clinical knowledge was discovered via ASNMF. To the best of our knowledge, this is the first paper to address the dense estimation of the visual field based on retinal-thickness data.

Citations: 7

Resolving the Bias in Electronic Medical Records

Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Pub Date : 2017-08-13 DOI: 10.1145/3097983.3098149

Kaiping Zheng, Jinyang Gao, K. Ngiam, B. Ooi, J. Yip

Citations: 45

A Quasi-experimental Estimate of the Impact of P2P Transportation Platforms on Urban Consumer Patterns

Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Pub Date : 2017-08-13 DOI: 10.1145/3097983.3098058

Zhe Zhang, Beibei Li

Citations: 15

Estimation of Recent Ancestral Origins of Individuals on a Large Scale

Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Pub Date : 2017-08-13 DOI: 10.1145/3097983.3098042

Ross E. Curtis, A. Girshick

Citations: 2