Wubai Zhou, Wei Xue, Ramesh Baral, Qing Wang, Chunqiu Zeng, Tao Li, Jian Xu, Zheng Liu, L. Shwartz, G. Grabarnik
{"title":"STAR: A System for Ticket Analysis and Resolution","authors":"Wubai Zhou, Wei Xue, Ramesh Baral, Qing Wang, Chunqiu Zeng, Tao Li, Jian Xu, Zheng Liu, L. Shwartz, G. Grabarnik","doi":"10.1145/3097983.3098190","DOIUrl":"https://doi.org/10.1145/3097983.3098190","url":null,"abstract":"In large scale and complex IT service environments, a problematic incident is logged as a ticket and contains the ticket summary (system status and problem description). The system administrators log the step-wise resolution description when such tickets are resolved. The repeating service events are most likely resolved by inferring similar historical tickets. With the availability of reasonably large ticket datasets, we can have an automated system to recommend the best matching resolution for a given ticket summary. In this paper, we first identify the challenges in real-world ticket analysis and develop an integrated framework to efficiently handle those challenges. The framework first quantifies the quality of ticket resolutions using a regression model built on carefully designed features. The tickets, along with their quality scores obtained from the resolution quality quantification, are then used to train a deep neural network ranking model that outputs the matching scores of ticket summary and resolution pairs. This ranking model allows us to leverage the resolution quality in historical tickets when recommending resolutions for an incoming incident ticket. In addition, the feature vectors derived from the deep neural ranking model can be effectively used in other ticket analysis tasks, such as ticket classification and clustering. The proposed framework is extensively evaluated with a large real-world dataset.","PeriodicalId":314049,"journal":{"name":"Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115320971","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Denis Baylor, Eric Breck, Heng-Tze Cheng, Noah Fiedel, Chuan Yu Foo, Zakaria Haque, Salem Haykal, M. Ispir, Vihan Jain, L. Koc, C. Koo, Lukasz Lew, Clemens Mewald, A. Modi, N. Polyzotis, Sukriti Ramesh, Sudip Roy, Steven Euijong Whang, M. Wicke, Jarek Wilkiewicz, Xin Zhang, Martin A. Zinkevich
{"title":"TFX: A TensorFlow-Based Production-Scale Machine Learning Platform","authors":"Denis Baylor, Eric Breck, Heng-Tze Cheng, Noah Fiedel, Chuan Yu Foo, Zakaria Haque, Salem Haykal, M. Ispir, Vihan Jain, L. Koc, C. Koo, Lukasz Lew, Clemens Mewald, A. Modi, N. Polyzotis, Sukriti Ramesh, Sudip Roy, Steven Euijong Whang, M. Wicke, Jarek Wilkiewicz, Xin Zhang, Martin A. Zinkevich","doi":"10.1145/3097983.3098021","DOIUrl":"https://doi.org/10.1145/3097983.3098021","url":null,"abstract":"Creating and maintaining a platform for reliably producing and deploying machine learning models requires careful orchestration of many components---a learner for generating models based on training data, modules for analyzing and validating both data as well as models, and finally infrastructure for serving models in production. This becomes particularly challenging when data changes over time and fresh models need to be produced continuously. Unfortunately, such orchestration is often done ad hoc using glue code and custom scripts developed by individual teams for specific use cases, leading to duplicated effort and fragile systems with high technical debt. We present TensorFlow Extended (TFX), a TensorFlow-based general-purpose machine learning platform implemented at Google. By integrating the aforementioned components into one platform, we were able to standardize the components, simplify the platform configuration, and reduce the time to production from the order of months to weeks, while providing platform stability that minimizes disruptions. We present the case study of one deployment of TFX in the Google Play app store, where the machine learning models are refreshed continuously as new data arrive. Deploying TFX led to reduced custom code, faster experiment cycles, and a 2% increase in app installs resulting from improved data and model analysis.","PeriodicalId":314049,"journal":{"name":"Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134459681","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lingyu Zhang, Tao Hu, Yue Min, Guobin Wu, Junying Zhang, Pengcheng Feng, Pinghua Gong, Jieping Ye
{"title":"A Taxi Order Dispatch Model based On Combinatorial Optimization","authors":"Lingyu Zhang, Tao Hu, Yue Min, Guobin Wu, Junying Zhang, Pengcheng Feng, Pinghua Gong, Jieping Ye","doi":"10.1145/3097983.3098138","DOIUrl":"https://doi.org/10.1145/3097983.3098138","url":null,"abstract":"Taxi-booking apps have been very popular all over the world as they provide convenience such as fast response time to the users. The key component of a taxi-booking app is the dispatch system which aims to provide optimal matches between drivers and riders. Traditional dispatch systems sequentially dispatch taxis to riders and aim to maximize the driver acceptance rate for each individual order. However, the traditional systems may lead to a low global success rate, which degrades the rider experience when using the app. In this paper, we propose a novel system that attempts to optimally dispatch taxis to serve multiple bookings. The proposed system aims to maximize the global success rate, thus it optimizes the overall travel efficiency, leading to enhanced user experience. To further enhance users' experience, we also propose a method to predict destinations of a user once the taxi-booking APP is started. The proposed method employs the Bayesian framework to model the distribution of a user's destination based on his/her travel histories. We use rigorous A/B tests to compare our new taxi dispatch method with state-of-the-art models using data collected in Beijing. Experimental results show that the proposed method is significantly better than other state-of-the art models in terms of global success rate (increased from 80% to 84%). Moreover, we have also achieved significant improvement on other metrics such as user's waiting-time and pick-up distance. For our destination prediction algorithm, we show that our proposed model is superior to the baseline model by improving the top-3 accuracy from 89% to 93%. The proposed taxi dispatch and destination prediction algorithms are both deployed in our online systems and serve tens of millions of users everyday.","PeriodicalId":314049,"journal":{"name":"Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133104970","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Inferring the Strength of Social Ties: A Community-Driven Approach","authors":"Polina Rozenshtein, Nikolaj Tatti, A. Gionis","doi":"10.1145/3097983.3098199","DOIUrl":"https://doi.org/10.1145/3097983.3098199","url":null,"abstract":"Online social networks are growing and becoming denser.The social connections of a given person may have very high variability: from close friends and relatives to acquaintances to people who hardly know. Inferring the strength of social ties is an important ingredient for modeling the interaction of users in a network and understanding their behavior. Furthermore, the problem has applications in computational social science, viral marketing, and people recommendation. In this paper we study the problem of inferring the strength of social ties in a given network. Our work is motivated by a recent approach by Sintos et. al [24], which leverages the Strong Triadic Closure} STC principle, a hypothesis rooted in social psychology. To guide our inference process, in addition to the network structure, we also consider as input a collection of tight communities. Those are sets of vertices that we expect to be connected via strong ties. Such communities appear in different situations, e.g., when being part of a community implies a strong connection to one of the existing members. We consider two related problem formalizations that reflect the assumptions of our setting: small number of STC violations and strong-tie connectivity in the input communities. We show that both problem formulations are NP-hard. We also show that one problem formulation is hard to approximate, while for the second we develop an algorithm with approximation guarantee. We validate the proposed method on real-world datasets by comparing with baselines that optimize STC violations and community connectivity separately.","PeriodicalId":314049,"journal":{"name":"Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114374181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
P. Shah, Ming Yang, Sachidanand Alle, A. Ratnaparkhi, B. Shahshahani, Rohit Chandra
{"title":"A Practical Exploration System for Search Advertising","authors":"P. Shah, Ming Yang, Sachidanand Alle, A. Ratnaparkhi, B. Shahshahani, Rohit Chandra","doi":"10.1145/3097983.3098041","DOIUrl":"https://doi.org/10.1145/3097983.3098041","url":null,"abstract":"In this paper, we describe an exploration system that was implemented by the search-advertising team of a prominent web-portal to address the cold ads problem. The cold ads problem refers to the situation where, when new ads are injected into the system by advertisers, the system is unable to assign an accurate quality to the ad (in our case, the click probability). As a consequence, the advertiser may suffer from low impression volumes for these cold ads, and the overall system may perform sub-optimally if the click probabilities for new ads are not learnt rapidly. We designed a new exploration system that was adapted to search advertising and the serving constraints of the system. In this paper, we define the problem, discuss the design details of the exploration system, new evaluation criteria, and present the performance metrics that were observed by us.","PeriodicalId":314049,"journal":{"name":"Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124846929","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Toshimitsu Uesaka, K. Morino, Hiroki Sugiura, Taichi Kiwaki, Hiroshi Murata, R. Asaoka, K. Yamanishi
{"title":"Multi-view Learning over Retinal Thickness and Visual Sensitivity on Glaucomatous Eyes","authors":"Toshimitsu Uesaka, K. Morino, Hiroki Sugiura, Taichi Kiwaki, Hiroshi Murata, R. Asaoka, K. Yamanishi","doi":"10.1145/3097983.3098194","DOIUrl":"https://doi.org/10.1145/3097983.3098194","url":null,"abstract":"Dense measurements of visual-field, which is necessary to detect glaucoma, is known as very costly and labor intensive. Recently, measurement of retinal-thickness can be less costly than measurement of visual-field. Thus, it is sincerely desired that the retinal-thickness could be transformed into visual-sensitivity data somehow. In this paper, we propose two novel methods to estimate the sensitivity of the visual-field with SITA-Standard mode 10-2 resolution using retinal-thickness data measured with optical coherence tomography (OCT). The first method called Affine-Structured Non-negative Matrix Factorization (ASNMF) which is able to cope with both the estimation of visual-field and the discovery of deep glaucoma knowledge. While, the second is based on Convolutional Neural Networks (CNNs) which demonstrates very high estimation performance. These methods are kinds of multi-view learning methods because they utilize visual-field and retinal thickness data simultaneously. We experimentally tested the performance of our methods from several perspectives. We found that ASNMF worked better for relatively small data size while CNNs did for relatively large data size. In addition, some clinical knowledge are discovered via ASNMF. To the best of our knowledge, this is the first paper to address the dense estimation of the visual-field based on the retinal-thickness data.","PeriodicalId":314049,"journal":{"name":"Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122659621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimation of Recent Ancestral Origins of Individuals on a Large Scale","authors":"Ross E. Curtis, A. Girshick","doi":"10.1145/3097983.3098042","DOIUrl":"https://doi.org/10.1145/3097983.3098042","url":null,"abstract":"The last ten years have seen an exponential growth of direct-to-consumer genomics. One popular feature of these tests is the report of a distant ancestral inference profile-a breakdown of the regions of the world where the test-taker's ancestors may have lived. While current methods and products generally focus on the more distant past (e.g., thousands of years ago), we have recently demonstrated that by leveraging network analysis tools such as community detection, more recent ancestry can be identified. However, using a network analysis tool like community detection on a large network with potentially millions of nodes is not feasible in a live production environment where hundreds or thousands of new genotypes are processed every day. In this study, we describe a classification method that leverages network features to assign individuals to communities in a large network corresponding to recent ancestry. We recently launched a beta version of this research as a new product feature at AncestryDNA.","PeriodicalId":314049,"journal":{"name":"Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130898787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Quasi-experimental Estimate of the Impact of P2P Transportation Platforms on Urban Consumer Patterns","authors":"Zhe Zhang, Beibei Li","doi":"10.1145/3097983.3098058","DOIUrl":"https://doi.org/10.1145/3097983.3098058","url":null,"abstract":"With the pervasiveness of mobile technology and location-based computing, new forms of smart urban transportation, such as Uber & Lyft, have become increasingly popular. These new forms of urban infrastructure can influence individuals' movement frictions and patterns, in turn influencing local consumption patterns and the economic performance of local businesses. To gain insights about future impact of urban transportation changes, in this paper, we utilize a novel dataset and econometric analysis methods to present a quasi-experimental examination of how the emerging growth of peer-to-peer car sharing services may have affected local consumer mobility and consumption patterns.","PeriodicalId":314049,"journal":{"name":"Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129490546","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kaiping Zheng, Jinyang Gao, K. Ngiam, B. Ooi, J. Yip
{"title":"Resolving the Bias in Electronic Medical Records","authors":"Kaiping Zheng, Jinyang Gao, K. Ngiam, B. Ooi, J. Yip","doi":"10.1145/3097983.3098149","DOIUrl":"https://doi.org/10.1145/3097983.3098149","url":null,"abstract":"Electronic Medical Records (EMR) are the most fundamental resources used in healthcare data analytics. Since people visit hospital more frequently when they feel sick and doctors prescribe lab examinations when they feel necessary, we argue that there could be a strong bias in EMR observations compared with the hidden conditions of patients. Directly using such EMR for analytical tasks without considering the bias may lead to misinterpretation. To this end, we propose a general method to resolve the bias by transforming EMR to regular patient hidden condition series using a Hidden Markov Model (HMM) variant. Compared with the biased EMR series with irregular time stamps, the unbiased regular time series is much easier to be processed by most analytical models and yields better results. Extensive experimental results demonstrate that our bias resolving method imputes missing data more accurately than baselines and improves the performance of the state-of-the-art methods on typical medical data analytics.","PeriodicalId":314049,"journal":{"name":"Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"126 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129675971","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Not All Passes Are Created Equal: Objectively Measuring the Risk and Reward of Passes in Soccer from Tracking Data","authors":"P. Power, Héctor Ruiz, Xinyu Wei, P. Lucey","doi":"10.1145/3097983.3098051","DOIUrl":"https://doi.org/10.1145/3097983.3098051","url":null,"abstract":"In soccer, the most frequent event that occurs is a pass. For a trained eye, there are a myriad of adjectives which could describe this event (e.g., \"majestic pass\", \"conservative\" to \"poor-ball\"). However, as these events are needed to be coded live and in real-time (most often by human annotators), the current method of grading passes is restricted to the binary labels 0 (unsuccessful) or 1 (successful). Obviously, this is sub-optimal because the quality of a pass needs to be measured on a continuous spectrum (i.e., 0 to 100%) and not a binary value. Additionally, a pass can be measured across multiple dimensions, namely: i) risk -- the likelihood of executing a pass in a given situation, and ii) reward -- the likelihood of a pass creating a chance. In this paper, we show how we estimate both the risk and reward of a pass across two seasons of tracking data captured from a recent professional soccer league with state-of-the-art performance, then showcase various use cases of our deployed passing system.","PeriodicalId":314049,"journal":{"name":"Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124830588","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}