{"title":"TUBE","authors":"Daheng Wang, Tianwen Jiang, N. Chawla, Meng Jiang","doi":"10.1145/3292500.3330867","DOIUrl":"https://doi.org/10.1145/3292500.3330867","url":null,"abstract":"identification of presymptomatic NF2 mutation carriers by DNA diagnosis permits improved genetic counselling and clinical management in at-risk subjects. The early detection of VS by gadolinium-enhanced","PeriodicalId":186134,"journal":{"name":"Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132293455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Chainer: A Deep Learning Framework for Accelerating the Research Cycle","authors":"Seiya Tokui, Ryosuke Okuta, Takuya Akiba, Yusuke Niitani, Toru Ogawa, S. Saito, Shuji Suzuki, Kota Uenishi, Brian K. Vogel, Hiroyuki Yamazaki Vincent","doi":"10.1145/3292500.3330756","DOIUrl":"https://doi.org/10.1145/3292500.3330756","url":null,"abstract":"Software frameworks for neural networks play a key role in the development and application of deep learning methods. In this paper, we introduce the Chainer framework, which intends to provide a flexible, intuitive, and high performance means of implementing the full range of deep learning models needed by researchers and practitioners. Chainer provides acceleration using Graphics Processing Units with a familiar NumPy-like API through CuPy, supports general and dynamic models in Python through Define-by-Run, and also provides add-on packages for state-of-the-art computer vision models as well as distributed training.","PeriodicalId":186134,"journal":{"name":"Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115852391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real-time Event Detection on Social Data Streams","authors":"Mateusz Fedoryszak, Brent Frederick, V. Rajaram, Changtao Zhong","doi":"10.1145/3292500.3330689","DOIUrl":"https://doi.org/10.1145/3292500.3330689","url":null,"abstract":"Social networks are quickly becoming the primary medium for discussing what is happening around real-world events. The information that is generated on social platforms like Twitter can produce rich data streams for immediate insights into ongoing matters and the conversations around them. To tackle the problem of event detection, we model events as a list of clusters of trending entities over time. We describe a real-time system for discovering events that is modular in design and novel in scale and speed: it applies clustering on a large stream with millions of entities per minute and produces a dynamically updated set of events. In order to assess clustering methodologies, we build an evaluation dataset derived from a snapshot of the full Twitter Firehose and propose novel metrics for measuring clustering quality. Through experiments and system profiling, we highlight key results from the offline and online pipelines. Finally, we visualize a high profile event on Twitter to show the importance of modeling the evolution of events, especially those detected from social data streams.","PeriodicalId":186134,"journal":{"name":"Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125280830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Carousel Ads Optimization in Yahoo Gemini Native","authors":"M. Aharon, O. Somekh, Avi Shahar, Assaf Singer, Baruch Trayvas, Hadas Vogel, Dobrislav Dobrev","doi":"10.1145/3292500.3330740","DOIUrl":"https://doi.org/10.1145/3292500.3330740","url":null,"abstract":"Yahoo's native advertising (also known as Gemini native) serves billions of ad impressions daily, reaching a yearly run-rate of many hundreds of millions of USD. Driving Gemini native models for predicting both click probability (pCTR) and conversion probability (pCONV) is OFFSET - a feature enhanced collaborative-filtering (CF) based event prediction algorithm. The predicted pCTRs are then used in Gemini native auctions to determine which ads to present for each serving event. A fast growing segment of Gemini native is Carousel ads that include several cards (or assets) which are used to populate several slots within the ad. Since Carousel ad slots are not symmetrical and some are more conspicuous than others, it is beneficial to render assets to slots in a way that maximizes revenue. In this work we present a post-auction successive elimination based approach for ranking assets according to their click-through rate (CTR) and render the carousel accordingly, placing higher CTR assets in more conspicuous slots. After a successful online bucket showing 8.6% CTR and 4.3% CPM (or revenue) lifts over a control bucket that uses predefined advertisers assets-to-slots mapping, the carousel asset optimization (CAO) system was pushed to production and has been serving all Gemini native traffic since. A few months after CAO deployment, we have already measured an almost 40% increase in carousel ads revenue. Moreover, the entire revenue growth is related to CAO traffic increase due to additional advertiser demand, which demonstrates high advertiser satisfaction with the product.","PeriodicalId":186134,"journal":{"name":"Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132222660","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Active Deep Learning for Activity Recognition with Context Aware Annotator Selection","authors":"H. S. Hossain, Nirmalya Roy","doi":"10.1145/3292500.3330688","DOIUrl":"https://doi.org/10.1145/3292500.3330688","url":null,"abstract":"Machine learning models are bounded by the credibility of ground truth data used for both training and testing. Regardless of the problem domain, this ground truth annotation is objectively manual and tedious as it needs a considerable amount of human intervention. With the advent of Active Learning with multiple annotators, the burden can be somewhat mitigated by actively acquiring labels of the most informative data instances. However, multiple annotators with varying degrees of expertise pose a new set of challenges in terms of the quality of the label received and the availability of the annotator. Due to the limited amount of ground truth information addressing the variabilities of Activities of Daily Living (ADLs), activity recognition models using wearable and mobile devices are still not robust enough for real-world deployment. In this paper, we first propose an active learning combined deep model which updates its network parameters based on the optimization of a joint loss function. We then propose a novel annotator selection model by exploiting the relationships among the users while considering their heterogeneity with respect to their expertise, physical and spatial context. Our proposed model leverages model-free deep reinforcement learning in a partially observable environment setting to capture the action-reward interaction among multiple annotators. Our experiments in real-world settings show that our active deep model converges to optimal accuracy with fewer labeled instances and achieves ~8% improvement in accuracy in fewer iterations.","PeriodicalId":186134,"journal":{"name":"Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133040318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Earth Observations from a New Generation of Geostationary Satellites","authors":"R. Nemani","doi":"10.1145/3292500.3340413","DOIUrl":"https://doi.org/10.1145/3292500.3340413","url":null,"abstract":"The latest generation of geostationary satellites carry sensors such as the Advanced Baseline Imager (GOES-16/17) and the Advanced Himawari Imager (Himawari-8/9) that closely mimic the spatial and spectral characteristics of widely used polar orbiting sensors such as EOS/MODIS. More importantly, they provide observations at 1-5-15 minute intervals, instead of twice a day from MODIS, offering unprecedented opportunities for monitoring large parts of the Earth. In addition to serving the needs of weather forecasting, these observations offer new and exciting opportunities in managing solar power, fighting wildfires, and tracking air pollution. Creation of actionable information in near realtime from these data streams is a challenge that is best addressed through collaborative efforts among the industry, academia and government agencies.","PeriodicalId":186134,"journal":{"name":"Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133211958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sequential Anomaly Detection using Inverse Reinforcement Learning","authors":"Min-hwan Oh, G. Iyengar","doi":"10.1145/3292500.3330932","DOIUrl":"https://doi.org/10.1145/3292500.3330932","url":null,"abstract":"One of the most interesting application scenarios in anomaly detection is when sequential data are targeted. For example, in a safety-critical environment, it is crucial to have an automatic detection system to screen the streaming data gathered by monitoring sensors and to report abnormal observations if detected in real-time. Oftentimes, stakes are much higher when these potential anomalies are intentional or goal-oriented. We propose an end-to-end framework for sequential anomaly detection using inverse reinforcement learning (IRL), whose objective is to determine the decision-making agent's underlying function which triggers his/her behavior. The proposed method takes the sequence of actions of a target agent (and possibly other meta information) as input. The agent's normal behavior is then understood by the reward function which is inferred via IRL. We use a neural network to represent a reward function. Using a learned reward function, we evaluate whether a new observation from the target agent follows a normal pattern. In order to construct a reliable anomaly detection method and take into consideration the confidence of the predicted anomaly score, we adopt a Bayesian approach for IRL. The empirical study on publicly available real-world data shows that our proposed method is effective in identifying anomalies.","PeriodicalId":186134,"journal":{"name":"Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122070924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optuna: A Next-generation Hyperparameter Optimization Framework","authors":"Takuya Akiba, Shotaro Sano, Toshihiko Yanase, Takeru Ohta, Masanori Koyama","doi":"10.1145/3292500.3330701","DOIUrl":"https://doi.org/10.1145/3292500.3330701","url":null,"abstract":"The purpose of this study is to introduce new design-criteria for next-generation hyperparameter optimization software. The criteria we propose include (1) define-by-run API that allows users to construct the parameter search space dynamically, (2) efficient implementation of both searching and pruning strategies, and (3) easy-to-setup, versatile architecture that can be deployed for various purposes, ranging from scalable distributed computing to light-weight experiment conducted via interactive interface. In order to prove our point, we will introduce Optuna, an optimization software which is a culmination of our effort in the development of a next generation optimization software. As an optimization software designed with define-by-run principle, Optuna is particularly the first of its kind. We will present the design-techniques that became necessary in the development of the software that meets the above criteria, and demonstrate the power of our new design through experimental results and real world applications. Our software is available under the MIT license (https://github.com/pfnet/optuna/).","PeriodicalId":186134,"journal":{"name":"Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114933029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PinText: A Multitask Text Embedding System in Pinterest","authors":"Jinfeng Zhuang, Yu Liu","doi":"10.1145/3292500.3330671","DOIUrl":"https://doi.org/10.1145/3292500.3330671","url":null,"abstract":"Text embedding is a fundamental component for extracting text features in production-level data mining and machine learning systems, given that textual information is the most ubiquitous signal. However, practitioners often face the tradeoff between effectiveness of underlying embedding algorithms and cost of training and maintaining various embedding results in large-scale applications. In this paper, we propose a multitask text embedding solution called PinText for three major vertical surfaces including homefeed, related pins, and search in Pinterest, which consolidates existing text embedding algorithms into a single solution and produces state-of-the-art performance. Specifically, we learn word level semantic vectors by enforcing that the similarity between positive engagement pairs is larger than the similarity between randomly sampled background pairs. Based on the learned semantic vectors, we derive the embedding vector of a user, a pin, or a search query by simply averaging its word level vectors. In this common compact vector space, we are able to do unified nearest neighbor search with hashing by Hadoop jobs or dockerized images on a Kubernetes cluster. Both offline evaluation and online experiments show the effectiveness of the PinText system and significant savings in the storage cost of multiple open-sourced embeddings.","PeriodicalId":186134,"journal":{"name":"Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining","volume":"665 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116100463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AKUPM","authors":"Xiaoli Tang, Tengyun Wang, Haizhi Yang, Hengjie Song","doi":"10.1145/3292500.3330705","DOIUrl":"https://doi.org/10.1145/3292500.3330705","url":null,"abstract":"Recently, much attention has been paid to the use of knowledge graphs within the context of recommender systems to alleviate the data sparsity and cold-start problems. However, when incorporating entities from a knowledge graph to represent users, most existing works are unaware of the relationships between these entities and users. As a result, the recommendation results may suffer a lot from some unrelated entities. In this paper, we investigate how to explore these relationships, which are essentially determined by the interactions among entities. Firstly, we categorize the interactions among entities into two types: inter-entity-interaction and intra-entity-interaction. Inter-entity-interaction refers to the interactions among entities that affect their importance in representing users. Intra-entity-interaction refers to the interactions within an entity that describe the different characteristics of this entity when involved in different relations. Then, considering these two types of interactions, we propose a novel model named Attention-enhanced Knowledge-aware User Preference Model (AKUPM) for click-through rate (CTR) prediction. More specifically, a self-attention network is utilized to capture the inter-entity-interaction by learning the appropriate importance of each entity w.r.t. the user. Moreover, the intra-entity-interaction is modeled by projecting each entity into its connected relation spaces to obtain the suitable characteristics. By doing so, AKUPM is able to figure out the most related part of incorporated entities (i.e., filter out the unrelated entities). Extensive experiments on two real-world public datasets demonstrate that AKUPM achieves substantial gains in terms of common evaluation metrics (e.g., AUC, ACC and Recall@top-K) over several state-of-the-art baselines.","PeriodicalId":186134,"journal":{"name":"Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining","volume":"129 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115580271","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}