{"title":"Deep Reinforcement Learning with Applications in Transportation","authors":"Zhiwei Qin, Jian Tang, Jieping Ye","doi":"10.1145/3292500.3332299","DOIUrl":"https://doi.org/10.1145/3292500.3332299","url":null,"abstract":"This tutorial aims to provide the audience with a guided introduction to deep reinforcement learning (DRL) with specially curated application case studies in transportation. The tutorial covers both theory and practice, with more emphasis on the practical aspects of DRL that are pertinent to tackle transportation challenges. Some core examples include online ride order dispatching, fleet management, traffic signals control, route planning, and autonomous driving.","PeriodicalId":186134,"journal":{"name":"Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127588757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhige Li, Derek Yang, Li Zhao, Jiang Bian, Tao Qin, Tie-Yan Liu
{"title":"Individualized Indicator for All: Stock-wise Technical Indicator Optimization with Stock Embedding","authors":"Zhige Li, Derek Yang, Li Zhao, Jiang Bian, Tao Qin, Tie-Yan Liu","doi":"10.1145/3292500.3330833","DOIUrl":"https://doi.org/10.1145/3292500.3330833","url":null,"abstract":"As one of the most important investing approaches, technical analysis attempts to forecast stock movement by interpreting the inner rules from historic price and volume data. To address the vital noisy nature of financial market, generic technical analysis develops technical trading indicators, as mathematical summarization of historic price and volume data, to form up the foundation for robust and profitable investment strategies. However, an observation reveals that stocks with different properties have different affinities over technical indicators, which discloses a big challenge for the indicator-oriented stock selection and investment. To address this problem, in this paper, we design a Technical Trading Indicator Optimization(TTIO) framework that manages to optimize the original technical indicator by leveraging stock-wise properties. To obtain effective representations of stock properties, we propose a Skip-gram architecture to learn stock embedding inspired by a valuable knowledge repository formed by fund manager's collective investment behaviors. Based on the learned stock representations, TTIO further learns a re-scaling network to optimize the indicator's performance. Extensive experiments on real-world stock market data demonstrate that our method can obtain the very stock representations that are invaluable for technical indicator optimization since the optimized indicators can result in strong investing signals than original ones.","PeriodicalId":186134,"journal":{"name":"Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125393858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nicholas Monath, M. Zaheer, Daniel Silva, A. McCallum, Amr Ahmed
{"title":"Gradient-based Hierarchical Clustering using Continuous Representations of Trees in Hyperbolic Space","authors":"Nicholas Monath, M. Zaheer, Daniel Silva, A. McCallum, Amr Ahmed","doi":"10.1145/3292500.3330997","DOIUrl":"https://doi.org/10.1145/3292500.3330997","url":null,"abstract":"Hierarchical clustering is typically performed using algorithmic-based optimization searching over the discrete space of trees. While these optimization methods are often effective, their discreteness restricts them from many of the benefits of their continuous counterparts, such as scalable stochastic optimization and the joint optimization of multiple objectives or components of a model (e.g. end-to-end training). In this paper, we present an approach for hierarchical clustering that searches over continuous representations of trees in hyperbolic space by running gradient descent. We compactly represent uncertainty over tree structures with vectors in the Poincare ball. We show how the vectors can be optimized using an objective related to recently proposed cost functions for hierarchical clustering (Dasgupta, 2016; Wang and Wang, 2018). Using our method with a mini-batch stochastic gradient descent inference procedure, we are able to outperform prior work on clustering millions of ImageNet images by 15 points of dendrogram purity. Further, our continuous tree representation can be jointly optimized in multi-task learning applications offering a 9 point improvement over baseline methods.","PeriodicalId":186134,"journal":{"name":"Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125960383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PerDREP","authors":"Sanjoy Dey, Ping Zhang, D. Sow, Kenney Ng","doi":"10.1145/3292500.3330928","DOIUrl":"https://doi.org/10.1145/3292500.3330928","url":null,"abstract":"In contrast to the one-size-fits-all approach to medicine, precision medicine will allow targeted prescriptions based on the specific profile of the patient thereby avoiding adverse reactions and ineffective but expensive treatments. Longitudinal observational data such as Electronic Health Records (EHRs) have become an emerging data source for personalized medicine. In this paper, we propose a unified computational framework, called PerDREP, to predict the unique response patterns of each individual patient from EHR data. PerDREP models individual responses of each patient to the drug exposure by introducing a linear system to account for patients' heterogeneity, and incorporates a patient similarity graph as a network regularization. We formulate PerDREP as a convex optimization problem and develop an iterative gradient descent method to solve it. In the experiments, we identify the effect of drugs on Glycated hemoglobin test results. The experimental results provide evidence that the proposed method is not only more accurate than state-of-the-art methods, but is also able to automatically cluster patients into multiple coherent groups, thus paving the way for personalized medicine.","PeriodicalId":186134,"journal":{"name":"Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126103787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Community Detection on Large Complex Attribute Network","authors":"Chen Zhe, Aixin Sun, Xiaokui Xiao","doi":"10.1145/3292500.3330721","DOIUrl":"https://doi.org/10.1145/3292500.3330721","url":null,"abstract":"A large payment network contains millions of merchants and billions of transactions, and the merchants are described in a large number of attributes with incomplete values. Understanding its community structures is crucial to ensure its sustainable and long lasting. Knowing a merchant's community is also important from many applications - risk management, compliance, legal and marketing. To detect communities, an algorithm has to take advances from both attribute and topological information. Further, the method has to be able to handle incomplete and complex attributes. In this paper, we propose a framework named AGGMMR to effectively address the challenges come from scalability, mixed attributes, and incomplete value. We evaluate our proposed framework on four benchmark datasets against five strong baselines. More importantly, we provide a case study of running AGGMMR on a large network from PayPal which contains $100 million$ merchants with $1.5 billion$ transactions. The results demonstrate AGGMMR's effectiveness and practicability.","PeriodicalId":186134,"journal":{"name":"Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128092369","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nicholas Monath, Ari Kobren, A. Krishnamurthy, Michael R. Glass, A. McCallum
{"title":"Scalable Hierarchical Clustering with Tree Grafting","authors":"Nicholas Monath, Ari Kobren, A. Krishnamurthy, Michael R. Glass, A. McCallum","doi":"10.1145/3292500.3330929","DOIUrl":"https://doi.org/10.1145/3292500.3330929","url":null,"abstract":"We introduce Grinch, a new algorithm for large-scale, non-greedy hierarchical clustering with general linkage functions that compute arbitrary similarity between two point sets. The key components of Grinch are its rotate and graft subroutines that efficiently reconfigure the hierarchy as new points arrive, supporting discovery of clusters with complex structure. Grinch is motivated by a new notion of separability for clustering with linkage functions: we prove that when the linkage function is consistent with a ground-truth clustering, Grinch is guaranteed to produce a cluster tree containing the ground-truth, independent of data arrival order. Our empirical results on benchmark and author coreference datasets (with standard and learned linkage functions) show that Grinch is more accurate than other scalable methods, and orders of magnitude faster than hierarchical agglomerative clustering.","PeriodicalId":186134,"journal":{"name":"Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125923592","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Class-Conditional GANs with Active Sampling","authors":"Ming-Kun Xie, Sheng-Jun Huang","doi":"10.1145/3292500.3330883","DOIUrl":"https://doi.org/10.1145/3292500.3330883","url":null,"abstract":"Class-conditional variants of Generative adversarial networks (GANs) have recently achieved a great success due to its ability of selectively generating samples for given classes, as well as improving generation quality. However, its training requires a large set of class-labeled data, which is often expensive and difficult to collect in practice. In this paper, we propose an active sampling method to reduce the labeling cost for effectively training the class-conditional GANs. On one hand, the most useful examples are selected for external human labeling to jointly reduce the difficulty of model learning and alleviate the missing of adversarial training; on the other hand, fake examples are actively sampled for internal model retraining to enhance the adversarial training between the discriminator and generator. By incorporating the two strategies into a unified framework, we provide a cost-effective approach to train class-conditional GANs, which achieves higher generation quality with less training examples. Experiments on multiple datasets, diverse GAN configurations and various metrics demonstrate the effectiveness of our approaches.","PeriodicalId":186134,"journal":{"name":"Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124258918","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ari Biswas, T. T. Pham, Michael Vogelsong, Benjamin Snyder, Houssam Nassif
{"title":"Seeker","authors":"Ari Biswas, T. T. Pham, Michael Vogelsong, Benjamin Snyder, Houssam Nassif","doi":"10.1145/3292500.3330733","DOIUrl":"https://doi.org/10.1145/3292500.3330733","url":null,"abstract":"Seeker®, our interactive application security testing solution, gives you unparalleled visibility into your web app security posture and identifies vulnerability trends against compliance standards (e.g., OWASP Top 10, PCI DSS, GDPR, and CWE/SANS Top 25). Seeker enables security teams to identify and track sensitive data to ensure that it is handled securely and not stored in log files or databases with weak or no encryption. Seeker’s seamless integration into CI/CD workflows enables fast IAST security testing at DevOps speed.","PeriodicalId":186134,"journal":{"name":"Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining","volume":"186 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123512740","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhipeng Luo, Jianqiang Huang, Ke Hu, Xue Li, Peng Zhang
{"title":"AccuAir","authors":"Zhipeng Luo, Jianqiang Huang, Ke Hu, Xue Li, Peng Zhang","doi":"10.1145/3292500.3330787","DOIUrl":"https://doi.org/10.1145/3292500.3330787","url":null,"abstract":"Since air pollution seriously affects human heath and daily life, the air quality prediction has attracted increasing attention and become an active and important research topic. In this paper, we present AccuAir, our winning solution to the KDD Cup 2018 of Fresh Air, where the proposed solution has won the 1st place in two tracks, and the 2nd place in the other one. Our solution got the best accuracy on average in all the evaluation days. The task is to accurately predict the air quality (as indicated by the concentration of PM2.5, PM10 or O3) of the next 48 hours for each monitoring station in Beijing and London. Aiming at a cutting-edge solution, we first presents an analysis of the air quality data, identifying the fundamental challenges, such as the long-term but suddenly changing air quality, and complex spatial-temporal correlations in different stations. To address the challenges, we carefully design both global and local air quality features, and develop three prediction models including LightGBM, Gated-DNN and Seq2Seq, each with novel ingredients developed for better solving the problem. Specifically, a spatial-temporal gate is proposed in our Gated-DNN model, to effectively capture the spatial-temporal correlations as well as temporal relatedness, making the prediction more sensitive to spatial and temporal signals. In addition, the Seq2Seq model is adapted in such a way that the encoder summarizes useful historical features while the decoder concatenate weather forecast as input, which significantly improves prediction accuracy. Assembling all these components together, the ensemble of three models outperforms all competing methods in terms of the prediction accuracy of 31 days average, 10 days average and 24-48 hours.","PeriodicalId":186134,"journal":{"name":"Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123713527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"4 Perspectives in Human-Centered Machine Learning","authors":"Carlos Guestrin","doi":"10.1145/3292500.3340399","DOIUrl":"https://doi.org/10.1145/3292500.3340399","url":null,"abstract":"Machine learning (ML) has had a tremendous impact in across the world over the last decade. As we think about ML solving complex tasks, sometimes at super-human levels, it is easy to forget that there is no machine learning without humans in the loop. Humans define tasks and metrics, develop and program algorithms, collect and label data, debug and optimize systems, and are (usually) ultimately the users of the ML-based applications we are developing. In this talk, we will cover 4 human-centered perspectives in the ML development process, along with methods and systems, to empower humans to maximize the ultimate impact of their ML-based applications.","PeriodicalId":186134,"journal":{"name":"Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121935685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}