Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining最新文献_第10页

Effective and Real-time In-App Activity Analysis in Encrypted Internet Traffic Streams 加密互联网流量流中有效和实时的应用内活动分析

Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Pub Date : 2017-08-04 DOI: 10.1145/3097983.3098049

Junming Liu, Yanjie Fu, Jingci Ming, Y. Ren, Leilei Sun, Hui Xiong

{"title":"Effective and Real-time In-App Activity Analysis in Encrypted Internet Traffic Streams","authors":"Junming Liu, Yanjie Fu, Jingci Ming, Y. Ren, Leilei Sun, Hui Xiong","doi":"10.1145/3097983.3098049","DOIUrl":"https://doi.org/10.1145/3097983.3098049","url":null,"abstract":"The mobile in-App service analysis, aiming at classifying mobile internet traffic into different types of service usages, has become a challenging and emergent task for mobile service providers due to the increasing adoption of secure protocols for in-App services. While some efforts have been made for the classification of mobile internet traffic, existing methods rely on complex feature construction and large storage cache, which lead to low processing speed, and thus not practical for online real-time scenarios. To this end, we develop an iterative analyzer for classifying encrypted mobile traffic in a real-time way. Specifically, we first select an optimal set of most discriminative features from raw features extracted from traffic packet sequences by a novel Maximizing Inner activity similarity and Minimizing Different activity similarity (MIMD) measurement. To develop the online analyzer, we first represent a traffic flow with a series of time windows, which are described by the optimal feature vector and are updated iteratively at the packet level. Instead of extracting feature elements from a series of raw traffic packets, our feature elements are updated when a new traffic packet is observed and the storage of raw traffic packets is not required. The time windows generated from the same service usage activity are grouped by our proposed method, namely, recursive time continuity constrained KMeans clustering (rCKC). The feature vectors of cluster centers are then fed into a random forest classifier to identify corresponding service usages. Finally, we provide extensive experiments on real-world Internet traffic data from Wechat, Whatsapp, and Facebook to demonstrate the effectiveness and efficiency of our approach. The results show that the proposed analyzer provides high accuracy in real-world scenarios, and has low storage cache requirement as well as fast processing speed.","PeriodicalId":314049,"journal":{"name":"Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116402638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 47

metapath2vec: Scalable Representation Learning for Heterogeneous Networks metapath2vec:异构网络的可扩展表示学习

Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Pub Date : 2017-08-04 DOI: 10.1145/3097983.3098036

Yuxiao Dong, N. Chawla, A. Swami

引用次数: 1732

On Finding Socially Tenuous Groups for Online Social Networks 寻找在线社交网络的社会脆弱群体

Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Pub Date : 2017-08-04 DOI: 10.1145/3097983.3097995

Chih-Ya Shen, Liang-Hao Huang, De-Nian Yang, Hong-Han Shuai, Wang-Chien Lee, Ming-Syan Chen

引用次数: 23

It Takes More than Math and Engineering to Hit the Bullseye with Data 用数据击中靶心需要的不仅仅是数学和工程

Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Pub Date : 2017-08-04 DOI: 10.1145/3097983.3105816

P. Desai

{"title":"It Takes More than Math and Engineering to Hit the Bullseye with Data","authors":"P. Desai","doi":"10.1145/3097983.3105816","DOIUrl":"https://doi.org/10.1145/3097983.3105816","url":null,"abstract":"Adopting algorithmic decision-making in a large and complex enterprise such as a Fortune 50 retailer like Target takes much more than clean, reliable data and great data mining capabilities. Yet data practitioners too often start with advanced math and fancy algorithms, rather than working hand-in-hand with business partners to identify and understand the biggest business problems. (Then teams should move onto how algorithms can be applied to those problems.) Another key step for data scientists at large organizations: ensuring that their business partners -- the merchants, marketers and supply chain experts -- have a base-line understanding of advanced models as well as the proper analytical support tools. Obtaining widespread buy-in and enthusiasm also requires providing a user-friendly interface for business partners with optionality and flexibility that allows the intelligence to be applied to the many varied issues facing a modern retailer, from personalization to supply chain transformation to decisions on assortment and pricing. This talk will explore effective practices and processes -- the do's and don'ts -- for data scientists to succeed in large, complex organizations like a retailer with 1,800+ stores, major marketing campaigns across multiple channels and a fast growing online business.","PeriodicalId":314049,"journal":{"name":"Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"141 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123425061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Functional Annotation of Human Protein Coding Isoforms via Non-convex Multi-Instance Learning 基于非凸多实例学习的人类蛋白质编码异构体功能标注

Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Pub Date : 2017-08-04 DOI: 10.1145/3097983.3097984

Tingjin Luo, Weizhong Zhang, Shang Qiu, Yang Yang, Dong-yun Yi, Guangtao Wang, Jieping Ye, Jie Wang

{"title":"Functional Annotation of Human Protein Coding Isoforms via Non-convex Multi-Instance Learning","authors":"Tingjin Luo, Weizhong Zhang, Shang Qiu, Yang Yang, Dong-yun Yi, Guangtao Wang, Jieping Ye, Jie Wang","doi":"10.1145/3097983.3097984","DOIUrl":"https://doi.org/10.1145/3097983.3097984","url":null,"abstract":"Functional annotation of human genes is fundamentally important for understanding the molecular basis of various genetic diseases. A major challenge in determining the functions of human genes lies in the functional diversity of proteins, that is, a gene can perform different functions as it may consist of multiple protein coding isoforms (PCIs). Therefore, differentiating functions of PCIs can significantly deepen our understanding of the functions of genes. However, due to the lack of isoform-level gold-standards (ground-truth annotation), many existing functional annotation approaches are developed at gene-level. In this paper, we propose a novel approach to differentiate the functions of PCIs by integrating sparse simplex projection---that is, a nonconvex sparsity-inducing regularizer---with the framework of multi-instance learning (MIL). Specifically, we label the genes that are annotated to the function under consideration as positive bags and the genes without the function as negative bags. Then, by sparse projections onto simplex, we learn a mapping that embeds the original bag space to a discriminative feature space. Our framework is flexible to incorporate various smooth and non-smooth loss functions such as logistic loss and hinge loss. To solve the resulting highly nontrivial non-convex and non-smooth optimization problem, we further develop an efficient block coordinate descent algorithm. Extensive experiments on human genome data demonstrate that the proposed approaches significantly outperform the state-of-the-art methods in terms of functional annotation accuracy of human PCIs and efficiency.","PeriodicalId":314049,"journal":{"name":"Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125563651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 17

Learning from Multiple Teacher Networks 从多个教师网络中学习

Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Pub Date : 2017-08-04 DOI: 10.1145/3097983.3098135

Shan You, Chang Xu, Chao Xu, D. Tao

{"title":"Learning from Multiple Teacher Networks","authors":"Shan You, Chang Xu, Chao Xu, D. Tao","doi":"10.1145/3097983.3098135","DOIUrl":"https://doi.org/10.1145/3097983.3098135","url":null,"abstract":"Training thin deep networks following the student-teacher learning paradigm has received intensive attention because of its excellent performance. However, to the best of our knowledge, most existing work mainly considers one single teacher network. In practice, a student may access multiple teachers, and multiple teacher networks together provide comprehensive guidance that is beneficial for training the student network. In this paper, we present a method to train a thin deep network by incorporating multiple teacher networks not only in output layer by averaging the softened outputs (dark knowledge) from different networks, but also in the intermediate layers by imposing a constraint about the dissimilarity among examples. We suggest that the relative dissimilarity between intermediate representations of different examples serves as a more flexible and appropriate guidance from teacher networks. Then triplets are utilized to encourage the consistence of these relative dissimilarity relationships between the student network and teacher networks. Moreover, we leverage a voting strategy to unify multiple relative dissimilarity information provided by multiple teacher networks, which realizes their incorporation in the intermediate layers. Extensive experimental results demonstrated that our method is capable of generating a well-performed student network, with the classification accuracy comparable or even superior to all teacher networks, yet having much fewer parameters and being much faster in running.","PeriodicalId":314049,"journal":{"name":"Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116722074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 266

Patient Subtyping via Time-Aware LSTM Networks 通过时间感知LSTM网络进行患者亚型分型

Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Pub Date : 2017-08-04 DOI: 10.1145/3097983.3097997

Inci M. Baytas, Cao Xiao, Xi Sheryl Zhang, Fei Wang, Anil K. Jain, Jiayu Zhou

{"title":"Patient Subtyping via Time-Aware LSTM Networks","authors":"Inci M. Baytas, Cao Xiao, Xi Sheryl Zhang, Fei Wang, Anil K. Jain, Jiayu Zhou","doi":"10.1145/3097983.3097997","DOIUrl":"https://doi.org/10.1145/3097983.3097997","url":null,"abstract":"In the study of various diseases, heterogeneity among patients usually leads to different progression patterns and may require different types of therapeutic intervention. Therefore, it is important to study patient subtyping, which is grouping of patients into disease characterizing subtypes. Subtyping from complex patient data is challenging because of the information heterogeneity and temporal dynamics. Long-Short Term Memory (LSTM) has been successfully used in many domains for processing sequential data, and recently applied for analyzing longitudinal patient records. The LSTM units are designed to handle data with constant elapsed times between consecutive elements of a sequence. Given that time lapse between successive elements in patient records can vary from days to months, the design of traditional LSTM may lead to suboptimal performance. In this paper, we propose a novel LSTM unit called Time-Aware LSTM (T-LSTM) to handle irregular time intervals in longitudinal patient records. We learn a subspace decomposition of the cell memory which enables time decay to discount the memory content according to the elapsed time. We propose a patient subtyping model that leverages the proposed T-LSTM in an auto-encoder to learn a powerful single representation for sequential records of patients, which are then used to cluster patients into clinical subtypes. Experiments on synthetic and real world datasets show that the proposed T-LSTM architecture captures the underlying structures in the sequences with time irregularities.","PeriodicalId":314049,"journal":{"name":"Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"24 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114050270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 475

Meta-Graph Based Recommendation Fusion over Heterogeneous Information Networks 基于元图的异构信息网络推荐融合

Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Pub Date : 2017-08-04 DOI: 10.1145/3097983.3098063

Huan Zhao, Quanming Yao, Jianda Li, Yangqiu Song, Lee

{"title":"Meta-Graph Based Recommendation Fusion over Heterogeneous Information Networks","authors":"Huan Zhao, Quanming Yao, Jianda Li, Yangqiu Song, Lee","doi":"10.1145/3097983.3098063","DOIUrl":"https://doi.org/10.1145/3097983.3098063","url":null,"abstract":"Heterogeneous Information Network (HIN) is a natural and general representation of data in modern large commercial recommender systems which involve heterogeneous types of data. HIN based recommenders face two problems: how to represent the high-level semantics of recommendations and how to fuse the heterogeneous information to make recommendations. In this paper, we solve the two problems by first introducing the concept of meta-graph to HIN-based recommendation, and then solving the information fusion problem with a \"matrix factorization (MF) + factorization machine (FM)\" approach. For the similarities generated by each meta-graph, we perform standard MF to generate latent features for both users and items. With different meta-graph based features, we propose to use FM with Group lasso (FMG) to automatically learn from the observed ratings to effectively select useful meta-graph based features. Experimental results on two real-world datasets, Amazon and Yelp, show the effectiveness of our approach compared to state-of-the-art FM and other HIN-based recommendation algorithms.","PeriodicalId":314049,"journal":{"name":"Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121843772","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 436

Mining Big Data in NeuroGenetics to Understand Muscular Dystrophy 在神经遗传学中挖掘大数据来理解肌肉萎缩症

Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Pub Date : 2017-08-04 DOI: 10.1145/3097983.3105813

A. Berglund

引用次数: 0

Groups-Keeping Solution Path Algorithm for Sparse Regression with Automatic Feature Grouping 具有自动特征分组的稀疏回归保群解路径算法

Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Pub Date : 2017-08-04 DOI: 10.1145/3097983.3098010

Bin Gu, Guodong Liu, Heng Huang

{"title":"Groups-Keeping Solution Path Algorithm for Sparse Regression with Automatic Feature Grouping","authors":"Bin Gu, Guodong Liu, Heng Huang","doi":"10.1145/3097983.3098010","DOIUrl":"https://doi.org/10.1145/3097983.3098010","url":null,"abstract":"Feature selection is one of the most important data mining research topics with many applications. In practical problems, features often have group structure to effect the outcomes. Thus, it is crucial to automatically identify homogenous groups of features for high-dimensional data analysis. Octagonal shrinkage and clustering algorithm for regression (OSCAR) is an important sparse regression approach with automatic feature grouping and selection by ℓ1 norm and pairwise ℓ∞ norm. However, due to over-complex representation of the penalty (especially the pairwise ℓ∞ norm), so far OSCAR has no solution path algorithm which is mostly useful for tuning the model. To address this challenge, in this paper, we propose a groups-keeping solution path algorithm to solve the OSCAR model (OscarGKPath). Given a set of homogenous groups of features and an accuracy bound ε, OscarGKPath can fit the solutions in an interval of regularization parameters while keeping the feature groups. The entire solution path can be obtained by combining multiple such intervals. We prove that all solutions in the solution path produced by OscarGKPath can strictly satisfy the given accuracy bound ε. The experimental results on benchmark datasets not only confirm the effectiveness of our OscarGKPath algorithm, but also show the superiority of our OscarGKPath in cross validation compared with the existing batch algorithm.","PeriodicalId":314049,"journal":{"name":"Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115849459","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 10