Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining最新文献

筛选
英文 中文
Multi-Modality Disease Modeling via Collective Deep Matrix Factorization 基于集体深度矩阵分解的多模态疾病建模
Qi Wang, Mengying Sun, L. Zhan, P. Thompson, Shuiwang Ji, Jiayu Zhou
{"title":"Multi-Modality Disease Modeling via Collective Deep Matrix Factorization","authors":"Qi Wang, Mengying Sun, L. Zhan, P. Thompson, Shuiwang Ji, Jiayu Zhou","doi":"10.1145/3097983.3098164","DOIUrl":"https://doi.org/10.1145/3097983.3098164","url":null,"abstract":"Alzheimer's disease (AD), one of the most common causes of dementia, is a severe irreversible neurodegenerative disease that results in loss of mental functions. The transitional stage between the expected cognitive decline of normal aging and AD, mild cognitive impairment (MCI), has been widely regarded as a suitable time for possible therapeutic intervention. The challenging task of MCI detection is therefore of great clinical importance, where the key is to effectively fuse predictive information from multiple heterogeneous data sources collected from the patients. In this paper, we propose a framework to fuse multiple data modalities for predictive modeling using deep matrix factorization, which explores the non-linear interactions among the modalities and exploits such interactions to transfer knowledge and enable high performance prediction. Specifically, the proposed collective deep matrix factorization decomposes all modalities simultaneously to capture non-linear structures of the modalities in a supervised manner, and learns a modality specific component for each modality and a modality invariant component across all modalities. The modality invariant component serves as a compact feature representation of patients that has high predictive power. The modality specific components provide an effective means to explore imaging genetics, yielding insights into how imaging and genotype interact with each other non-linearly in the AD pathology. Extensive empirical studies using various data modalities provided by Alzheimer's Disease Neuroimaging Initiative (ADNI) demonstrate the effectiveness of the proposed method for fusing heterogeneous modalities.","PeriodicalId":314049,"journal":{"name":"Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115978355","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 26
Embedding-based News Recommendation for Millions of Users 面向数百万用户的嵌入式新闻推荐
Shumpei Okura, Yukihiro Tagami, Shingo Ono, Akira Tajima
{"title":"Embedding-based News Recommendation for Millions of Users","authors":"Shumpei Okura, Yukihiro Tagami, Shingo Ono, Akira Tajima","doi":"10.1145/3097983.3098108","DOIUrl":"https://doi.org/10.1145/3097983.3098108","url":null,"abstract":"It is necessary to understand the content of articles and user preferences to make effective news recommendations. While ID-based methods, such as collaborative filtering and low-rank factorization, are well known for making recommendations, they are not suitable for news recommendations because candidate articles expire quickly and are replaced with new ones within short spans of time. Word-based methods, which are often used in information retrieval settings, are good candidates in terms of system performance but have issues such as their ability to cope with synonyms and orthographical variants and define \"queries\" from users' historical activities. This paper proposes an embedding-based method to use distributed representations in a three step end-to-end manner: (i) start with distributed representations of articles based on a variant of a denoising autoencoder, (ii) generate user representations by using a recurrent neural network (RNN) with browsing histories as input sequences, and (iii) match and list articles for users based on inner-product operations by taking system performance into consideration. The proposed method performed well in an experimental offline evaluation using past access data on Yahoo! JAPAN's homepage. We implemented it on our actual news distribution system based on these experimental results and compared its online performance with a method that was conventionally incorporated into the system. As a result, the click-through rate (CTR) improved by 23% and the total duration improved by 10%, compared with the conventionally incorporated method. Services that incorporated the method we propose are already open to all users and provide recommendations to over ten million individual users per day who make billions of accesses per month.","PeriodicalId":314049,"journal":{"name":"Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127529618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 370
Scalable Top-n Local Outlier Detection 可扩展的Top-n局部离群点检测
Yizhou Yan, Lei Cao, Elke A. Rundensteiner
{"title":"Scalable Top-n Local Outlier Detection","authors":"Yizhou Yan, Lei Cao, Elke A. Rundensteiner","doi":"10.1145/3097983.3098191","DOIUrl":"https://doi.org/10.1145/3097983.3098191","url":null,"abstract":"Local Outlier Factor (LOF) method that labels all points with their respective LOF scores to indicate their status is known to be very effective for identifying outliers in datasets with a skewed distribution. Since outliers by definition are the absolute minority in a dataset, the concept of Top-N local outlier was proposed to discover the n points with the largest LOF scores. The detection of the Top-N local outliers is prohibitively expensive, since it requires huge number of high complexity k-nearest neighbor (kNN) searches. In this work, we present the first scalable Top-N local outlier detection approach called TOLF. The key innovation of TOLF is a multi-granularity pruning strategy that quickly prunes most points from the set of potential outlier candidates without computing their exact LOF scores or even without conducting any kNN search for them. Our customized density-aware indexing structure not only effectively supports the pruning strategy, but also accelerates the $k$NN search. Our extensive experimental evaluation on OpenStreetMap, SDSS, and TIGER datasets demonstrates the effectiveness of TOLF - up to 35 times faster than the state-of-the-art methods.","PeriodicalId":314049,"journal":{"name":"Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127556078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 29
Functional Zone Based Hierarchical Demand Prediction For Bike System Expansion 基于功能区的自行车系统扩展分层需求预测
Junming Liu, Leilei Sun, Qiao Li, Jingci Ming, Yanchi Liu, Hui Xiong
{"title":"Functional Zone Based Hierarchical Demand Prediction For Bike System Expansion","authors":"Junming Liu, Leilei Sun, Qiao Li, Jingci Ming, Yanchi Liu, Hui Xiong","doi":"10.1145/3097983.3098180","DOIUrl":"https://doi.org/10.1145/3097983.3098180","url":null,"abstract":"Bike sharing systems, aiming at providing the missing links in public transportation systems, are becoming popular in urban cities. Many providers of bike sharing systems are ready to expand their bike stations from the existing service area to surrounding regions. A key to success for a bike sharing systems expansion is the bike demand prediction for expansion areas. There are two major challenges in this demand prediction problem: First. the bike transition records are not available for the expansion area and second. station level bike demand have big variances across the urban city. Previous research efforts mainly focus on discovering global features, assuming the station bike demands react equally to the global features, which brings large prediction error when the urban area is large and highly diversified. To address these challenges, in this paper, we develop a hierarchical station bike demand predictor which analyzes bike demands from functional zone level to station level. Specifically, we first divide the studied bike stations into functional zones by a novel Bi-clustering algorithm which is designed to cluster bike stations with similar POI characteristics and close geographical distances together. Then, the hourly bike check-ins and check-outs of functional zones are predicted by integrating three influential factors: distance preference, zone-to-zone preference, and zone characteristics. The station demand is estimated by studying the demand distributions among the stations within the same functional zone. Finally, the extensive experimental results on the NYC Citi Bike system with two expansion stages show the advantages of our approach on station demand and balance prediction for bike sharing system expansions.","PeriodicalId":314049,"journal":{"name":"Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126826088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 55
Formative Essay Feedback Using Predictive Scoring Models 使用预测评分模型的形成性论文反馈
Bronwyn Woods, David Adamson, Shayne Miel, Elijah Mayfield
{"title":"Formative Essay Feedback Using Predictive Scoring Models","authors":"Bronwyn Woods, David Adamson, Shayne Miel, Elijah Mayfield","doi":"10.1145/3097983.3098160","DOIUrl":"https://doi.org/10.1145/3097983.3098160","url":null,"abstract":"A major component of secondary education is learning to write effectively, a skill which is bolstered by repeated practice with formative guidance. However, providing focused feedback to every student on multiple drafts of each essay throughout the school year is a challenge for even the most dedicated of teachers. This paper first establishes a new ordinal essay scoring model and its state of the art performance compared to recent results in the Automated Essay Scoring field. Extending this model, we describe a method for using prediction on realistic essay variants to give rubric-specific formative feedback to writers. This method is used in Revision Assistant, a deployed data-driven educational product that provides immediate, rubric-specific, sentence-level feedback to students to supplement teacher guidance. We present initial evaluations of this feedback generation, both offline and in deployment.","PeriodicalId":314049,"journal":{"name":"Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126293404","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 50
Google Vizier: A Service for Black-Box Optimization Google Vizier:一项黑盒优化服务
D. Golovin, Benjamin Solnik, Subhodeep Moitra, G. Kochanski, J. Karro, D. Sculley
{"title":"Google Vizier: A Service for Black-Box Optimization","authors":"D. Golovin, Benjamin Solnik, Subhodeep Moitra, G. Kochanski, J. Karro, D. Sculley","doi":"10.1145/3097983.3098043","DOIUrl":"https://doi.org/10.1145/3097983.3098043","url":null,"abstract":"Any sufficiently complex system acts as a black box when it becomes easier to experiment with than to understand. Hence, black-box optimization has become increasingly important as systems have become more complex. In this paper we describe Google Vizier, a Google-internal service for performing black-box optimization that has become the de facto parameter tuning engine at Google. Google Vizier is used to optimize many of our machine learning models and other systems, and also provides core capabilities to Google's Cloud Machine Learning HyperTune subsystem. We discuss our requirements, infrastructure design, underlying algorithms, and advanced features such as transfer learning and automated early stopping that the service provides.","PeriodicalId":314049,"journal":{"name":"Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120822566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 663
Aspect Based Recommendations: Recommending Items with the Most Valuable Aspects Based on User Reviews 基于方面的推荐:基于用户评论推荐具有最有价值方面的项目
Konstantin Bauman, B. Liu, A. Tuzhilin
{"title":"Aspect Based Recommendations: Recommending Items with the Most Valuable Aspects Based on User Reviews","authors":"Konstantin Bauman, B. Liu, A. Tuzhilin","doi":"10.1145/3097983.3098170","DOIUrl":"https://doi.org/10.1145/3097983.3098170","url":null,"abstract":"In this paper, we propose a recommendation technique that not only can recommend items of interest to the user as traditional recommendation systems do but also specific aspects of consumption of the items to further enhance the user experience with those items. For example, it can recommend the user to go to a specific restaurant (item) and also order some specific foods there, e.g., seafood (an aspect of consumption). Our method is called Sentiment Utility Logistic Model (SULM). As its name suggests, SULM uses sentiment analysis of user reviews. It first predicts the sentiment that the user may have about the item based on what he/she might express about the aspects of the item and then identifies the most valuable aspects of the user's potential experience with that item. Furthermore, the method can recommend items together with those most important aspects over which the user has control and can potentially select them, such as the time to go to a restaurant, e.g. lunch vs. dinner, and what to order there, e.g., seafood. We tested the proposed method on three applications (restaurant, hotel, and beauty & spa) and experimentally showed that those users who followed our recommendations of the most valuable aspects while consuming the items, had better experiences, as defined by the overall rating.","PeriodicalId":314049,"journal":{"name":"Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134583411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 160
Small Batch or Large Batch?: Gaussian Walk with Rebound Can Teach 小批量还是大批量?:有反弹的高斯步可以教
Peifeng Yin, Ping Luo, Taiga Nakamura
{"title":"Small Batch or Large Batch?: Gaussian Walk with Rebound Can Teach","authors":"Peifeng Yin, Ping Luo, Taiga Nakamura","doi":"10.1145/3097983.3098147","DOIUrl":"https://doi.org/10.1145/3097983.3098147","url":null,"abstract":"Efficiency of large-scale learning is a hot topic in both academic and industry. The stochastic gradient descent (SGD) algorithm, and its extension mini-batch SGD, allow the model to be updated without scanning the whole data set. However, the use of approximate gradient leads to the uncertainty issue, slowing down the decreasing of objective function. Furthermore, such uncertainty may result in a high frequency of meaningless update on the model, causing a communication issue in parallel learning environment. In this work, we develop a batch-adaptive stochastic gradient descent (BA-SGD) algorithm, which can dynamically choose a proper batch size as learning proceeds. Particularly on the basis of Taylor extension and central limit theorem, it models the decrease of objective value as a Gaussian random walk game with rebound. In this game, a heuristic strategy of determining batch size is adopted to maximize the utility of each incremental sampling. By evaluation on multiple real data sets, we demonstrate that by smartly choosing the batch size, the BA-SGD not only conserves the fast convergence of SGD algorithm but also avoids too frequent model updates.","PeriodicalId":314049,"journal":{"name":"Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130772789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 11
Robust Spectral Clustering for Noisy Data: Modeling Sparse Corruptions Improves Latent Embeddings 噪声数据的鲁棒谱聚类:稀疏损坏建模改进潜在嵌入
Aleksandar Bojchevski, Yves Matkovic, Stephan Günnemann
{"title":"Robust Spectral Clustering for Noisy Data: Modeling Sparse Corruptions Improves Latent Embeddings","authors":"Aleksandar Bojchevski, Yves Matkovic, Stephan Günnemann","doi":"10.1145/3097983.3098156","DOIUrl":"https://doi.org/10.1145/3097983.3098156","url":null,"abstract":"Spectral clustering is one of the most prominent clustering approaches. However, it is highly sensitive to noisy input data. In this work, we propose a robust spectral clustering technique able to handle such scenarios. To achieve this goal, we propose a sparse and latent decomposition of the similarity graph used in spectral clustering. In our model, we jointly learn the spectral embedding as well as the corrupted data - thus, enhancing the clustering performance overall. We propose algorithmic solutions to all three established variants of spectral clustering, each showing linear complexity in the number of edges. Our experimental analysis confirms the significant potential of our approach for robust spectral clustering. Supplementary material is available at www.kdd.in.tum.de/RSC.","PeriodicalId":314049,"journal":{"name":"Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133371318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 60
Evaluating U.S. Electoral Representation with a Joint Statistical Model of Congressional Roll-Calls, Legislative Text, and Voter Registration Data 用国会唱名、立法文本和选民登记数据的联合统计模型评估美国选举代表性
Zhengming Xing, Sunshine Hillygus, L. Carin
{"title":"Evaluating U.S. Electoral Representation with a Joint Statistical Model of Congressional Roll-Calls, Legislative Text, and Voter Registration Data","authors":"Zhengming Xing, Sunshine Hillygus, L. Carin","doi":"10.1145/3097983.3098151","DOIUrl":"https://doi.org/10.1145/3097983.3098151","url":null,"abstract":"Extensive information on 3 million randomly sampled United States citizens is used to construct a statistical model of constituent preferences for each U.S. congressional district. This model is linked to the legislative voting record of the legislator from each district, yielding an integrated model for constituency data, legislative roll-call votes, and the text of the legislation. The model is used to examine the extent to which legislators' voting records are aligned with constituent preferences, and the implications of that alignment (or lack thereof) on subsequent election outcomes. The analysis is based on a Bayesian formalism, with fast inference via a stochastic variational Bayesian analysis.","PeriodicalId":314049,"journal":{"name":"Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115089281","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信