Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining最新文献_第5页

Direct optimization of ranking measures for learning to rank models 直接优化学习排序模型的排序方法

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487630

Ming Tan, Tian Xia, L. Guo, Shaojun Wang

引用次数: 29

SVMpAUCtight: a new support vector method for optimizing partial AUC based on a tight convex upper bound svmppauctight:一种基于紧凸上界优化局部AUC的支持向量方法

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487674

H. Narasimhan, S. Agarwal

{"title":"SVMpAUCtight: a new support vector method for optimizing partial AUC based on a tight convex upper bound","authors":"H. Narasimhan, S. Agarwal","doi":"10.1145/2487575.2487674","DOIUrl":"https://doi.org/10.1145/2487575.2487674","url":null,"abstract":"The area under the ROC curve (AUC) is a well known performance measure in machine learning and data mining. In an increasing number of applications, however, ranging from ranking applications to a variety of important bioinformatics applications, performance is measured in terms of the partial area under the ROC curve between two specified false positive rates. In recent work, we proposed a structural SVM based approach for optimizing this performance measure (Narasimhan and Agarwal, 2013). In this paper, we develop a new support vector method, SVMpAUCtight, that optimizes a tighter convex upper bound on the partial AUC loss, which leads to both improved accuracy and reduced computational complexity. In particular, by rewriting the empirical partial AUC risk as a maximum over subsets of negative instances, we derive a new formulation, where a modified form of the earlier optimization objective is evaluated on each of these subsets, leading to a tighter hinge relaxation on the partial AUC loss. As with our previous method, the resulting optimization problem can be solved using a cutting-plane algorithm, but the new method has better run time guarantees. We also discuss a projected subgradient method for solving this problem, which offers additional computational savings in certain settings. We demonstrate on a wide variety of bioinformatics tasks, ranging from protein-protein interaction prediction to drug discovery tasks, that the proposed method does, in many cases, perform significantly better on the partial AUC measure than the previous structural SVM approach. In addition, we also develop extensions of our method to learn sparse and group sparse models, often of interest in biological applications.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"15 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89343994","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 45

When TEDDY meets GrizzLY: temporal dependency discovery for triggering road deicing operations 当TEDDY遇到GrizzLY:触发道路除冰操作的时间依赖性发现

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487706

C. Robardet, Vasile-Marian Scuturici, M. Plantevit, A. Fraboulet

引用次数: 1

An efficient ADMM algorithm for multidimensional anisotropic total variation regularization problems 多维各向异性全变分正则化问题的一种高效ADMM算法

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487586

Sen Yang, Jie Wang, Wei Fan, Xiatian Zhang, Peter Wonka, Jieping Ye

{"title":"An efficient ADMM algorithm for multidimensional anisotropic total variation regularization problems","authors":"Sen Yang, Jie Wang, Wei Fan, Xiatian Zhang, Peter Wonka, Jieping Ye","doi":"10.1145/2487575.2487586","DOIUrl":"https://doi.org/10.1145/2487575.2487586","url":null,"abstract":"Total variation (TV) regularization has important applications in signal processing including image denoising, image deblurring, and image reconstruction. A significant challenge in the practical use of TV regularization lies in the nondifferentiable convex optimization, which is difficult to solve especially for large-scale problems. In this paper, we propose an efficient alternating augmented Lagrangian method (ADMM) to solve total variation regularization problems. The proposed algorithm is applicable for tensors, thus it can solve multidimensional total variation regularization problems. One appealing feature of the proposed algorithm is that it does not need to solve a linear system of equations, which is often the most expensive part in previous ADMM-based methods. In addition, each step of the proposed algorithm involves a set of independent and smaller problems, which can be solved in parallel. Thus, the proposed algorithm scales to large size problems. Furthermore, the global convergence of the proposed algorithm is guaranteed, and the time complexity of the proposed algorithm is O(dN/ε) on a d-mode tensor with N entries for achieving an ε-optimal solution. Extensive experimental results demonstrate the superior performance of the proposed algorithm in comparison with current state-of-the-art methods.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"13 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82468564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 57

Collaborative boosting for activity classification in microblogs 微博活动分类的协同提升

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487661

Yangqiu Song, Zhengdong Lu, C. Leung, Qiang Yang

{"title":"Collaborative boosting for activity classification in microblogs","authors":"Yangqiu Song, Zhengdong Lu, C. Leung, Qiang Yang","doi":"10.1145/2487575.2487661","DOIUrl":"https://doi.org/10.1145/2487575.2487661","url":null,"abstract":"Users' daily activities, such as dining and shopping, inherently reflect their habits, intents and preferences, thus provide invaluable information for services such as personalized information recommendation and targeted advertising. Users' activity information, although ubiquitous on social media, has largely been unexploited. This paper addresses the task of user activity classification in microblogs, where users can publish short messages and maintain social networks online. We identify the importance of modeling a user's individuality, and that of exploiting opinions of the user's friends for accurate activity classification. In this light, we propose a novel collaborative boosting framework comprising a text-to-activity classifier for each user, and a mechanism for collaboration between classifiers of users having social connections. The collaboration between two classifiers includes exchanging their own training instances and their dynamically changing labeling decisions. We propose an iterative learning procedure that is formulated as gradient descent in learning function space, while opinion exchange between classifiers is implemented with a weighted voting in each learning iteration. We show through experiments that on real-world data from Sina Weibo, our method outperforms existing off-the-shelf algorithms that do not take users' individuality or social connections into account.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"24 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81621931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 22

Speeding up large-scale learning with a social prior 加速具有社会先验的大规模学习

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487587

Deepayan Chakrabarti, R. Herbrich

引用次数: 2

Massively parallel expectation maximization using graphics processing units 使用图形处理单元实现大规模并行期望最大化

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487628

M. C. Altinigneli, C. Plant, C. Böhm

{"title":"Massively parallel expectation maximization using graphics processing units","authors":"M. C. Altinigneli, C. Plant, C. Böhm","doi":"10.1145/2487575.2487628","DOIUrl":"https://doi.org/10.1145/2487575.2487628","url":null,"abstract":"Composed of several hundreds of processors, the Graphics Processing Unit (GPU) has become a very interesting platform for computationally demanding tasks on massive data. A special hierarchy of processors and fast memory units allow very powerful and efficient parallelization but also demands novel parallel algorithms. Expectation Maximization (EM) is a widely used technique for maximum likelihood estimation. In this paper, we propose an innovative EM clustering algorithm particularly suited for the GPU platform on NVIDIA's Fermi architecture. The central idea of our algorithm is to allow the parallel threads exchanging their local information in an asynchronous way and thus updating their cluster representatives on demand by a technique called Asynchronous Model Updates (Async-EM). Async-EM enables our algorithm not only to accelerate convergence but also to reduce the overhead induced by memory bandwidth limitations and synchronization requirements. We demonstrate (1) how to reformulate the EM algorithm to be able to exchange information using Async-EM and (2) how to exploit the special memory and processor architecture of a modern GPU in order to share this information among threads in an optimal way. As a perspective Async-EM is not limited to EM but can be applied to a variety of algorithms.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"201 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90299323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 20

Unsupervised link prediction using aggregative statistics on heterogeneous social networks 基于聚合统计的异构社会网络无监督链接预测

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487614

Tsung-Ting Kuo, Rui Yan, Yu-Yang Huang, Perng-Hwa Kung, Shou-de Lin

{"title":"Unsupervised link prediction using aggregative statistics on heterogeneous social networks","authors":"Tsung-Ting Kuo, Rui Yan, Yu-Yang Huang, Perng-Hwa Kung, Shou-de Lin","doi":"10.1145/2487575.2487614","DOIUrl":"https://doi.org/10.1145/2487575.2487614","url":null,"abstract":"The concern of privacy has become an important issue for online social networks. In services such as Foursquare.com, whether a person likes an article is considered private and therefore not disclosed; only the aggregative statistics of articles (i.e., how many people like this article) is revealed. This paper tries to answer a question: can we predict the opinion holder in a heterogeneous social network without any labeled data? This question can be generalized to a link prediction with aggregative statistics problem. This paper devises a novel unsupervised framework to solve this problem, including two main components: (1) a three-layer factor graph model and three types of potential functions; (2) a ranked-margin learning and inference algorithm. Finally, we evaluate our method on four diverse prediction scenarios using four datasets: preference (Foursquare), repost (Twitter), response (Plurk), and citation (DBLP). We further exploit nine unsupervised models to solve this problem as baselines. Our approach not only wins out in all scenarios, but on the average achieves 9.90% AUC and 12.59% NDCG improvement over the best competitors. The resources are available at http://www.csie.ntu.edu.tw/~d97944007/aggregative/","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90371022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 52

Improving quality control by early prediction of manufacturing outcomes 通过对生产结果的早期预测来改善质量控制

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2488192

S. Weiss, Amit Dhurandhar, R. Baseman

{"title":"Improving quality control by early prediction of manufacturing outcomes","authors":"S. Weiss, Amit Dhurandhar, R. Baseman","doi":"10.1145/2487575.2488192","DOIUrl":"https://doi.org/10.1145/2487575.2488192","url":null,"abstract":"We describe methods for continual prediction of manufactured product quality prior to final testing. In our most expansive modeling approach, an estimated final characteristic of a product is updated after each manufacturing operation. Our initial application is for the manufacture of microprocessors, and we predict final microprocessor speed. Using these predictions, early corrective manufacturing actions may be taken to increase the speed of expected slow wafers (a collection of microprocessors) or reduce the speed of fast wafers. Such predictions may also be used to initiate corrective supply chain management actions. Developing statistical learning models for this task has many complicating factors: (a) a temporally unstable population (b) missing data that is a result of sparsely sampled measurements and (c) relatively few available measurements prior to corrective action opportunities. In a real manufacturing pilot application, our automated models selected 125 fast wafers in real-time. As predicted, those wafers were significantly faster than average. During manufacture, downstream corrective processing restored 25 nominally unacceptable wafers to normal operation.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"25 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78043618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 17

Model selection in markovian processes 马尔可夫过程中的模型选择

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI: 10.1145/2487575.2487613

Assaf Hallak, Dotan Di Castro, Shie Mannor

{"title":"Model selection in markovian processes","authors":"Assaf Hallak, Dotan Di Castro, Shie Mannor","doi":"10.1145/2487575.2487613","DOIUrl":"https://doi.org/10.1145/2487575.2487613","url":null,"abstract":"When analyzing data that originated from a dynamical system, a common practice is to encompass the problem in the well known frameworks of Markov Decision Processes (MDPs) and Reinforcement Learning (RL). The state space in these solutions is usually chosen in some heuristic fashion and the formed MDP can then be used to simulate and predict data, as well as indicate the best possible action in each state. The model chosen to characterize the data affects the complexity and accuracy of any further action we may wish to apply, yet few methods that rely on the dynamic structure to select such a model were suggested. In this work we address the problem of how to use time series data to choose from a finite set of candidate discrete state spaces, where these spaces are constructed by a domain expert. We formalize the notion of model selection consistency in the proposed setup. We then discuss the difference between our proposed framework and the classical Maximum Likelihood (ML) framework, and give an example where ML fails. Afterwards, we suggest alternative selection criteria and show them to be weakly consistent. We then define weak consistency for a model construction algorithm and show a simple algorithm that is weakly consistent. Finally, we test the performance of the suggested criteria and algorithm on both simulated and real world data.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"33 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81982832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 25