Robust Multi-Task Feature Learning.

KDD : proceedings. International Conference on Knowledge Discovery & Data Mining Pub Date : 2012-08-12 DOI:10.1145/2339530.2339672

Pinghua Gong, Jieping Ye, Changshui Zhang

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3783219/pdf/nihms497474.pdf


Abstract

Multi-task learning (MTL) aims to improve the performance of multiple related tasks by exploiting the intrinsic relationships among them. Multi-task feature learning algorithms have recently received increasing attention and have been applied successfully to many problems involving high-dimensional data. However, they assume that all tasks share a common set of features, an assumption that is too restrictive and may not hold in real-world applications, where outlier tasks often exist. In this paper, we propose a Robust Multi-Task Feature Learning algorithm (rMTFL) that simultaneously captures a common set of features among relevant tasks and identifies outlier tasks. Specifically, we decompose the weight (model) matrix for all tasks into two components. We impose the well-known group Lasso penalty on the row groups of the first component to capture the features shared among relevant tasks. To simultaneously identify the outlier tasks, we impose the same group Lasso penalty on the column groups of the second component. We employ accelerated gradient descent to solve the rMTFL optimization problem efficiently and show that the proposed algorithm scales to large problems. In addition, we provide a detailed theoretical analysis of the rMTFL formulation. Specifically, we present a theoretical bound that measures how well rMTFL approximates the true evaluation, and bounds on the error between the weights estimated by rMTFL and the underlying true weights. Moreover, assuming that the underlying true weights are above the noise level, we present a theoretical result showing how to recover the underlying true shared features and outlier tasks (the sparsity patterns). Empirical studies on both synthetic and real-world data demonstrate that rMTFL can simultaneously capture shared features among tasks and identify outlier tasks.
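The decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a squared loss per task, a fixed step size derived from a Lipschitz bound, and a FISTA-style momentum schedule as one common form of accelerated gradient descent. The names `prox_rows`, `prox_cols`, `rmtfl_sketch`, `lam1`, and `lam2` are hypothetical. The weight matrix is written as W = P + Q, with a row-wise group Lasso penalty on P and a column-wise group Lasso penalty on Q.

```python
import numpy as np


def prox_rows(M, t):
    """Row-wise group soft-thresholding: prox of t * sum_j ||M[j, :]||_2."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))
    return M * scale


def prox_cols(M, t):
    """Column-wise group soft-thresholding, done by transposing."""
    return prox_rows(M.T, t).T


def rmtfl_sketch(Xs, ys, lam1, lam2, n_iter=500):
    """Accelerated proximal gradient on W = P + Q (squared loss is an assumption).

    Xs: list of (n_i, d) arrays; ys: list of (n_i,) arrays; one pair per task.
    Returns P (shared-feature component) and Q (outlier-task component),
    both of shape (d, number_of_tasks).
    """
    d, m = Xs[0].shape[1], len(Xs)
    # The smooth loss depends on P and Q only through P + Q, so its gradient
    # with respect to the stacked (P, Q) has Lipschitz constant
    # 2 * max_i lambda_max(X_i^T X_i / n_i).
    L = 2.0 * max(np.linalg.norm(X, 2) ** 2 / X.shape[0] for X in Xs)
    P, Q = np.zeros((d, m)), np.zeros((d, m))
    P_bar, Q_bar = P.copy(), Q.copy()  # extrapolated (momentum) points
    t = 1.0
    for _ in range(n_iter):
        W = P_bar + Q_bar
        # Column i holds the gradient of task i's loss (1/(2 n_i))||X_i w_i - y_i||^2.
        G = np.column_stack([
            X.T @ (X @ W[:, i] - y) / X.shape[0]
            for i, (X, y) in enumerate(zip(Xs, ys))
        ])
        # The gradient is the same for P and Q; only the proximal steps differ.
        P_new = prox_rows(P_bar - G / L, lam1 / L)
        Q_new = prox_cols(Q_bar - G / L, lam2 / L)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        beta = (t - 1.0) / t_new
        P_bar = P_new + beta * (P_new - P)
        Q_bar = Q_new + beta * (Q_new - Q)
        P, Q, t = P_new, Q_new, t_new
    return P, Q
```

Under these assumptions, the nonzero rows of `P` indicate features shared across tasks, and the nonzero columns of `Q` flag candidate outlier tasks. Suitable values for `lam1` and `lam2` depend on the data and would normally be chosen by validation.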



