Adapting linear discriminant analysis to the paradigm of learning from label proportions

2016 IEEE Symposium Series on Computational Intelligence (SSCI). Pub Date: 2016-12-01. DOI: 10.1109/SSCI.2016.7850150

M. Pérez-Ortiz, Pedro Antonio Gutiérrez, Mariano Carbonero-Ruz, C. Hervás‐Martínez


Citations: 4

Abstract

The recently coined term “learning from label proportions” refers to a new learning paradigm where training data is given in groups (also called “bags”), and the only known information is the label proportion of each bag. The aim is to construct a classification model that predicts the class label of an individual instance, which differentiates this paradigm from multi-instance learning. This learning setting has diverse applications in political science, marketing, healthcare and, in general, all fields that work with anonymous data. In this paper, two new strategies are proposed to tackle this kind of problem. Both proposals are based on the optimisation of pattern class memberships using the data distribution in each bag and the known label proportions. To do so, linear discriminant analysis has been reformulated to work with non-crisp class memberships. The experimental part of this paper sets three objectives: 1) study the difference in performance between our proposals and the fully supervised setting, 2) analyse the potential benefits of refining class memberships with the proposed approaches, and 3) test the influence of other factors on performance, such as the number of classes or the bag size. The results of these experiments are promising, but further research should be encouraged to study more complex data configurations.
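The abstract gives no equations, so the following is only a minimal sketch of what linear discriminant analysis with non-crisp class memberships could look like, not the authors' formulation. It assumes each instance starts with memberships equal to its bag's label proportions, and it uses membership-weighted class means and a membership-weighted pooled covariance. The function names (`memberships_from_bags`, `soft_lda_fit`, `soft_lda_scores`) and the regularisation term `reg` are hypothetical, and the membership refinement step proposed in the paper is not reproduced here.

```python
import numpy as np

def memberships_from_bags(bag_ids, proportions):
    """Initial soft memberships: each instance inherits its bag's label proportions.

    bag_ids: integer array (n_samples,), proportions: array (n_bags, n_classes).
    """
    return proportions[bag_ids]

def soft_lda_fit(X, U, reg=1e-6):
    """Fit LDA parameters from soft memberships U (n_samples x n_classes, rows sum to 1)."""
    n, d = X.shape
    weights = U.sum(axis=0)                  # effective class sizes
    priors = weights / n
    means = (U.T @ X) / weights[:, None]     # membership-weighted class means
    # Pooled within-class covariance; each point contributes to every class
    # in proportion to its membership in that class.
    Sw = np.zeros((d, d))
    for k in range(U.shape[1]):
        diff = X - means[k]
        Sw += (U[:, k, None] * diff).T @ diff
    Sw = Sw / n + reg * np.eye(d)            # small ridge term (assumption) for stability
    return means, Sw, priors

def soft_lda_scores(X, means, Sw, priors):
    """Linear discriminant scores, one column per class; predict with argmax."""
    A = np.linalg.solve(Sw, means.T)         # shape (n_features, n_classes)
    return X @ A - 0.5 * np.sum(means.T * A, axis=0) + np.log(priors)
```

With one-hot memberships this reduces to ordinary LDA with a pooled covariance estimate, which is why the soft version can be compared directly against the fully supervised setting.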



