Adapting linear discriminant analysis to the paradigm of learning from label proportions

M. Pérez-Ortiz, Pedro Antonio Gutiérrez, Mariano Carbonero-Ruz, C. Hervás‐Martínez
Published in: 2016 IEEE Symposium Series on Computational Intelligence (SSCI), 2016-12-01
DOI: 10.1109/SSCI.2016.7850150 (https://doi.org/10.1109/SSCI.2016.7850150)
Citations: 4

Abstract

The recently coined term “learning from label proportions” refers to a learning paradigm in which training data are given in groups (also called “bags”) and the only known information is the label proportion of each bag. The aim is to construct a classification model that predicts the class label of an individual instance, which distinguishes this paradigm from multi-instance learning. This setting has varied applications in political science, marketing, healthcare and, in general, any field that works with anonymous data. In this paper, two new strategies are proposed to tackle this kind of problem. Both are based on optimising the class memberships of the patterns using the data distribution in each bag and the known label proportions. To this end, linear discriminant analysis is reformulated to work with non-crisp class memberships. The experiments have three objectives: 1) to study the difference in performance between our proposals and the fully supervised setting, 2) to analyse the potential benefits of refining class memberships with the proposed approaches, and 3) to test the influence of other factors on performance, such as the number of classes or the bag size. The results are promising, but further research is needed on more complex data configurations.
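
To make the idea of linear discriminant analysis with non-crisp class memberships more concrete, the following is a minimal sketch in Python with NumPy. It is not the authors' formulation: the abstract does not give their equations or their two membership-optimisation strategies, so none of that is reproduced here. The sketch only assumes that each instance starts with soft memberships equal to its bag's label proportions and that class means, priors and the pooled within-class covariance are weighted by those memberships. All function names, the `reg` parameter and the toy data are hypothetical.

```python
import numpy as np

def memberships_from_proportions(bag_ids, bag_proportions):
    """Assumed starting point: every instance inherits its bag's label proportions."""
    # bag_ids: (n,) integer bag index per instance
    # bag_proportions: (n_bags, n_classes), each row sums to 1
    return np.asarray(bag_proportions, dtype=float)[np.asarray(bag_ids)]

def soft_lda_fit(X, U, reg=1e-6):
    """Membership-weighted LDA; U[i, k] is the soft membership of instance i in class k."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    weights = U.sum(axis=0)                   # effective class sizes, shape (K,)
    means = (U.T @ X) / weights[:, None]      # weighted class means, shape (K, d)
    Sw = np.zeros((d, d))                     # pooled within-class covariance
    for k in range(U.shape[1]):
        diff = X - means[k]
        Sw += (U[:, k, None] * diff).T @ diff
    Sw = Sw / n + reg * np.eye(d)             # small ridge term keeps Sw invertible
    priors = weights / n
    return means, Sw, priors

def soft_lda_predict(X, means, Sw, priors):
    """Pick the class with the largest linear discriminant score."""
    X = np.asarray(X, dtype=float)
    Sw_inv_means = np.linalg.solve(Sw, means.T)            # shape (d, K)
    offsets = -0.5 * np.sum(means.T * Sw_inv_means, axis=0) + np.log(priors)
    return (X @ Sw_inv_means + offsets).argmax(axis=1)

# Toy data (hypothetical): two Gaussian classes split into two bags of 100 instances.
rng = np.random.default_rng(0)
y = np.concatenate([np.zeros(80), np.ones(20), np.zeros(20), np.ones(80)]).astype(int)
centres = np.array([[-3.0, 0.0], [3.0, 0.0]])
X = centres[y] + rng.normal(size=(200, 2))
bag_ids = np.repeat([0, 1], 100)
bag_proportions = np.array([[0.8, 0.2], [0.2, 0.8]])

# Only bag_ids and bag_proportions are used for training; y is kept for evaluation.
U = memberships_from_proportions(bag_ids, bag_proportions)
means, Sw, priors = soft_lda_fit(X, U)
pred = soft_lda_predict(X, means, Sw, priors)
accuracy = float((pred == y).mean())
```

In this sketch the individual labels `y` never enter the fit. A refinement step in the spirit of the paper would update `U` between fits while keeping each bag's average membership equal to its known proportions, but how the authors do this is not stated in the abstract. The sketch also assumes that every class has non-zero total membership and that bags differ in their proportions; if all bags shared the same proportions, the weighted class means would coincide and the classes could not be separated.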