A Probabilistic Approach for Learning with Label Proportions Applied to the US Presidential Election

2017 IEEE International Conference on Data Mining (ICDM). Pub Date: 2017-11-01. DOI: 10.1109/ICDM.2017.54

Tao Sun, D. Sheldon, Brendan O'Connor


Citations: 23

Abstract


A Probabilistic Approach for Learning with Label Proportions Applied to the US Presidential Election

Ecological inference (EI) is a classical problem in political science: modeling the voting behavior of individuals given only aggregate election results. Flaxman et al. recently formulated EI as a machine learning problem using distribution regression, and applied it to analyze US presidential elections. However, distribution regression unnecessarily aggregates individual-level covariates available from census microdata, and ignores the known structure of the aggregation mechanism. We instead formulate the problem as learning with label proportions (LLP), and develop a new, probabilistic LLP method to solve it. Our model is the straightforward one in which individual votes are latent variables. We use cardinality potentials to efficiently perform exact inference over latent variables during learning, and introduce a novel message-passing algorithm that extends cardinality potentials to multivariate probability models for use within multiclass LLP problems. We show experimentally that LLP outperforms distribution regression for predicting individual-level attributes, and that our method is as good as or better than existing state-of-the-art LLP methods.
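To make the idea of exact inference under a count constraint concrete, here is a minimal sketch for the binary case. It assumes each individual in a region votes independently with a known probability (for instance, the output of a classifier), and that only the regional total is observed. The function names `count_distribution_prefixes` and `posterior_marginals` are illustrative, not taken from the paper, and this simple dynamic program runs in quadratic time; it is not the authors' algorithm, and it does not cover the multiclass extension the abstract describes.

```python
def count_distribution_prefixes(probs):
    """Return a list `prefix` where prefix[i][s] = P(y_1 + ... + y_i = s)
    for independent Bernoulli variables y_j with P(y_j = 1) = probs[j]."""
    prefix = [[1.0]]  # with zero individuals, the count is 0 with certainty
    for p in probs:
        prev = prefix[-1]
        cur = [0.0] * (len(prev) + 1)
        for s, mass in enumerate(prev):
            cur[s] += mass * (1.0 - p)  # this individual contributes 0
            cur[s + 1] += mass * p      # this individual contributes 1
        prefix.append(cur)
    return prefix


def posterior_marginals(probs, k):
    """Return [P(y_i = 1 | sum_j y_j = k)] for every individual i.

    Assumption (hypothetical setup): `probs` are per-individual prior
    probabilities and `k` is the observed aggregate count for the bag.
    """
    n = len(probs)
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and len(probs)")
    fwd = count_distribution_prefixes(probs)        # counts over the first i individuals
    bwd = count_distribution_prefixes(probs[::-1])  # bwd[m]: counts over the last m individuals
    evidence = fwd[n][k]                            # P(sum_j y_j = k)
    if evidence == 0.0:
        raise ValueError("observed count has zero probability under probs")
    marginals = []
    for i, p in enumerate(probs):
        left, right = fwd[i], bwd[n - 1 - i]
        # If y_i = 1, the other individuals must account for exactly k - 1 votes.
        joint = 0.0
        for s, mass in enumerate(left):
            r = k - 1 - s
            if 0 <= r < len(right):
                joint += mass * right[r]
        marginals.append(p * joint / evidence)
    return marginals
```

Calling `posterior_marginals([0.9, 0.1], 1)` conditions two voters on exactly one vote being cast; the returned marginals always sum to the observed count `k`. In an EM-style learner (an assumption here, since the abstract does not name the learning procedure), marginals of this kind would serve as soft labels for updating the individual-level model.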

