Linear Probability Models (LPM) and Big Data: The Good, the Bad, and the Ugly

ERN: Other Econometrics: Data Collection & Data Estimation Methodology (Topic) | Pub Date: 2016-10-11 | DOI: 10.2139/ssrn.2353841

S. Chatla, Galit Shmueli


Citations: 18

Abstract



Linear regression is among the most popular statistical models in social science research. Linear probability models (LPMs), which are linear regression models applied to a binary outcome, are used in various disciplines. Surprisingly, LPMs are rare in the information systems (IS) literature, where logit and probit models are typically used for binary outcomes. LPMs have been examined with respect to specific aspects, but a thorough evaluation of their practical pros and cons for different research goals under different scenarios is missing. We perform an extensive simulation study to evaluate the advantages and dangers of LPMs, especially in the realm of Big Data that now affects IS research. We evaluate LPM for the three common uses of binary outcome models: inference and estimation, prediction and classification, and selection bias. We compare its performance to logit and probit under different sample sizes, error distributions, and more. We find that LPM coefficient directions, statistical significance, and marginal effects are similar to those from logit and probit. Although LPM coefficients are biased, they are consistent for the true parameters up to a multiplicative scalar. Coefficient bias can be corrected by assuming an error distribution. For classification and selection bias, LPM is on par with logit and probit in terms of class separation and ranking, and is a viable alternative in selection models. It falls short when the predicted probabilities themselves are of interest, because they can fall outside the unit interval. We illustrate some of these results by modeling price in online auctions, using data from eBay.
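The abstract's last caveat, that LPM predictions can leave the unit interval, can be seen in a small simulation. The sketch below is only an illustration and not the authors' simulation design: the single predictor, its uniform range of [-4, 4], the true logit coefficients, the sample size, and the helper names `simulate` and `fit_lpm` are all assumptions made for this example.

```python
import math
import random


def simulate(n, beta0=0.0, beta1=1.5, seed=0):
    """Draw (x, y) pairs where y is binary and P(y=1 | x) follows a logit model.

    The coefficients and the uniform range of x are illustrative assumptions.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.uniform(-4.0, 4.0)
        p = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))
        xs.append(x)
        ys.append(1 if rng.random() < p else 0)
    return xs, ys


def fit_lpm(xs, ys):
    """Fit an LPM: ordinary least squares of the 0/1 outcome on one predictor.

    Returns (intercept, slope) from the closed-form simple-regression formulas.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


xs, ys = simulate(2000)
intercept, slope = fit_lpm(xs, ys)

# Fitted "probabilities" are just points on a straight line, so nothing
# constrains them to stay between 0 and 1.
fitted = [intercept + slope * x for x in xs]
outside = sum(1 for f in fitted if f < 0.0 or f > 1.0)

print(f"LPM slope: {slope:.3f}")
print(f"fitted values outside [0, 1]: {outside} of {len(fitted)}")
```

With a single predictor the LPM fitted values are a monotone function of x, so they rank observations exactly as a logit fit would. In this setup the slope also has the same sign as the true logit coefficient, while the fitted values at the extremes of x fall below 0 or above 1.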


Source journal

ERN: Other Econometrics: Data Collection & Data Estimation Methodology (Topic)

Self-citation rate

0.00%

Number of publications