The Kitchen Sink
Gary Smith
In: The AI Delusion, published 2018-08-23
DOI: 10.1093/oso/9780198824305.003.0009 (https://doi.org/10.1093/oso/9780198824305.003.0009)
Citations: 0
Abstract
Back in the 1980s, I talked to an economics professor who made forecasts for a large bank based on simple correlations like the one in Figure 1. If he wanted to forecast consumer spending, he made a scatter plot of income and spending and used a transparent ruler to draw a line that seemed to fit the data. If the scatter looked like Figure 1, then when income went up, he predicted that spending would go up.

The problem with his simple scatter plots is that the world is not simple. Income affects spending, but so does wealth. What if this professor happened to draw his scatter plot using data from a historical period in which income rose (increasing spending) but the stock market crashed (reducing spending) and the wealth effect was more powerful than the income effect, so that spending declined, as in Figure 2? The professor’s scatter plot of spending and income will indicate that an increase in income reduces spending. Then, when he tries to forecast spending for a period when income and wealth both increase, his prediction of a decline in spending will be disastrously wrong.

Multiple regression to the rescue. Multiple regression models have multiple explanatory variables. For example, a model of consumer spending might be:

C = a + bY + cW

where C is consumer spending, Y is household income, and W is wealth. The order in which the explanatory variables are listed does not matter. What does matter is which variables are included in the model and which are left out. A large part of the art of regression analysis is choosing explanatory variables that are important and ignoring those that are unimportant. The coefficient b measures the effect on spending of an increase in income, holding wealth constant, and c measures the effect on spending of an increase in wealth, holding income constant.

The math for estimating these coefficients is complicated but the principle is simple: choose the estimates that give the best predictions of consumer spending for the data used to estimate the model. In Chapter 4, we saw that spurious correlations can appear when we compare variables like spending, income, and wealth that all tend to increase over time.
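
The professor's problem can be illustrated with a small sketch. The numbers below are invented for illustration and do not come from the chapter: spending is generated exactly from C = 5 + 0.6Y + 0.3W over a period in which income rises while wealth falls sharply. The helper names (`solve_linear_system`, `least_squares`) are hypothetical, and ordinary least squares is assumed as the meaning of "best predictions". A regression of spending on income alone should report a negative slope, while the regression that also includes wealth should recover both positive coefficients.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):  # back-substitution
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def least_squares(columns, target):
    """Fit target = intercept + sum(coef_j * columns[j]).

    Solves the normal equations (X'X) beta = X'y and returns
    [intercept, coef_1, coef_2, ...].
    """
    X = [[1.0] + [col[i] for col in columns] for i in range(len(target))]
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)]
           for i in range(p)]
    Xty = [sum(row[i] * t for row, t in zip(X, target)) for i in range(p)]
    return solve_linear_system(XtX, Xty)


# Made-up data: income (Y) rises steadily while wealth (W) crashes.
income = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
wealth = [50.0, 46.0, 43.0, 38.0, 35.0, 30.0]

# Assumed "true" relationship, chosen only for this illustration.
spending = [5.0 + 0.6 * y + 0.3 * w for y, w in zip(income, wealth)]

# Spending on income alone: wealth is left out of the model.
simple_fit = least_squares([income], spending)

# Spending on income and wealth together.
multiple_fit = least_squares([income, wealth], spending)

print("income only:       slope on income =", round(simple_fit[1], 3))
print("income and wealth: b =", round(multiple_fit[1], 3),
      " c =", round(multiple_fit[2], 3))
```

Because the made-up spending series has no noise, the multiple regression reproduces the assumed coefficients almost exactly; with real data the estimates would only approximate the underlying effects.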