{"title":"高维数据的稳健线性回归:综述","authors":"P. Filzmoser, K. Nordhausen","doi":"10.1002/wics.1524","DOIUrl":null,"url":null,"abstract":"Digitization as the process of converting information into numbers leads to bigger and more complex data sets, bigger also with respect to the number of measured variables. This makes it harder or impossible for the practitioner to identify outliers or observations that are inconsistent with an underlying model. Classical least‐squares based procedures can be affected by those outliers. In the regression context, this means that the parameter estimates are biased, with consequences on the validity of the statistical inference, on regression diagnostics, and on the prediction accuracy. Robust regression methods aim at assigning appropriate weights to observations that deviate from the model. While robust regression techniques are widely known in the low‐dimensional case, researchers and practitioners might still not be very familiar with developments in this direction for high‐dimensional data. Recently, different strategies have been proposed for robust regression in the high‐dimensional case, typically based on dimension reduction, on shrinkage, including sparsity, and on combinations of such techniques. A very recent concept is downweighting single cells of the data matrix rather than complete observations, with the goal to make better use of the model‐consistent information, and thus to achieve higher efficiency of the parameter estimates.","PeriodicalId":47779,"journal":{"name":"Wiley Interdisciplinary Reviews-Computational Statistics","volume":null,"pages":null},"PeriodicalIF":4.4000,"publicationDate":"2020-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1002/wics.1524","citationCount":"24","resultStr":"{\"title\":\"Robust linear regression for high‐dimensional data: An overview\",\"authors\":\"P. Filzmoser, K. Nordhausen\",\"doi\":\"10.1002/wics.1524\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Digitization as the process of converting information into numbers leads to bigger and more complex data sets, bigger also with respect to the number of measured variables. This makes it harder or impossible for the practitioner to identify outliers or observations that are inconsistent with an underlying model. Classical least‐squares based procedures can be affected by those outliers. In the regression context, this means that the parameter estimates are biased, with consequences on the validity of the statistical inference, on regression diagnostics, and on the prediction accuracy. Robust regression methods aim at assigning appropriate weights to observations that deviate from the model. While robust regression techniques are widely known in the low‐dimensional case, researchers and practitioners might still not be very familiar with developments in this direction for high‐dimensional data. Recently, different strategies have been proposed for robust regression in the high‐dimensional case, typically based on dimension reduction, on shrinkage, including sparsity, and on combinations of such techniques. A very recent concept is downweighting single cells of the data matrix rather than complete observations, with the goal to make better use of the model‐consistent information, and thus to achieve higher efficiency of the parameter estimates.\",\"PeriodicalId\":47779,\"journal\":{\"name\":\"Wiley Interdisciplinary Reviews-Computational Statistics\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":4.4000,\"publicationDate\":\"2020-07-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://sci-hub-pdf.com/10.1002/wics.1524\",\"citationCount\":\"24\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Wiley Interdisciplinary Reviews-Computational Statistics\",\"FirstCategoryId\":\"100\",\"ListUrlMain\":\"https://doi.org/10.1002/wics.1524\",\"RegionNum\":2,\"RegionCategory\":\"数学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"STATISTICS & PROBABILITY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Wiley Interdisciplinary Reviews-Computational Statistics","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1002/wics.1524","RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"STATISTICS & PROBABILITY","Score":null,"Total":0}
Robust linear regression for high‐dimensional data: An overview
Digitization as the process of converting information into numbers leads to bigger and more complex data sets, bigger also with respect to the number of measured variables. This makes it harder or impossible for the practitioner to identify outliers or observations that are inconsistent with an underlying model. Classical least‐squares based procedures can be affected by those outliers. In the regression context, this means that the parameter estimates are biased, with consequences on the validity of the statistical inference, on regression diagnostics, and on the prediction accuracy. Robust regression methods aim at assigning appropriate weights to observations that deviate from the model. While robust regression techniques are widely known in the low‐dimensional case, researchers and practitioners might still not be very familiar with developments in this direction for high‐dimensional data. Recently, different strategies have been proposed for robust regression in the high‐dimensional case, typically based on dimension reduction, on shrinkage, including sparsity, and on combinations of such techniques. A very recent concept is downweighting single cells of the data matrix rather than complete observations, with the goal to make better use of the model‐consistent information, and thus to achieve higher efficiency of the parameter estimates.