The robustification of distance-based linear models: Some proposals
Authors: Eva Boj, Aurea Grané
Journal: Socio-economic Planning Sciences, volume 95, Article 101992 (impact factor 6.2; JCR Q1, Economics)
Published: 2024-06-18
DOI: 10.1016/j.seps.2024.101992
URL: https://www.sciencedirect.com/science/article/pii/S0038012124001915
PDF: https://www.sciencedirect.com/science/article/pii/S0038012124001915/pdfft?md5=61345ff77767982844f8749f45524a9e&pid=1-s2.0-S0038012124001915-main.pdf
Citations: 0
Abstract
In this work, tailored robust metrics are proposed for use in the predictors' space of distance-based predictive models. The first proposal is a robust version of Gower's distance that takes into account the correlation structure of the data. The second is a more complex metric, constructed via Related Metric Scaling, which is able to discard redundant information coming from different sources. A further novelty is a distance-based trimming statistic used to robustify the metrics. The performance of models based on the new robust metrics is evaluated through a simulation study and compared with that of models based on the Euclidean, Gower's and generalized Gower's metrics, in the presence of outliers, on several datasets of multivariate heterogeneous data. Mean squared error (and also the median and standard deviation) is used to evaluate how effectively responses are predicted. Finally, two applications, in sustainable transport and in finance and banking, illustrate the predictive power of these models. Computations are made using the dbstats package for R.
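For orientation, the classical (non-robust) Gower distance that the first proposal starts from can be sketched as follows. This is a minimal Python illustration only: it is not the authors' robust metric and not the dbstats API. The function names, the toy records, and the use of d = sqrt(1 - s) to turn Gower's similarity s into a distance are assumptions made for the example.

```python
from math import sqrt


def column_ranges(records, is_numeric):
    """Range (max - min) of each numeric column; None for categorical columns."""
    ranges = []
    for j, numeric in enumerate(is_numeric):
        if numeric:
            column = [row[j] for row in records]
            ranges.append(max(column) - min(column))
        else:
            ranges.append(None)
    return ranges


def gower_distance(a, b, ranges, is_numeric):
    """Classical Gower distance between two mixed-type records a and b."""
    similarity = 0.0
    for x, y, r, numeric in zip(a, b, ranges, is_numeric):
        if numeric:
            # Quantitative variable: absolute difference scaled by the column range.
            similarity += (1.0 - abs(x - y) / r) if r > 0 else 1.0
        else:
            # Categorical variable: simple matching.
            similarity += 1.0 if x == y else 0.0
    similarity /= len(a)
    # Assumed transform from similarity to distance: d = sqrt(1 - s).
    return sqrt(max(0.0, 1.0 - similarity))


# Hypothetical toy data: (age, trip length, transport mode).
records = [(30, 1.5, "bus"), (40, 2.5, "car"), (50, 1.5, "bus")]
is_numeric = [True, True, False]
ranges = column_ranges(records, is_numeric)

d01 = gower_distance(records[0], records[1], ranges, is_numeric)
d02 = gower_distance(records[0], records[2], ranges, is_numeric)
```

Because each numeric difference is divided by the observed range of its column, a single extreme observation inflates that range and compresses all other differences, which is one reason a robust variant of the metric is of interest when outliers are present.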
About the journal:
Studies directed toward the more effective utilization of existing resources, e.g. mathematical programming models of health care delivery systems with relevance to more effective program design; systems analysis of fire outbreaks and its relevance to the location of fire stations; statistical analysis of the efficiency of a developing country economy or industry.
Studies relating to the interaction of various segments of society and technology, e.g. the effects of government health policies on the utilization and design of hospital facilities; the relationship between housing density and the demands on public transportation or other service facilities; patterns and implications of urban development and air or water pollution.
Studies devoted to the anticipation of and response to future needs for social, health and other human services, e.g. the relationship between industrial growth and the development of educational resources in affected areas; investigation of future demands for maternal and child health resources in a developing country; design of effective recycling in an urban setting.