{"title":"改进的高维回归模型与矩阵近似应用于支持向量机的比较案例研究","authors":"M. Roozbeh, S. Babaie-Kafaki, Z. Aminifard","doi":"10.1080/10556788.2021.2022144","DOIUrl":null,"url":null,"abstract":"Nowadays, high-dimensional data appear in many practical applications such as biosciences. In the regression analysis literature, the well-known ordinary least-squares estimation may be misleading when the full ranking of the design matrix is missed. As a popular issue, outliers may corrupt normal distribution of the residuals. Thus, since not being sensitive to the outlying data points, robust estimators are frequently applied in confrontation with the issue. Ill-conditioning in high-dimensional data is another common problem in modern regression analysis under which applying the least-squares estimator is hardly possible. So, it is necessary to deal with estimation methods to tackle these problems. As known, a successful approach for high-dimension cases is the penalized scheme with the aim of obtaining a subset of effective explanatory variables that predict the response as the best, while setting the other parameters to zero. Here, we develop several penalized mixed-integer nonlinear programming models to be used in high-dimension regression analysis. The given matrix approximations have simple structures, decreasing computational cost of the models. Moreover, the models are effectively solvable by metaheuristic algorithms. 
Numerical tests are made to shed light on performance of the proposed methods on simulated and real world high-dimensional data sets.","PeriodicalId":124811,"journal":{"name":"Optimization Methods and Software","volume":"22 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"Improved high-dimensional regression models with matrix approximations applied to the comparative case studies with support vector machines\",\"authors\":\"M. Roozbeh, S. Babaie-Kafaki, Z. Aminifard\",\"doi\":\"10.1080/10556788.2021.2022144\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Nowadays, high-dimensional data appear in many practical applications such as biosciences. In the regression analysis literature, the well-known ordinary least-squares estimation may be misleading when the full ranking of the design matrix is missed. As a popular issue, outliers may corrupt normal distribution of the residuals. Thus, since not being sensitive to the outlying data points, robust estimators are frequently applied in confrontation with the issue. Ill-conditioning in high-dimensional data is another common problem in modern regression analysis under which applying the least-squares estimator is hardly possible. So, it is necessary to deal with estimation methods to tackle these problems. As known, a successful approach for high-dimension cases is the penalized scheme with the aim of obtaining a subset of effective explanatory variables that predict the response as the best, while setting the other parameters to zero. Here, we develop several penalized mixed-integer nonlinear programming models to be used in high-dimension regression analysis. The given matrix approximations have simple structures, decreasing computational cost of the models. Moreover, the models are effectively solvable by metaheuristic algorithms. 
Numerical tests are made to shed light on performance of the proposed methods on simulated and real world high-dimensional data sets.\",\"PeriodicalId\":124811,\"journal\":{\"name\":\"Optimization Methods and Software\",\"volume\":\"22 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-02-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Optimization Methods and Software\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1080/10556788.2021.2022144\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Optimization Methods and Software","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1080/10556788.2021.2022144","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Improved high-dimensional regression models with matrix approximations applied to the comparative case studies with support vector machines
Nowadays, high-dimensional data appear in many practical applications, such as the biosciences. In the regression analysis literature, the well-known ordinary least-squares estimator can be misleading when the design matrix is not of full rank. Outliers are another common issue: they can corrupt the normality of the residuals. Robust estimators, being insensitive to outlying data points, are therefore frequently applied to address this problem. Ill-conditioning in high-dimensional data is a further difficulty in modern regression analysis, under which applying the least-squares estimator is hardly possible. It is therefore necessary to develop estimation methods that tackle these problems. A successful approach in high-dimensional settings is the penalized scheme, which aims to select the subset of explanatory variables that best predicts the response while setting the remaining coefficients to zero. Here, we develop several penalized mixed-integer nonlinear programming models for high-dimensional regression analysis. The given matrix approximations have simple structures, decreasing the computational cost of the models. Moreover, the models can be solved effectively by metaheuristic algorithms. Numerical tests shed light on the performance of the proposed methods on simulated and real-world high-dimensional data sets.
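To make the penalized idea concrete, the following is a minimal sketch, not the authors' mixed-integer nonlinear programming models: an L1-penalized least-squares (lasso-style) fit via cyclic coordinate descent on simulated high-dimensional data with more predictors than observations. The penalty drives most coefficients exactly to zero, recovering a small subset of effective explanatory variables. All problem sizes, the noise level, and the penalty value `lam` are illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator used in lasso coordinate descent."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for min_b 0.5*||y - X b||^2 + lam*||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)     # per-column squared norms ||x_j||^2
    r = y - X @ b                     # current residual
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]                       # remove j-th contribution
            b[j] = soft_threshold(X[:, j] @ r, lam) / col_sq[j]
            r -= X[:, j] * b[j]                       # add back updated one
    return b

rng = np.random.default_rng(0)
n, p = 50, 200                        # p >> n: high-dimensional setting
X = rng.standard_normal((n, p))
true_b = np.zeros(p)
true_b[:5] = [3.0, -2.0, 1.5, 2.5, -1.0]   # only 5 effective variables
y = X @ true_b + 0.1 * rng.standard_normal(n)

b_hat = lasso_cd(X, y, lam=5.0)
print("nonzero coefficients:", np.count_nonzero(b_hat))
```

The convex L1 penalty is used here only because it admits a simple closed-form coordinate update; the paper's best-subset-style formulations instead introduce binary selection variables, yielding MINLP models that the authors solve with metaheuristic algorithms.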