Model Selection Through Model Sorting
Mohammad Ali Hajiani, Babak Seyfe
arXiv:2409.09674, arXiv - STAT - Machine Learning, 2024-09-15
Citations: 0
Abstract
We propose a novel approach to selecting the best model for the data. Based on
properties unique to nested models, we find the most parsimonious model
containing the risk minimizer predictor. We prove the existence of probably
approximately correct (PAC) bounds on the difference between the minimum
empirical risks of two successive nested models, called the successive empirical
excess risk (SEER). Based on these bounds, we propose a model order selection
method called nested empirical risk (NER). The sorted NER (S-NER) method sorts
the models intelligently so that the minimum risk decreases. We construct a test
that predicts whether or not expanding the model decreases the minimum risk.
With high probability, NER and S-NER select the true model order and the most
parsimonious model containing the risk minimizer predictor, respectively.
We apply S-NER model selection to linear regression and show that S-NER,
without any prior information, can achieve higher accuracy than feature-sorting
algorithms such as orthogonal matching pursuit (OMP) that are given prior
knowledge of the true model order. On the UCR datasets, NER also reduces the
complexity of classification dramatically, with a negligible loss of accuracy.
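The nested-model quantities above can be illustrated with a small sketch. The abstract does not specify the setting, so this sketch assumes one: squared-error linear regression in which model M_k uses the first k features. Under that assumption each model contains the previous one. The sketch does three things:

- It computes the minimum empirical risk of every nested model.
- It computes the successive drops between consecutive models (SEER).
- It keeps the smallest order after which every drop stays below a threshold tau.

The threshold is a hypothetical placeholder, not the PAC bound derived in the paper. The helper names (solve_least_squares, nested_min_risks, select_order) are illustrative only. This is not the authors' NER implementation.

```python
import random


def solve_least_squares(X, y):
    """Minimize ||X w - y||^2 by solving the normal equations (X^T X) w = X^T y
    with Gaussian elimination and partial pivoting (adequate for small, well-conditioned d)."""
    n, d = len(X), len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(d)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(d)]
    for col in range(d):
        # Partial pivoting: bring the row with the largest entry in this column up.
        pivot = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, d):
            factor = A[r][col] / A[col][col]
            for c in range(col, d + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution on the upper-triangular system.
    w = [0.0] * d
    for r in reversed(range(d)):
        w[r] = (A[r][d] - sum(A[r][c] * w[c] for c in range(r + 1, d))) / A[r][r]
    return w


def nested_min_risks(X, y):
    """Minimum empirical squared-error risk of nested models M_0, M_1, ..., M_K,
    where M_k is linear regression on the first k columns of X (M_0 predicts 0).
    Each model contains the previous one, so the risks are non-increasing."""
    n, K = len(y), len(X[0])
    risks = [sum(t * t for t in y) / n]
    for k in range(1, K + 1):
        Xk = [row[:k] for row in X]
        w = solve_least_squares(Xk, y)
        risks.append(sum((sum(a * b for a, b in zip(row, w)) - t) ** 2
                         for row, t in zip(Xk, y)) / n)
    return risks


def select_order(risks, tau):
    """SEER_k = risks[k-1] - risks[k]. Return the smallest order k such that every
    later expansion lowers the risk by at most tau, together with the SEER values.
    tau is a hypothetical placeholder threshold, NOT the paper's PAC bound."""
    seer = [risks[k - 1] - risks[k] for k in range(1, len(risks))]
    order = 0
    for k, s in enumerate(seer, start=1):
        if s > tau:
            order = k
    return order, seer


# Hypothetical demo: 5 candidate features in a fixed order, only the first 2 matter.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(5)] for _ in range(200)]
y = [2.0 * row[0] - 1.5 * row[1] + random.gauss(0, 0.1) for row in X]
risks = nested_min_risks(X, y)
order, seer = select_order(risks, tau=0.05)
print("risks:", [round(r, 4) for r in risks])
print("SEER:", [round(s, 4) for s in seer])
print("selected order:", order)
```

Because the models are nested, the minimum empirical risk can never increase, so every SEER value is non-negative. The practical question is whether a given drop reflects a real decrease in risk or only the fitting of noise; that is the decision the abstract's test addresses.

The sketch also fixes the feature order in advance, with the informative features first. Choosing that order is the role the abstract assigns to S-NER, and this sketch does not attempt it.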