{"title":"Adaptive threshold-based classification of sparse high-dimensional data","authors":"T. Pavlenko, N. Stepanova, Lee Thompson","doi":"10.1214/22-ejs1998","DOIUrl":null,"url":null,"abstract":"Abstract: We revisit the problem of designing an efficient binary classifier in a challenging high-dimensional framework. The model under study assumes some local dependence structure among feature variables represented by a block-diagonal covariance matrix with a growing number of blocks of an arbitrary, but fixed size. The blocks correspond to non-overlapping independent groups of strongly correlated features. To assess the relevance of a particular block in predicting the response, we introduce a measure of “signal strength” pertaining to each feature block. This measure is then used to specify a sparse model of our interest. We further propose a threshold-based feature selector which operates as a screen-and-clean scheme integrated into a linear classifier: the data is subject to screening and hard threshold cleaning to filter out the blocks that contain no signals. Asymptotic properties of the proposed classifiers are studied when the sample size n depends on the number of feature blocks b, and the sample size goes to infinity with b at a slower rate than b. The new classifiers, which are fully adaptive to unknown parameters of the model, are shown to perform asymptotically optimally in a large part of the classification region. The numerical study confirms good analytical properties of the new classifiers that compare favorably to the existing threshold-based procedure used in a similar context.","PeriodicalId":49272,"journal":{"name":"Electronic Journal of Statistics","volume":" ","pages":""},"PeriodicalIF":1.0000,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Electronic Journal of Statistics","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1214/22-ejs1998","RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"STATISTICS & PROBABILITY","Score":null,"Total":0}
引用次数: 0
Abstract
Abstract: We revisit the problem of designing an efficient binary classifier in a challenging high-dimensional framework. The model under study assumes some local dependence structure among feature variables represented by a block-diagonal covariance matrix with a growing number of blocks of an arbitrary, but fixed size. The blocks correspond to non-overlapping independent groups of strongly correlated features. To assess the relevance of a particular block in predicting the response, we introduce a measure of “signal strength” pertaining to each feature block. This measure is then used to specify a sparse model of our interest. We further propose a threshold-based feature selector which operates as a screen-and-clean scheme integrated into a linear classifier: the data is subject to screening and hard threshold cleaning to filter out the blocks that contain no signals. Asymptotic properties of the proposed classifiers are studied when the sample size n depends on the number of feature blocks b, and the sample size goes to infinity with b at a slower rate than b. The new classifiers, which are fully adaptive to unknown parameters of the model, are shown to perform asymptotically optimally in a large part of the classification region. The numerical study confirms good analytical properties of the new classifiers that compare favorably to the existing threshold-based procedure used in a similar context.
期刊介绍:
The Electronic Journal of Statistics (EJS) publishes research articles and short notes on theoretical, computational and applied statistics. The journal is open access. Articles are refereed and are held to the same standard as articles in other IMS journals. Articles become publicly available shortly after they are accepted.