An Efficient Greedy Search Algorithm for High-dimensional Linear Discriminant Analysis.

IF 1.2, CAS Zone 3 (Mathematics), JCR Q2 (STATISTICS & PROBABILITY)

Statistica Sinica, Pub Date: 2023-05-01, DOI: 10.5705/ss.202021.0028

Hannan Yang, D Y Lin, Quefeng Li

Volume 33 (SI), pages 1343-1364. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10348717/pdf/nihms-1764480.pdf

Citations: 1

Abstract

High-dimensional classification is an important statistical problem with applications in many areas. One widely used classifier is Linear Discriminant Analysis (LDA). In recent years, many regularized LDA classifiers have been proposed for high-dimensional classification. However, these methods rely on inverting a large matrix or solving large-scale optimization problems to obtain classification rules, which makes them computationally prohibitive when the dimension is ultra-high. With the emergence of big data, it is increasingly important to develop more efficient algorithms for the high-dimensional LDA problem. In this paper, we propose an efficient greedy search algorithm that relies solely on closed-form formulae to learn a high-dimensional LDA rule. We establish theoretical guarantees for its statistical properties in terms of variable selection and error-rate consistency. In addition, we provide an explicit interpretation of the extra information brought by an additional feature in an LDA problem under some mild distributional assumptions. We demonstrate that the new algorithm drastically improves computational speed compared with other high-dimensional LDA methods while maintaining comparable or even better classification performance.
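The abstract says only that the algorithm relies on closed-form formulae. To illustrate what a closed-form greedy step for two-class LDA can look like, here is a minimal sketch of one generic forward-selection scheme. It assumes (our assumption, not necessarily the authors' criterion or stopping rule) that each step adds the feature giving the largest increase in the squared Mahalanobis distance between the class means, Δ² = δᵀΣ⁻¹δ with δ = μ1 − μ2. By the block-matrix inverse, adding feature j to a selected set S increases Δ² by (δ_j − Σ_jS Σ_SS⁻¹ δ_S)² / (Σ_jj − Σ_jS Σ_SS⁻¹ Σ_Sj), so each step needs only the |S|×|S| inverse, which is itself updated in closed form. The function names `greedy_lda_select` and `lda_predict`, the tolerance `tol`, and the equal-prior decision rule are hypothetical.

```python
import numpy as np


def greedy_lda_select(mu1, mu2, sigma, max_features, tol=1e-8):
    """Hypothetical greedy forward selection for a two-class LDA rule.

    Assumed criterion: add the feature j with the largest increase in the
    squared Mahalanobis distance Delta^2_S = d_S' inv(Sigma_SS) d_S, d = mu1 - mu2:
        gain_j = (d_j - Sigma_jS inv(Sigma_SS) d_S)^2
                 / (Sigma_jj - Sigma_jS inv(Sigma_SS) Sigma_Sj)
    Returns (selected indices, LDA coefficients beta_S, Delta^2_S).
    """
    d = np.asarray(mu1, dtype=float) - np.asarray(mu2, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    p = d.size
    selected = []
    chosen = np.zeros(p, dtype=bool)
    inv_ss = np.zeros((0, 0))  # inverse of Sigma restricted to the selected set
    dist2 = 0.0                # current squared Mahalanobis distance
    for _ in range(min(max_features, p)):
        idx = np.array(selected, dtype=int)
        s_cross = sigma[:, idx]                   # Sigma_{j,S} for every j (p x |S|)
        proj = s_cross @ inv_ss                   # row j: Sigma_jS inv(Sigma_SS)
        resid_mean = d - proj @ d[idx]            # numerator before squaring
        # Schur complements Sigma_jj - Sigma_jS inv(Sigma_SS) Sigma_Sj
        resid_var = np.diag(sigma) - np.einsum("ij,ij->i", proj, s_cross)
        usable = (resid_var > tol) & ~chosen      # skip chosen or collinear features
        gains = np.full(p, -np.inf)
        gains[usable] = resid_mean[usable] ** 2 / resid_var[usable]
        j = int(np.argmax(gains))
        if gains[j] <= tol:                       # also covers the all -inf case
            break
        # Closed-form block-inverse update of inv(Sigma_SS) after appending j
        b, c, k = proj[j], resid_var[j], len(selected)
        new_inv = np.empty((k + 1, k + 1))
        new_inv[:k, :k] = inv_ss + np.outer(b, b) / c
        new_inv[:k, k] = -b / c
        new_inv[k, :k] = -b / c
        new_inv[k, k] = 1.0 / c
        inv_ss = new_inv
        dist2 += gains[j]
        selected.append(j)
        chosen[j] = True
    beta = inv_ss @ d[np.array(selected, dtype=int)]
    return selected, beta, float(dist2)


def lda_predict(x, mu1, mu2, selected, beta):
    """Hypothetical equal-prior Fisher rule on the selected features: class 1 or 2."""
    idx = np.array(selected, dtype=int)
    mid = (np.asarray(mu1, dtype=float)[idx] + np.asarray(mu2, dtype=float)[idx]) / 2
    score = (np.asarray(x, dtype=float)[idx] - mid) @ beta
    return 1 if score > 0 else 2


if __name__ == "__main__":
    sel, beta, d2 = greedy_lda_select([3, 0, 2, 1], [0, 0, 0, 0], np.eye(4), max_features=4)
    print(sel, beta, d2)  # prints [0, 2, 3] [3. 2. 1.] 14.0
```

In practice μ1, μ2 and Σ would be replaced by the sample class means and the pooled sample covariance. Each step touches only the diagonal of Σ and its columns for the already-selected features, so the full p×p matrix is never inverted.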

Source journal: Statistica Sinica (Mathematics, Statistics & Probability)

CiteScore: 2.10

Self-citation rate: 0.00%

Review time: 10.5 months

Journal description: Statistica Sinica aims to meet the needs of statisticians in a rapidly changing world. It provides a forum for the publication of innovative work of high quality in all areas of statistics, including theory, methodology and applications. The journal encourages the development and principled use of statistical methodology that is relevant for society, science and technology.