Spectral Method and Regularized MLE Are Both Optimal for Top-K Ranking

IF 3.2 | Zone 1 (Mathematics) | JCR Q1, Statistics & Probability

Annals of Statistics | Pub Date: 2019-01-01 | Epub Date: 2019-05-21 | DOI: 10.1214/18-AOS1745

Yuxin Chen, Jianqing Fan, Cong Ma, Kaizheng Wang

Annals of Statistics, Volume 47, Issue 4, pp. 2204-2235

Citations: 102

Abstract



This paper is concerned with the problem of top-K ranking from pairwise comparisons. Given a collection of n items and a few pairwise comparisons across them, one wishes to identify the set of K items that receive the highest ranks. To tackle this problem, we adopt the logistic parametric model - the Bradley-Terry-Luce model, where each item is assigned a latent preference score, and where the outcome of each pairwise comparison depends solely on the relative scores of the two items involved. Recent works have made significant progress towards characterizing the performance (e.g. the mean square error for estimating the scores) of several classical methods, including the spectral method and the maximum likelihood estimator (MLE). However, where they stand regarding top-K ranking remains unsettled. We demonstrate that under a natural random sampling model, the spectral method alone, or the regularized MLE alone, is minimax optimal in terms of the sample complexity - the number of paired comparisons needed to ensure exact top-K identification, for the fixed dynamic range regime. This is accomplished via optimal control of the entrywise error of the score estimates. We complement our theoretical studies by numerical experiments, confirming that both methods yield low entrywise errors for estimating the underlying scores. Our theory is established via a novel leave-one-out trick, which proves effective for analyzing both iterative and non-iterative procedures. Along the way, we derive an elementary eigenvector perturbation bound for probability transition matrices, which parallels the Davis-Kahan $\sin\Theta$ theorem for symmetric matrices. This also allows us to close the gap between the $\ell_2$ error upper bound for the spectral method and the minimax lower limit.
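To make the setting in the abstract concrete, here is a minimal sketch of a spectral (Rank-Centrality-style) estimator under the Bradley-Terry-Luce model. It is an illustration under stated assumptions, not the authors' implementation. The function names `simulate_btl`, `spectral_scores`, and `top_k` are hypothetical, and so are the sampling details: each pair is compared independently with probability `p`, each compared pair is observed `L` times, and the normalizer is `d = 2 * max_degree`. The abstract only refers to "a natural random sampling model".

```python
import math
import random


def simulate_btl(scores, p, L, rng):
    """Sample BTL comparisons on a random comparison graph (assumed model).

    Returns a dict mapping each compared pair (i, j), i < j, to the fraction
    of the L comparisons in which item j beat item i.
    """
    n = len(scores)
    wins = {}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:  # pair (i, j) is compared with probability p
                # BTL: P(j beats i) = w_j / (w_i + w_j) with w = exp(score)
                prob_j = 1.0 / (1.0 + math.exp(scores[i] - scores[j]))
                won = sum(rng.random() < prob_j for _ in range(L))
                wins[(i, j)] = won / L
    return wins


def spectral_scores(n, wins, iters=1000):
    """Stationary distribution of a Markov chain that moves from i to j with
    probability proportional to the fraction of comparisons j won against i."""
    degree = [0] * n
    for i, j in wins:
        degree[i] += 1
        degree[j] += 1
    d = 2 * max(max(degree), 1)  # assumed normalizer; keeps each row a distribution
    # Sparse off-diagonal transitions: trans[i] holds (j, P_ij) for j != i
    trans = [[] for _ in range(n)]
    for (i, j), y in wins.items():
        trans[i].append((j, y / d))          # i -> j, weighted by wins of j
        trans[j].append((i, (1.0 - y) / d))  # j -> i, weighted by wins of i
    pi = [1.0 / n] * n
    for _ in range(iters):  # power iteration: pi <- pi P
        new = [0.0] * n
        for i in range(n):
            out = 0.0
            for j, pij in trans[i]:
                new[j] += pi[i] * pij
                out += pij
            new[i] += pi[i] * (1.0 - out)  # self-loop holds the remaining mass
        pi = new
    return pi


def top_k(values, k):
    """Indices of the k largest entries, as a set."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return set(order[:k])


if __name__ == "__main__":
    rng = random.Random(0)
    n, k = 30, 5
    true_scores = [0.1 * i for i in range(n)]  # the last k items are the true top-K
    wins = simulate_btl(true_scores, p=0.5, L=50, rng=rng)
    pi = spectral_scores(n, wins)
    print("estimated top-K:", sorted(top_k(pi, k)))
    print("true top-K:     ", sorted(top_k(true_scores, k)))
```

With noise-free win fractions on a connected comparison graph, the stationary distribution is proportional to the exponentiated scores by detailed balance, which is why ranking its entries recovers the ordering. The regularized MLE mentioned in the abstract is not shown here.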


Source journal

Annals of Statistics (Mathematics: Statistics and Probability)

CiteScore: 9.30

Self-citation rate: 8.90%

Articles published: 119

Review time: 6-12 weeks

About the journal: The Annals of Statistics aims to publish research papers of the highest quality reflecting the many facets of contemporary statistics. Primary emphasis is placed on importance and originality, not on formalism. The journal aims to cover all areas of statistics, especially mathematical statistics and applied & interdisciplinary statistics. Of course, many of the best papers will touch on more than one of these general areas, because the discipline of statistics has deep roots in mathematics and in substantive scientific fields.