Principal Curves

Pub Date : 1900-01-01 DOI:10.2307/2289936

T. Hastie, W. Stuetzle


Citations: 925

Abstract

Principal curves are smooth one-dimensional curves that pass through the middle of a p-dimensional data set, providing a nonlinear summary of the data. They are nonparametric, and their shape is suggested by the data. The algorithm for constructing principal curves starts with some prior summary, such as the usual principal-component line. The curve in each successive iteration is a smooth or local average of the p-dimensional points, where the definition of local is based on the distance in arc length of the projections of the points onto the curve found in the previous iteration. In this article principal curves are defined, an algorithm for their construction is given, some theoretical results are presented, and the procedure is compared to other generalizations of principal components. Two applications illustrate the use of principal curves. The first describes how the principal-curve procedure was used to align the magnets of the Stanford linear collider. The collider uses about 950 magnets in a roughly circular arrangement to bend electron and positron beams and bring them to collision. After construction, it was found that some of the magnets had ended up significantly out of place. As a result, the beams had to be bent too sharply and could not be focused. The engineers realized that the magnets did not have to be moved to their originally planned locations, but rather to a sufficiently smooth arc through the middle of the existing positions. This arc was found using the principal-curve procedure. In the second application, two different assays for gold content in several samples of computer-chip waste appear to show some systematic differences that are blurred by measurement error. The classical approach using linear errors-in-variables regression can detect systematic linear differences but is not able to account for nonlinearities. When the first linear principal component is replaced with a principal curve, a local "bump" is revealed, and bootstrapping is used to verify its presence.
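
The iteration described in the abstract (start from the principal-component line, project the points onto the current curve, then smooth the points against arc length) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the article's implementation: the function names `principal_curve`, `project_to_polyline`, and `local_linear_smooth` and the parameters `span` and `n_iter` are hypothetical, the curve is represented as a polyline through the fitted points, a Gaussian-weighted local linear fit stands in for whatever scatterplot smoother the article uses, and a fixed iteration count replaces a convergence check.

```python
import numpy as np


def project_to_polyline(X, curve):
    """Project each row of X onto the polyline through the rows of `curve`.

    Returns the arc-length position of each projection and the squared
    distance from each point to its projection.
    """
    start = curve[:-1]
    vec = curve[1:] - curve[:-1]
    seg_len = np.linalg.norm(vec, axis=1)
    cum_len = np.concatenate([[0.0], np.cumsum(seg_len)])
    diff = X[:, None, :] - start[None, :, :]           # shape (n, m-1, p)
    # Fractional position along every segment, clipped to the segment.
    t = np.einsum("nmp,mp->nm", diff, vec) / np.maximum(seg_len ** 2, 1e-12)
    t = np.clip(t, 0.0, 1.0)
    proj = start[None, :, :] + t[:, :, None] * vec[None, :, :]
    d2 = np.sum((X[:, None, :] - proj) ** 2, axis=2)
    best = np.argmin(d2, axis=1)                        # nearest segment
    rows = np.arange(len(X))
    lam = cum_len[best] + t[rows, best] * seg_len[best]
    return lam, d2[rows, best]


def local_linear_smooth(lam, X, span):
    """Smooth each coordinate of X against lam, evaluated at every lam[i].

    Assumption: Gaussian weights with bandwidth span * range(lam).
    """
    h = max(span * (lam.max() - lam.min()), 1e-12)
    d = lam[None, :] - lam[:, None]                     # d[i, j] = lam_j - lam_i
    w = np.exp(-0.5 * (d / h) ** 2)
    s0 = w.sum(axis=1)
    s1 = (w * d).sum(axis=1)
    s2 = (w * d ** 2).sum(axis=1)
    lin = w * (s2[:, None] - d * s1[:, None])           # local linear weights
    tot = lin.sum(axis=1, keepdims=True)
    ok = tot > 1e-12
    # Fall back to a plain weighted average where the linear fit is degenerate.
    eff = np.where(ok, lin / np.where(ok, tot, 1.0), w / s0[:, None])
    return eff @ X


def principal_curve(X, span=0.1, n_iter=10):
    """Return (curve vertices, arc-length positions, mean squared distance)."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Starting summary: positions along the first principal-component line.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    lam = centered @ vt[0]
    for _ in range(n_iter):
        fitted = local_linear_smooth(lam, X, span)      # local average step
        curve = fitted[np.argsort(lam)]                 # order along the curve
        lam, d2 = project_to_polyline(X, curve)         # projection step
    return curve, lam, float(d2.mean())
```

For noisy points scattered around an arc, calling `principal_curve(X)` returns the ordered curve vertices, each point's arc-length position, and the mean squared distance to the curve, which can be compared with the residual from the first principal-component line. The `span` value controls how local the averaging is: larger values give a smoother curve that bends less toward the data.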
