Data-driven linear complexity low-rank approximation of general kernel matrices: A geometric approach

Difeng Cai, Edmond Chow, Yuanzhe Xi
Numerical Linear Algebra with Applications (JCR Q1, Mathematics; Impact Factor 1.8), published 2023-07-04. DOI: 10.1002/nla.2519. Citations: 2.
Abstract
A general, rectangular kernel matrix may be defined as $K_{ij}=\kappa(x_i,y_j)$, where $\kappa(x,y)$ is a kernel function and $X=\{x_i\}_{i=1}^m$ and $Y=\{y_i\}_{i=1}^n$ are two sets of points. In this paper, we seek a low-rank approximation to a kernel matrix where the point sets $X$ and $Y$ are large and arbitrarily distributed: far from each other, "intermingled", identical, and so forth. Such rectangular kernel matrices arise, for example, in Gaussian process regression, where $X$ corresponds to the training data and $Y$ to the test data; in this case, the points are often high-dimensional. Since the point sets are large, we must exploit the fact that the matrix arises from a kernel function and avoid forming the matrix explicitly, which rules out most algebraic techniques. In particular, we seek methods that scale linearly, or nearly linearly, with the size of the data for a fixed approximation rank. The main idea of this paper is to geometrically select appropriate subsets of points to construct a low-rank approximation; an analysis in the paper guides how this selection should be performed.
Journal overview:
Manuscripts submitted to Numerical Linear Algebra with Applications should include large-scale broad-interest applications in which challenging computational results are integral to the approach investigated and analysed. Manuscripts that, in the Editor’s view, do not satisfy these conditions will not be accepted for review.
Numerical Linear Algebra with Applications receives submissions that develop, analyse and apply linear algebra algorithms for problems arising in multilinear (tensor) algebra, in statistics (e.g., Markov chains), and in deterministic and stochastic modelling of large-scale networks, as well as work on algorithm development, performance analysis and related computational aspects.
Topics covered include: Standard and Generalized Conjugate Gradients, Multigrid and Other Iterative Methods; Preconditioning Methods; Direct Solution Methods; Numerical Methods for Eigenproblems; Newton-like Methods for Nonlinear Equations; Parallel and Vectorizable Algorithms in Numerical Linear Algebra; Application of Methods of Numerical Linear Algebra in Science, Engineering and Economics.