Linear-time approximation scheme for k-means clustering of axis-parallel affine subspaces
Authors: Kyungjin Cho, Eunjin Oh
Journal: Computational Geometry-Theory and Applications (Impact Factor 0.4, JCR Q4, Mathematics)
Published: 2023-06-01
DOI: 10.1016/j.comgeo.2023.101981
URL: https://www.sciencedirect.com/science/article/pii/S0925772123000019
Citations: 0
Abstract
In this paper, we present a linear-time approximation scheme for k-means clustering of incomplete data points in d-dimensional Euclidean space. An incomplete data point with Δ > 0 unspecified entries is represented as an axis-parallel affine subspace of dimension Δ. The distance between two incomplete data points is defined as the Euclidean distance between two closest points in the axis-parallel affine subspaces corresponding to the data points. We present an algorithm for k-means clustering of n axis-parallel affine subspaces of dimension Δ that yields a (1+ϵ)-approximate solution in O(nd) time. The constants hidden behind O(⋅) depend only on Δ, ϵ, and k. This improves the O(n²d)-time algorithm by Eiben et al. (2021) [7] by a factor of n.
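The distance defined in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm, only the distance and the resulting clustering cost. Because the subspaces are axis-parallel, any coordinate left unspecified in either point can be matched exactly, so only coordinates specified in both points contribute. The names `incomplete_sq_distance`, `incomplete_distance`, and `kmeans_cost` are hypothetical, as is the choice of `None` to mark an unspecified entry, and the centers are assumed here to be complete points.

```python
import math
from typing import Optional, Sequence

# Assumption: an incomplete point is a length-d sequence, with None
# marking each unspecified entry (a free axis of the affine subspace).
Point = Sequence[Optional[float]]


def incomplete_sq_distance(p: Point, q: Point) -> float:
    """Squared distance between the closest points of two axis-parallel subspaces."""
    if len(p) != len(q):
        raise ValueError("points must have the same dimension d")
    # A coordinate that is free in either subspace can be matched exactly,
    # so only coordinates specified in both points contribute.
    return sum(
        (a - b) ** 2
        for a, b in zip(p, q)
        if a is not None and b is not None
    )


def incomplete_distance(p: Point, q: Point) -> float:
    """Euclidean distance between two incomplete data points."""
    return math.sqrt(incomplete_sq_distance(p, q))


def kmeans_cost(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """k-means cost: sum of squared distances from each point to its nearest center."""
    return sum(
        min(incomplete_sq_distance(p, c) for c in centers)
        for p in points
    )
```

For example, `incomplete_distance([1.0, None, 3.0], [4.0, 5.0, None])` compares only the first coordinate, since the second and third are each unspecified in one of the two points.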
About the journal:
Computational Geometry is a forum for research in theoretical and applied aspects of computational geometry. The journal publishes fundamental research in all areas of the subject, as well as disseminating information on the applications, techniques, and use of computational geometry. Computational Geometry publishes articles on the design and analysis of geometric algorithms. All aspects of computational geometry are covered, including the numerical, graph theoretical and combinatorial aspects. Also welcomed are computational geometry solutions to fundamental problems arising in computer graphics, pattern recognition, robotics, image processing, CAD-CAM, VLSI design and geographical information systems.
Computational Geometry features a special section containing open problems and concise reports on implementations of computational geometry tools.