{"title":"Clustering, coding, and the concept of similarity","authors":"L. Thorne McCarty","doi":"10.1007/s10472-024-09929-7","DOIUrl":null,"url":null,"abstract":"<div><p>This paper develops a theory of <i>clustering</i> and <i>coding</i> that combines a geometric model with a probabilistic model in a principled way. The geometric model is a Riemannian manifold with a Riemannian metric, <span>\\({g}_{ij}(\\textbf{x})\\)</span>, which we interpret as a measure of <i>dissimilarity</i>. The probabilistic model consists of a stochastic process with an invariant probability measure that matches the density of the sample input data. The link between the two models is a potential function, <span>\\(U(\\textbf{x})\\)</span>, and its gradient, <span>\\(\\nabla U(\\textbf{x})\\)</span>. We use the gradient to define the dissimilarity metric, which guarantees that our measure of dissimilarity will depend on the probability measure. Finally, we use the dissimilarity metric to define a coordinate system on the embedded Riemannian manifold, which gives us a low-dimensional encoding of our original data.</p></div>","PeriodicalId":7971,"journal":{"name":"Annals of Mathematics and Artificial Intelligence","volume":"92 5","pages":"1197 - 1248"},"PeriodicalIF":1.2000,"publicationDate":"2024-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Annals of Mathematics and Artificial Intelligence","FirstCategoryId":"94","ListUrlMain":"https://link.springer.com/article/10.1007/s10472-024-09929-7","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
This paper develops a theory of clustering and coding that combines a geometric model with a probabilistic model in a principled way. The geometric model is a Riemannian manifold with a Riemannian metric, \({g}_{ij}(\textbf{x})\), which we interpret as a measure of dissimilarity. The probabilistic model consists of a stochastic process with an invariant probability measure that matches the density of the sample input data. The link between the two models is a potential function, \(U(\textbf{x})\), and its gradient, \(\nabla U(\textbf{x})\). We use the gradient to define the dissimilarity metric, which guarantees that our measure of dissimilarity will depend on the probability measure. Finally, we use the dissimilarity metric to define a coordinate system on the embedded Riemannian manifold, which gives us a low-dimensional encoding of our original data.
期刊介绍:
Annals of Mathematics and Artificial Intelligence presents a range of topics of concern to scholars applying quantitative, combinatorial, logical, algebraic and algorithmic methods to diverse areas of Artificial Intelligence, from decision support, automated deduction, and reasoning, to knowledge-based systems, machine learning, computer vision, robotics and planning.
The journal features collections of papers appearing either in volumes (400 pages) or in separate issues (100-300 pages), which focus on one topic and have one or more guest editors.
Annals of Mathematics and Artificial Intelligence hopes to influence the spawning of new areas of applied mathematics and strengthen the scientific underpinnings of Artificial Intelligence.