Clustering and Geodesic Scaling of Dissimilarities on the Spherical Surface

Impact factor 1.4; Zone 4 (Mathematics); JCR Q3 (BIOLOGY)
Journal: Journal of Agricultural, Biological and Environmental Statistics
Publication date: 2024-01-30
Publication type: Journal Article
DOI: https://doi.org/10.1007/s13253-023-00597-4
Citation count: 0

Abstract

Spherical embedding is an important tool in several fields of data analysis, including environmental data, spatial statistics, text mining, gene expression analysis, medical research and, in general, areas in which the geodesic distance is a relevant factor. Many data acquisition technologies are related to massive data acquisition, and these high-dimensional vectors are often normalised and transformed into spherical data. In this representation of data on spherical surfaces, multidimensional scaling plays an important role. Traditionally, the methods of clustering and representation have been combined, since the precision of the representation tends to decrease when a large number of objects are involved, which makes interpretation difficult. In this paper, we present a model that partitions objects into classes while simultaneously representing the cluster centres on a spherical surface based on geodesic distances. The model combines a partition algorithm based on the approximation of dissimilarities to geodesic distances with a representation procedure for geodesic distances. In this process, the dissimilarities are transformed in order to optimise the radius of the sphere. The efficiency of the procedure described is analysed by means of an extensive Monte Carlo experiment, and its usefulness is illustrated for real data sets. Supplementary material to this paper is provided online.
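
The abstract relies on two ideas that a small example can make concrete: normalising vectors so that they lie on a sphere, and measuring how well a set of dissimilarities is approximated by geodesic (great-circle) distances on a sphere of a given radius. The sketch below is illustrative only and is not the authors' algorithm. The function names (`normalise`, `geodesic_distance`, `raw_stress`) are hypothetical, and the plain sum-of-squares criterion is an assumed stand-in for whatever loss function the paper actually uses.

```python
import math


def normalise(v):
    """Scale a vector to unit Euclidean length, i.e. project it onto the unit sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalise the zero vector")
    return [x / norm for x in v]


def geodesic_distance(u, v, radius=1.0):
    """Great-circle distance between the directions of u and v on a sphere of the given radius."""
    u, v = normalise(u), normalise(v)
    cos_angle = sum(a * b for a, b in zip(u, v))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return radius * math.acos(cos_angle)


def raw_stress(dissimilarities, points, radius=1.0):
    """Sum of squared differences between dissimilarities and geodesic distances.

    Assumed illustrative criterion (not taken from the paper); only pairs i < j are counted.
    """
    n = len(points)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = geodesic_distance(points[i], points[j], radius)
            total += (dissimilarities[i][j] - d) ** 2
    return total
```

For instance, `geodesic_distance([3, 0, 0], [0, 5, 0])` gives a quarter of a great circle on the unit sphere, because only the directions of the vectors matter after normalisation. Passing `radius=2.0` doubles that distance, which is why the choice of radius interacts with the scale of the dissimilarities.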

Source journal
CiteScore: 2.70
Self-citation rate: 7.10%
Articles published: 38
Review time: >12 weeks
About the journal: The Journal of Agricultural, Biological and Environmental Statistics (JABES) publishes papers that introduce new statistical methods to solve practical problems in the agricultural sciences, the biological sciences (including biotechnology), and the environmental sciences (including those dealing with natural resources). Papers that apply existing methods in a novel context are also encouraged. Interdisciplinary papers and papers that illustrate the application of new and important statistical methods using real data are strongly encouraged. The journal does not normally publish papers that have a primary focus on human genetics, human health, or medical statistics.