A distributed unsupervised learning algorithm and its suitability to physical based observation

IF 0.7 Q4 COMPUTER SCIENCE, THEORY & METHODS

International Journal of Parallel Emergent and Distributed Systems Pub Date : 2022-02-28 DOI:10.1080/17445760.2022.2042536

R. Hes, Giacomo Gioroli

Volume 37, pp. 443-455. Publication type: Journal Article. Open access: no. Indexed via Semantic Scholar. https://doi.org/10.1080/17445760.2022.2042536

Citations: 0


A distributed unsupervised learning algorithm and its suitability to physical based observation

Large datasets pose a difficult challenge for clustering algorithms because of memory limitations and execution speed. Clustering is typically addressed with the currently popular techniques K-Means and DBScan, both of which are inherently tightly coupled to all points in the dataset. K-Means clustering is based on cluster centres and requires prior knowledge of the number of classes present in the dataset. DBScan relaxes this constraint but still needs the complete dataset during computation. In this paper, a novel ‘self’-learning primitive unsupervised technique is presented that addresses the tight coupling and is readily distributable. Like K-Means, the technique compares points to class averages; like DBScan, it does not require prior knowledge of the number of classes. The algorithm competes well with the standardised K-Means and DBScan variants in the context of physically based observations where Gaussian noise can be presumed. An application of the unsupervised technique is presented: the classification of unknown whale species in the Cook Strait of New Zealand, on which it is shown to perform well.
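The abstract does not spell out the algorithm, so the following is only a rough sketch of the general idea it describes (compare each point to running class averages, open a new class when nothing is close enough, and combine per-worker summaries afterwards). It is not the authors' method. The function names, the `sigma` noise scale and the `k_sigma` acceptance radius are all assumptions made for illustration.

```python
import numpy as np

def cluster_stream(points, sigma, k_sigma=3.0):
    """Single-pass clustering of a 2-D array (n_points, n_features).

    Each point joins the nearest running class average if it lies within
    k_sigma * sigma of it; otherwise it starts a new class. The threshold
    rule is an assumption, not taken from the paper.
    """
    means, counts, labels = [], [], []
    threshold = k_sigma * sigma  # hypothetical acceptance radius
    for p in np.asarray(points, dtype=float):
        if means:
            dists = np.linalg.norm(np.asarray(means) - p, axis=1)
            j = int(np.argmin(dists))
        if means and dists[j] <= threshold:
            counts[j] += 1
            means[j] += (p - means[j]) / counts[j]  # incremental mean update
        else:
            means.append(p.copy())
            counts.append(1)
            j = len(means) - 1
        labels.append(j)
    return np.asarray(means), np.asarray(counts), labels

def merge_summaries(summaries, sigma, k_sigma=3.0):
    """Combine (means, counts) pairs from several workers.

    Class averages closer than the threshold are fused with a
    count-weighted mean; the rest are kept as separate classes.
    """
    merged_means, merged_counts = [], []
    threshold = k_sigma * sigma
    for means, counts in summaries:
        for m, c in zip(means, counts):
            if merged_means:
                dists = np.linalg.norm(np.asarray(merged_means) - m, axis=1)
                j = int(np.argmin(dists))
                if dists[j] <= threshold:
                    total = merged_counts[j] + c
                    merged_means[j] = (merged_means[j] * merged_counts[j] + m * c) / total
                    merged_counts[j] = total
                    continue
            merged_means.append(np.array(m, dtype=float))
            merged_counts.append(int(c))
    return np.asarray(merged_means), np.asarray(merged_counts)

# Usage: two workers each see part of the data, then their summaries are merged.
data = np.array([[0.0, 0.0], [0.1, 0.0], [10.0, 10.0], [10.1, 10.0], [0.0, 0.1]])
part_a = cluster_stream(data[:3], sigma=0.5)[:2]
part_b = cluster_stream(data[3:], sigma=0.5)[:2]
global_means, global_counts = merge_summaries([part_a, part_b], sigma=0.5)
```

Because each worker only exchanges class averages and counts, no worker needs the complete dataset, which is the property the abstract contrasts with K-Means and DBScan. The result of a sketch like this depends on point order and on the chosen threshold.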


Source journal

International Journal of Parallel Emergent and Distributed Systems (COMPUTER SCIENCE, THEORY & METHODS)

CiteScore

2.30

Self-citation rate

0.00%

Articles published