A distributed unsupervised learning algorithm and its suitability to physical based observation
Authors: R. Hes, Giacomo Gioroli
Journal: International Journal of Parallel Emergent and Distributed Systems (JCR Q4, Computer Science, Theory & Methods; impact factor 0.6)
Published: 2022-02-28 (Journal Article)
DOI: https://doi.org/10.1080/17445760.2022.2042536
Large datasets challenge clustering algorithms because of memory limits and execution speed. The two most popular techniques, K-Means and DBSCAN, are tightly coupled to every point in the dataset. K-Means is based on cluster centres and requires prior knowledge of the number of classes present in the dataset. DBSCAN relaxes this constraint but still needs the complete dataset during computation. This paper presents a novel, primitive ‘self’-learning unsupervised technique that removes the tight coupling and is readily distributable. Like K-Means, the technique compares points against class averages; like DBSCAN, it does not require the number of classes to be known in advance. The algorithm competes well with the standard K-Means and DBSCAN variants on physically based observations, where Gaussian noise can be presumed. As an application, the technique is used to classify unknown whale species in the Cook Strait of New Zealand, where it is shown to perform well.
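The abstract does not give the update rule, so the following is only an illustrative sketch of the general idea it describes, not the authors' algorithm. It assumes a simple threshold rule: a point joins the nearest class average if it lies within a distance `threshold`, otherwise it opens a new class. Each class is kept as a running mean and a count, so no worker needs the full dataset, and per-worker summaries can be merged afterwards. The names `assign_online`, `merge_classes`, and `threshold` are hypothetical.

```python
import math
import random


def assign_online(points, threshold, classes=None):
    """Cluster points one at a time against running class averages.

    Each class is stored as [mean, count]; no point is retained.
    The distance threshold is an assumption of this sketch.
    """
    classes = [] if classes is None else classes
    for p in points:
        best, best_d = None, threshold
        for c in classes:
            d = math.dist(p, c[0])
            if d <= best_d:
                best, best_d = c, d
        if best is None:
            # No class average is close enough: open a new class.
            classes.append([list(p), 1])
        else:
            # Incremental mean update: m <- m + (x - m) / n.
            best[1] += 1
            n = best[1]
            best[0] = [m + (x - m) / n for m, x in zip(best[0], p)]
    return classes


def merge_classes(parts, threshold):
    """Merge per-worker class summaries by count-weighted averaging."""
    merged = []
    for classes in parts:
        for mean, count in classes:
            target = None
            for c in merged:
                if math.dist(mean, c[0]) <= threshold:
                    target = c
                    break
            if target is None:
                merged.append([list(mean), count])
            else:
                total = target[1] + count
                target[0] = [
                    (a * target[1] + b * count) / total
                    for a, b in zip(target[0], mean)
                ]
                target[1] = total
    return merged


if __name__ == "__main__":
    rng = random.Random(0)
    # Two synthetic sources with Gaussian noise, shuffled together.
    data = [(rng.gauss(0, 0.2), rng.gauss(0, 0.2)) for _ in range(200)]
    data += [(rng.gauss(5, 0.2), rng.gauss(5, 0.2)) for _ in range(200)]
    rng.shuffle(data)
    # Each "worker" sees only its own half of the data.
    halves = [data[:200], data[200:]]
    parts = [assign_online(h, threshold=2.0) for h in halves]
    for mean, count in merge_classes(parts, threshold=2.0):
        print([round(m, 2) for m in mean], count)
```

In this sketch the number of classes is never supplied; it emerges from the threshold, which plays a role loosely comparable to a noise scale for Gaussian-distributed observations. The result depends on the order of the points and on the chosen threshold, which a real implementation would need to address.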