Centroid Sort: a Clustering-based Technique for Accelerating Sorting Algorithms

2020 2nd International Multidisciplinary Information Technology and Engineering Conference (IMITEC). Pub Date: 2020-11-25. DOI: 10.1109/IMITEC50163.2020.9334102

Peter O. Olukanmi, Peter Popoola, Michael Olusanya


Abstract

Sorting not only occupies a central place in computer science; it also constitutes a building block of machine learning algorithms. This paper draws a useful connection between sorting and clustering, leading to a general approach for accelerating sorting algorithms. Clustering is presented as an effective way to implement the well-known divide-and-conquer strategy. The key idea is that, unless an algorithm takes (sub-)linear time, which is an impossible average complexity for sorting, applying a k-means-type clustering to the input data makes the sum of the individual partition sorting times less than the time for sorting the original input, and the sorted partitions need only be concatenated. Quadratic-time algorithms, which are common among well-known sorting methods, can be accelerated by up to a factor of $k$. Although any clustering algorithm can be used, to minimize overhead we propose a simplified one-pass k-means-type algorithm comprising a single iteration of Lloyd's standard k-means algorithm: we sample $k$ items as partition prototypes, then group each item in the data with its nearest prototype. As proof of concept, we present experiments involving three well-known quadratic-time algorithms, namely Insertion Sort, Bubble Sort and Selection Sort.
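The partition-then-concatenate idea described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `centroid_sort`, `insertion_sort`, `base_sort` and `rng` are hypothetical, the input is assumed to be one-dimensional numeric data, prototypes are assumed to be sampled uniformly without replacement, and the binary search used to find the nearest prototype is a convenience chosen here rather than something the abstract specifies.

```python
import bisect
import random


def insertion_sort(items):
    """Plain quadratic-time insertion sort; returns a new sorted list."""
    result = list(items)
    for i in range(1, len(result)):
        key = result[i]
        j = i - 1
        # Shift larger elements one slot right, then drop the key in place.
        while j >= 0 and result[j] > key:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = key
    return result


def centroid_sort(data, k, base_sort=insertion_sort, rng=None):
    """Sort 1-D numeric data via one-pass nearest-prototype partitioning.

    Hypothetical sketch: sample k items as prototypes, group every item
    with its nearest prototype, sort each group, then concatenate.
    """
    data = list(data)
    if len(data) <= 1 or k <= 1:
        return base_sort(data)
    if rng is None:
        rng = random.Random()

    # Assumption: uniform sampling without replacement. Sorting the
    # prototypes lets the partitions be concatenated in index order.
    prototypes = sorted(rng.sample(data, min(k, len(data))))
    partitions = [[] for _ in prototypes]

    for x in data:
        # Binary search finds the two neighbouring prototypes of x;
        # the nearer one wins, with ties going to the lower prototype.
        pos = bisect.bisect_left(prototypes, x)
        if pos == 0:
            idx = 0
        elif pos == len(prototypes):
            idx = pos - 1
        elif prototypes[pos] - x < x - prototypes[pos - 1]:
            idx = pos
        else:
            idx = pos - 1
        partitions[idx].append(x)

    # In one dimension each partition covers a contiguous value range,
    # so sorted partitions only need to be concatenated.
    out = []
    for part in partitions:
        out.extend(base_sort(part))
    return out
```

For a quadratic-time base sort, $k$ equal partitions of $n/k$ items cost about $k \cdot (n/k)^2 = n^2/k$ operations, which is where the "up to $k$ times" figure comes from. Randomly sampled prototypes will generally give unequal partitions, so the speedup in practice is likely to be smaller than this bound.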
