Centroid Sort: a Clustering-based Technique for Accelerating Sorting Algorithms
Peter O. Olukanmi, Peter Popoola, Michael Olusanya
2020 2nd International Multidisciplinary Information Technology and Engineering Conference (IMITEC), published 2020-11-25
DOI: 10.1109/IMITEC50163.2020.9334102
Abstract
Sorting not only occupies a central place in computer science; it is also a building block of machine learning algorithms. This paper draws a connection between sorting and clustering that leads to a general approach for accelerating sorting algorithms. Clustering is presented as an effective way to implement the well-known divide-and-conquer strategy. The key idea is that, unless an algorithm runs in linear or sub-linear time (an impossible average-case complexity for comparison-based sorting), applying a k-means-type clustering to the input data makes the sum of the individual partition sorting times less than the time to sort the original input, and the sorted partitions need only be concatenated. Quadratic-time algorithms, which are common among well-known sorting algorithms, can be accelerated by up to a factor of $k$. Although any clustering algorithm can be used, to minimize overhead we propose a simplified one-pass k-means-type algorithm comprising a single iteration of Lloyd's standard k-means algorithm: we sample $k$ items as partition prototypes, then group each item in the data with its nearest prototype. As proof of concept, we present experiments involving three well-known quadratic-time algorithms, namely Insertion Sort, Bubble Sort and Selection Sort.
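The partition-then-concatenate idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions: the keys are one-dimensional numbers, the prototypes are sorted so that the concatenation order is well defined (the abstract does not say how partitions are ordered), and the nearest prototype is found with a binary search rather than a scan over all k prototypes. The names `centroid_sort`, `insertion_sort`, `base_sort` and `rng` are hypothetical. For intuition on the factor-of-$k$ claim: if the k partitions are perfectly balanced, a quadratic-time sort does about k * (n/k)^2 = n^2/k work in total instead of n^2.

```python
import bisect
import random


def insertion_sort(items):
    """Plain quadratic-time insertion sort; returns a new sorted list."""
    out = list(items)
    for i in range(1, len(out)):
        key = out[i]
        j = i - 1
        # Shift larger elements one slot right to make room for key.
        while j >= 0 and out[j] > key:
            out[j + 1] = out[j]
            j -= 1
        out[j + 1] = key
    return out


def centroid_sort(items, k, base_sort=insertion_sort, rng=None):
    """Sketch: partition around k sampled prototypes, sort each part, concatenate.

    Assumes one-dimensional numeric keys (an assumption of this sketch).
    """
    items = list(items)
    if len(items) <= 1 or k <= 1:
        return base_sort(items)
    rng = rng or random.Random()
    k = min(k, len(items))
    # Sample k items as prototypes. Sorting them (an assumption of this
    # sketch) fixes the order in which partitions are concatenated.
    prototypes = sorted(rng.sample(items, k))
    partitions = [[] for _ in prototypes]
    for x in items:
        # In one dimension the nearest prototype is one of the two
        # neighbours of x in the sorted prototype list.
        pos = bisect.bisect_left(prototypes, x)
        if pos == 0:
            nearest = 0
        elif pos == k:
            nearest = k - 1
        else:
            left, right = prototypes[pos - 1], prototypes[pos]
            nearest = pos - 1 if x - left <= right - x else pos
        partitions[nearest].append(x)
    # Each partition covers a contiguous value range, so sorted partitions
    # can simply be concatenated in prototype order.
    result = []
    for part in partitions:
        result.extend(base_sort(part))
    return result
```

Calling `centroid_sort(data, k=8)` on a list of numbers returns the same list as `sorted(data)`, whichever prototypes are sampled. How much time is saved depends on how evenly the sampled prototypes split the input, since one oversized partition still pays the quadratic cost.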