Computational Parallel of K-Nearest Neighbor on Page Blocks Classification Dataset

Damar Zaky, P. H. Gunawan
2020 8th International Conference on Information and Communication Technology (ICoICT), published 2020-06-01. DOI: https://doi.org/10.1109/ICoICT49345.2020.9166293
Citations: 2

Abstract

K-Nearest Neighbor (KNN) is considered one of the simplest machine learning algorithms. Although it is simple to implement, KNN is computationally expensive, so prediction takes a long time. KNN is a lazy learning method: it does not build a generalized model from the data but instead keeps the training data and consults it at test time. This paper aims to optimize the KNN classifier for page blocks classification by parallelizing the algorithm. The parallelized part is the outer loop, where the test data are divided among the available processors. Page blocks are the blocks of a page layout detected by a segmentation technique; the KNN is trained to classify each block as a vertical line, picture, text, horizontal line, or graphic. The experiment shows that the KNN classifier obtains an accuracy of 93.51%. With 8 processors and the number of grids increased up to 6040, parallel KNN achieves a speedup of 4.64 and an efficiency of 57.96% while obtaining the same accuracy as the serial version.
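The outer-loop parallelization described in the abstract can be illustrated roughly as follows. This is a minimal sketch, not the authors' implementation: the function names (`knn_predict_one`, `split_chunks`, `knn_predict_parallel`, `efficiency`), the Euclidean distance, and the majority-vote tie handling are all assumptions made for illustration. A thread pool is used only to keep the sketch portable; in CPython, a process pool or a message-passing setup would likely be needed to see a real speedup.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import math


def knn_predict_one(train_X, train_y, x, k):
    """Classify one test point by majority vote among its k nearest training points."""
    # Euclidean distance from x to every training point, paired with its label.
    dists = sorted(
        ((math.dist(row, x), label) for row, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def knn_predict_serial(train_X, train_y, test_X, k):
    """Serial baseline: one pass of the outer loop over all test points."""
    return [knn_predict_one(train_X, train_y, x, k) for x in test_X]


def split_chunks(items, n_chunks):
    """Split items into n_chunks contiguous, nearly equal parts (some may be empty)."""
    base, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + base + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def knn_predict_parallel(train_X, train_y, test_X, k, n_workers):
    """Divide the test points among n_workers; each worker classifies its own chunk."""
    chunks = split_chunks(test_X, n_workers)

    def work(chunk):
        return knn_predict_serial(train_X, train_y, chunk, k)

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Executor.map returns results in the same order as the input chunks.
        parts = list(pool.map(work, chunks))
    return [label for part in parts for label in part]


def efficiency(speedup, n_workers):
    """Parallel efficiency: speedup divided by the number of workers."""
    return speedup / n_workers
```

Because every test point is classified independently against the same training set, the parallel version returns the same labels as the serial one, which matches the abstract's statement that accuracy is unchanged. Under the standard definition used in `efficiency`, a speedup of 4.64 on 8 processors gives 0.58, in line with the reported 57.96% (the small difference presumably comes from rounding the speedup).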