A fast parallelized DBSCAN algorithm based on OpenMp for detection of criminals on streaming services

IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS
Lesia Mochurad, Andrii Sydor, Oleh Ratinskiy
{"title":"A fast parallelized DBSCAN algorithm based on OpenMp for detection of criminals on streaming services","authors":"Lesia Mochurad, Andrii Sydor, Oleh Ratinskiy","doi":"10.3389/fdata.2023.1292923","DOIUrl":null,"url":null,"abstract":"Introduction Streaming services are highly popular today. Millions of people watch live streams or videos and listen to music. Methods One of the most popular streaming platforms is Twitch, and data from this type of service can be a good example for applying the parallel DBSCAN algorithm proposed in this paper. Unlike the classical approach to neighbor search, the proposed one avoids redundancy, i.e., the repetition of the same calculations. At the same time, this algorithm is based on the classical DBSCAN method with a full search for all neighbors, parallelization by subtasks, and OpenMP parallel computing technology. Results In this work, without reducing the accuracy, we managed to speed up the solution based on the DBSCAN algorithm when analyzing medium-sized data. As a result, the acceleration rate tends to the number of cores of a multicore computer system and the efficiency to one. Discussion Before conducting numerical experiments, theoretical estimates of speed-up and efficiency were obtained, and they aligned with the results obtained, confirming their validity. The quality of the performed clustering was verified using the silhouette value. All experiments were conducted using different percentages of medium-sized datasets. The prospects of applying the proposed algorithm can be obtained in various fields such as advertising, marketing, cybersecurity, and sociology. It is worth mentioning that datasets of this kind are often used for detecting fraud on the Internet, making an algorithm capable of considering all neighbors a useful tool for such research.","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":null,"pages":null},"PeriodicalIF":2.4000,"publicationDate":"2023-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in Big Data","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3389/fdata.2023.1292923","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0

Abstract

Introduction Streaming services are highly popular today. Millions of people watch live streams or videos and listen to music. Methods One of the most popular streaming platforms is Twitch, and data from this type of service can be a good example for applying the parallel DBSCAN algorithm proposed in this paper. Unlike the classical approach to neighbor search, the proposed one avoids redundancy, i.e., the repetition of the same calculations. At the same time, this algorithm is based on the classical DBSCAN method with a full search for all neighbors, parallelization by subtasks, and OpenMP parallel computing technology. Results In this work, without reducing the accuracy, we managed to speed up the solution based on the DBSCAN algorithm when analyzing medium-sized data. As a result, the acceleration rate tends to the number of cores of a multicore computer system and the efficiency to one. Discussion Before conducting numerical experiments, theoretical estimates of speed-up and efficiency were obtained, and they aligned with the results obtained, confirming their validity. The quality of the performed clustering was verified using the silhouette value. All experiments were conducted using different percentages of medium-sized datasets. The prospects of applying the proposed algorithm can be obtained in various fields such as advertising, marketing, cybersecurity, and sociology. It is worth mentioning that datasets of this kind are often used for detecting fraud on the Internet, making an algorithm capable of considering all neighbors a useful tool for such research.
一种基于OpenMp的快速并行DBSCAN算法用于流媒体服务中的犯罪分子检测
流媒体服务在今天非常受欢迎。数百万人观看直播或视频,听音乐。方法Twitch是最流行的流媒体平台之一,该服务的数据可以作为应用本文提出的并行DBSCAN算法的一个很好的例子。与传统的邻居搜索方法不同,该方法避免了冗余,即重复相同的计算。同时,该算法基于经典的DBSCAN方法,充分搜索所有邻居,采用子任务并行化和OpenMP并行计算技术。结果在不降低准确率的情况下,在分析中等规模数据时,我们成功地提高了基于DBSCAN算法的求解速度。因此,加速速率趋向于多核计算机系统的核数,效率趋向于1。在进行数值实验之前,得到了加速和效率的理论估计,并与所得结果相吻合,证实了其有效性。使用轮廓值验证所执行聚类的质量。所有实验都使用不同百分比的中型数据集进行。该算法在广告、市场营销、网络安全、社会学等领域具有广泛的应用前景。值得一提的是,这类数据集经常被用来检测互联网上的欺诈行为,这使得能够考虑所有邻居的算法成为此类研究的有用工具。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
CiteScore
5.20
自引率
3.20%
发文量
122
审稿时长
13 weeks
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信