A NOVEL METHOD OF MEDICAL CLASSIFICATION USING PARALLELIZATION ALGORITHMS

Computer systems and information technologies. Pub Date: 2022-04-14. DOI: 10.31891/csit-2022-1-3

L. Mochurad, Andrii Ilkiv

Citations: 0

Abstract

Machine learning methods in medicine are the subject of significant ongoing research, which focuses mainly on modeling certain human actions, thought processes, or disease recognition. Other applications include biomedical systems such as genetics and DNA analysis. The purpose of this paper is to implement two machine learning methods, Random Forest and Decision Tree, and then to parallelize these algorithms in order to improve classification accuracy and reduce classifier training time when processing medical data to determine whether a person has cardiovascular disease. The paper investigates machine learning methods for medical data processing, using parallelization algorithms to improve accuracy and reduce execution time. Classification is an important tool in today's world, where big data informs decisions in government, economics, medicine, and other fields. Researchers have access to vast amounts of data, and classification is one of the tools that helps them understand the data and find patterns in it.

The study uses a dataset of 70,000 patient records with 12 attributes. The data were analyzed and preprocessed. The Random Forest algorithm was parallelized using the functionality of the sklearn library: with 8 parallel threads, training was 4.4 times faster than sequential training. The algorithm was also parallelized with CUDA, which reduced the training time on the GPU by a factor of 83.4. The paper calculates the speedup (acceleration) and efficiency coefficients and provides a detailed comparison with the sequential algorithm.
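The abstract says the Random Forest was parallelized with sklearn's built-in functionality but does not show the code. A minimal sketch, assuming the parallelism comes from the `n_jobs` parameter of scikit-learn's `RandomForestClassifier` (which builds the trees concurrently across worker threads), could time sequential and parallel training side by side, with a single Decision Tree as a baseline. The synthetic data from `make_classification` is a stand-in for the patient records; the helper names `time_fit` and `compare_training`, the feature count, and the tree count are illustrative assumptions, not the authors' setup.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def time_fit(model, X, y):
    """Fit `model` on (X, y) and return the wall-clock training time in seconds."""
    start = time.perf_counter()
    model.fit(X, y)
    return time.perf_counter() - start


def compare_training(n_samples=70_000, n_features=12, n_estimators=100,
                     n_jobs=8, seed=0):
    """Hypothetical helper: Decision Tree baseline plus a Random Forest
    trained sequentially and in parallel on synthetic stand-in data."""
    X, y = make_classification(n_samples=n_samples, n_features=n_features,
                               n_informative=6, random_state=seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed)

    results = {}

    # Baseline: a single decision tree.
    tree = DecisionTreeClassifier(random_state=seed)
    results["tree_time"] = time_fit(tree, X_train, y_train)
    results["tree_acc"] = tree.score(X_test, y_test)

    # Random Forest: n_jobs=1 builds trees one after another,
    # n_jobs>1 builds them concurrently across worker threads.
    for jobs, key in ((1, "rf_seq"), (n_jobs, "rf_par")):
        forest = RandomForestClassifier(n_estimators=n_estimators,
                                        n_jobs=jobs, random_state=seed)
        results[key + "_time"] = time_fit(forest, X_train, y_train)
        results[key + "_acc"] = forest.score(X_test, y_test)

    # Speedup of parallel over sequential training, S_p = T_1 / T_p.
    results["speedup"] = results["rf_seq_time"] / results["rf_par_time"]
    return results
```

Because scikit-learn draws each tree's seed from `random_state` before dispatching the work, the parallel forest is the same model as the sequential one and only the wall-clock time should change. Measured speedups depend on the hardware and data size, so this sketch is not expected to reproduce the 4.4-times figure.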

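The abstract reports speedups but not the coefficients derived from them. Assuming the standard definitions, acceleration (speedup) S_p = T_1 / T_p and efficiency E_p = S_p / p for p parallel workers, the reported CPU result can be worked through as follows. The function names are illustrative, and no GPU efficiency is computed because the abstract gives no comparable worker count for the GPU run.

```python
def speedup(t_seq: float, t_par: float) -> float:
    """Acceleration coefficient S_p = T_1 / T_p (sequential over parallel time)."""
    return t_seq / t_par


def efficiency(s_p: float, p: int) -> float:
    """Efficiency coefficient E_p = S_p / p; 1.0 would mean perfect linear scaling."""
    return s_p / p


def time_saved(s_p: float) -> float:
    """Fraction of the sequential training time removed by a speedup of S_p."""
    return 1.0 - 1.0 / s_p


# Figures reported in the abstract: 4.4x with 8 CPU threads, 83.4x on the GPU.
cpu_speedup, cpu_threads = 4.4, 8
gpu_speedup = 83.4

print(f"CPU efficiency: {efficiency(cpu_speedup, cpu_threads):.2f}")  # 0.55
print(f"CPU time saved: {time_saved(cpu_speedup):.1%}")              # 77.3%
print(f"GPU time saved: {time_saved(gpu_speedup):.1%}")              # 98.8%
```

An efficiency of 0.55 means each of the 8 threads delivers roughly 55% of ideal linear scaling, which is typical when part of the work (data handling, thread coordination) stays sequential.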

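The abstract does not say which CUDA implementation was used for the GPU version. As an illustrative substitute, the sketch below swaps in RAPIDS cuML, whose `cuml.ensemble.RandomForestClassifier` trains a random forest on an NVIDIA GPU, and falls back to scikit-learn on the CPU when cuML is not installed. The helpers `make_forest` and `fit_forest` and their default hyperparameters are hypothetical.

```python
import numpy as np

try:
    # Assumption: RAPIDS cuML stands in for the paper's unspecified CUDA
    # implementation; it requires an NVIDIA GPU and a CUDA installation.
    from cuml.ensemble import RandomForestClassifier as GpuRandomForest
    HAVE_CUML = True
except ImportError:
    HAVE_CUML = False

from sklearn.ensemble import RandomForestClassifier as CpuRandomForest


def make_forest(n_estimators=100, max_depth=16, seed=0):
    """Hypothetical factory: GPU forest if cuML is available, else a CPU forest."""
    if HAVE_CUML:
        return GpuRandomForest(n_estimators=n_estimators, max_depth=max_depth,
                               random_state=seed)
    # CPU fallback using all available cores.
    return CpuRandomForest(n_estimators=n_estimators, max_depth=max_depth,
                           n_jobs=-1, random_state=seed)


def fit_forest(X, y, **kwargs):
    """Fit a forest on (X, y); kwargs are passed to make_forest."""
    # cuML works natively with float32 features and int32 labels;
    # the casts are harmless for scikit-learn.
    X = np.asarray(X, dtype=np.float32)
    y = np.asarray(y, dtype=np.int32)
    model = make_forest(**kwargs)
    model.fit(X, y)
    return model
```

Keeping the same `fit`/`predict` interface on both paths makes it easy to time the GPU and CPU versions on identical data, which is the kind of comparison the abstract describes.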