Data Analysis Architecture using Techniques of Machine Learning for the Prediction of the Quality of Blood Donations against the Hepatitis C Virus

Paul Idrovo-Berrezueta, Denys Dutan-Sanchez, Remigio Hurtado-Ortiz, V. Robles-Bykbaev
Published in: 2022 IEEE International Autumn Meeting on Power, Electronics and Computing (ROPEC), 2022-11-09
DOI: 10.1109/ROPEC55836.2022.10018741 (https://doi.org/10.1109/ROPEC55836.2022.10018741)
Citation count: 1

Abstract

The WHO (World Health Organization) faces difficulties in improving access to safe blood. It has reported that, of the millions of blood donations received, one in four donations from low-income countries comes from a setting that does not test all donated blood. This is a serious problem, because a hospital cannot assure a patient that the blood he or she receives is safe. As a solution, we propose a method based on CRISP-DM (Cross-Industry Standard Process for Data Mining). We first prepare the dataset by removing null values, transforming it with one-hot encoding, analyzing it with PCA (Principal Component Analysis) while retaining 85% of the variance, and oversampling the class we have chosen. Once the dataset has been preprocessed, we apply machine learning techniques to help evaluate whether a donor's blood is qualified for use. We applied several techniques: Random Forest, KNN (K-Nearest Neighbors), SVM (Support Vector Machine), and an ANN (Artificial Neural Network). As a final step, we interpreted the results and concluded that the classifier with the highest precision is the Random Forest. For this research we used a public dataset gathered by a university in Germany. This investigation aims to help improve the detection of hepatitis C in low-income countries and, in turn, access to safe blood for the patients who need it. In addition, this data analysis method can be applied in future investigations, for which we encourage testing other techniques or models.
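
Three of the preprocessing steps named in the abstract (null removal, one-hot encoding, and oversampling), together with the precision metric used to compare classifiers, can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the toy records, the column names (`age`, `sex`, `alt`, `unsafe`), and all function names are hypothetical, and plain random oversampling is assumed because the abstract does not say which oversampling method was used. PCA and the four classifiers are left out, since in practice they would come from a numerical library.

```python
import random
from collections import Counter


def drop_rows_with_nulls(rows):
    """Keep only rows in which every field has a value (None or '' counts as null)."""
    return [r for r in rows if all(v is not None and v != "" for v in r.values())]


def one_hot_encode(rows, column):
    """Replace a categorical column with one 0/1 column per observed category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != column}
        for c in categories:
            new_row[f"{column}_{c}"] = 1 if r[column] == c else 0
        encoded.append(new_row)
    return encoded


def random_oversample(rows, label, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest one (assumed method)."""
    rng = random.Random(seed)
    by_class = {}
    for r in rows:
        by_class.setdefault(r[label], []).append(r)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced


def precision(y_true, y_pred, positive=1):
    """Precision = true positives / all predicted positives."""
    true_of_predicted = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not true_of_predicted:
        return 0.0
    hits = sum(1 for t in true_of_predicted if t == positive)
    return hits / len(true_of_predicted)


# Hypothetical toy records; the column names are illustrative only.
records = [
    {"age": 32, "sex": "m", "alt": 7.7, "unsafe": 0},
    {"age": 45, "sex": "f", "alt": 18.0, "unsafe": 0},
    {"age": 51, "sex": "m", "alt": None, "unsafe": 0},  # dropped: null value
    {"age": 38, "sex": "f", "alt": 22.1, "unsafe": 0},
    {"age": 59, "sex": "m", "alt": 95.3, "unsafe": 1},
]

clean = drop_rows_with_nulls(records)
encoded = one_hot_encode(clean, "sex")
balanced = random_oversample(encoded, "unsafe")
class_counts = Counter(r["unsafe"] for r in balanced)
```

After these steps `class_counts` holds the same number of rows for each class. In a real evaluation, oversampling is usually applied to the training split only, so that duplicated rows do not leak into the test set and inflate the reported precision.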