Paul Idrovo-Berrezueta, Denys Dutan-Sanchez, Remigio Hurtado-Ortiz, V. Robles-Bykbaev
Data Analysis Architecture using Techniques of Machine Learning for the Prediction of the Quality of Blood Fonations against the Hepatitis C Virus
2022 IEEE International Autumn Meeting on Power, Electronics and Computing (ROPEC), 2022-11-09
DOI: 10.1109/ROPEC55836.2022.10018741
Citations: 1
Abstract
Nowadays, the WHO (World Health Organization) faces difficulties in improving access to safe blood. According to the WHO, of the millions of blood donations received, one in four donations made in low-income countries is not fully tested. This is a serious problem, because a hospital cannot assure patients that the blood they are receiving is safe. As a solution to this problem, we propose a method based on CRISP-DM. First, we prepared the data: we cleaned the dataset by removing null values, transformed it by applying one-hot encoding, analyzed it with PCA (Principal Component Analysis), keeping the components that explain 85% of the variance, and applied oversampling to the class we had chosen. Once the dataset was preprocessed, we applied machine learning techniques to help evaluate whether a donor's blood is suitable for use. The techniques applied were Random Forest, KNN (K-Nearest Neighbors), SVM (Support Vector Machine), and an ANN (Artificial Neural Network). Finally, we interpreted the results and concluded that the Random Forest classifier achieved the highest precision. For this research we used a public dataset collected by a university in Germany. This investigation aims to help improve the detection of hepatitis C in low-income countries and to improve access to safe blood for the patients who need it. In addition, this data analysis method can be applied in future investigations, and we encourage testing it with other techniques or models.
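The preprocessing and classification steps described in the abstract can be sketched in plain Python. This is a minimal, standard-library-only illustration under stated assumptions, not the authors' implementation: the records, the column names (sex, lab_a, lab_b, label) and all helper functions are hypothetical; oversampling is plain random duplication, since the abstract does not say which oversampling method was used; and only the PCA step of choosing how many components reach 85% of the variance is shown, from covariance eigenvalues assumed to be computed elsewhere (for example with numpy.linalg.eigh). KNN stands in here for the four classifiers the paper compares.

```python
import math
import random
from collections import Counter


def drop_missing(rows):
    """Keep only records in which no field is null (None)."""
    return [r for r in rows if all(v is not None for v in r.values())]


def one_hot(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new = {k: v for k, v in r.items() if k != column}
        for c in categories:
            new[f"{column}_{c}"] = 1.0 if r[column] == c else 0.0
        encoded.append(new)
    return encoded


def n_components_for_variance(eigenvalues, threshold=0.85):
    """Smallest k whose k largest eigenvalues explain at least `threshold` of the variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return k
    return len(ordered)


def random_oversample(X, y, target, seed=0):
    """Duplicate random samples of class `target` until it matches the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    deficit = max(counts.values()) - counts[target]
    pool = [x for x, label in zip(X, y) if label == target]
    extra = [rng.choice(pool) for _ in range(deficit)]
    return X + extra, y + [target] * deficit


def knn_predict(X_train, y_train, query, k=3):
    """Majority label among the k training points nearest to `query` (Euclidean)."""
    nearest = sorted(zip(X_train, y_train), key=lambda p: math.dist(p[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical toy records: column names and values are made up for illustration.
rows = [
    {"sex": "m", "lab_a": 1.0, "lab_b": 1.2, "label": 0},
    {"sex": "f", "lab_a": 0.9, "lab_b": 1.0, "label": 0},
    {"sex": "m", "lab_a": None, "lab_b": 1.1, "label": 0},  # removed: null field
    {"sex": "f", "lab_a": 1.1, "lab_b": 0.9, "label": 0},
    {"sex": "m", "lab_a": 4.0, "lab_b": 5.0, "label": 1},
]

clean = one_hot(drop_missing(rows), "sex")
features = [k for k in clean[0] if k != "label"]
X = [[r[k] for k in features] for r in clean]
y = [r["label"] for r in clean]

X_bal, y_bal = random_oversample(X, y, target=1)
print(Counter(y_bal))                                    # Counter({0: 3, 1: 3})
print(knn_predict(X_bal, y_bal, [3.8, 4.5, 0.0, 1.0]))  # 1
print(n_components_for_variance([5.0, 3.0, 1.0, 1.0]))  # 3
```

In a real evaluation, oversampling would be applied only to the training split so that duplicated records do not leak into the test data, and features are usually standardized before PCA and KNN. scikit-learn's PCA class also accepts a fractional n_components (for example 0.85) for a similar variance-based selection of components.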