Investigating Data Repair steps for EHR Big Data
Suraj Juddoo
2022 3rd International Conference on Next Generation Computing Applications (NextComp), published 2022-10-06
DOI: 10.1109/NextComp55567.2022.9932167
This paper builds on previous research to optimize data quality methodologies for Big Data systems, focusing on Electronic Health Records (EHRs). The optimization targets organisations that follow a data-centric data quality strategy. One of the most important stages of the data quality lifecycle is correcting the dirty data that has been detected. Yet little is known about how existing data repair algorithms and tools perform in a Big Data context. This study performs a systematic review of data repair algorithms and tools. It then takes an experiment-based approach to evaluate them and compares them with a prototype built on the results of a previous study. Although some algorithms and tools performed marginally better than others, no algorithm or tool proved highly adequate in the Big Data context. The paper therefore recommends improvements that data repair algorithms and tools need for Big Data.
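To make the repair stage concrete, here is a minimal, hypothetical sketch of one simple repair heuristic. It is not the paper's prototype or any tool the paper evaluated. It assumes EHR-like records held as Python dictionaries with made-up fields `patient_id` and `dob`, plus a functional dependency `patient_id -> dob` (each patient has exactly one date of birth). Where records for the same patient disagree, the sketch sets them all to the most frequent value. The function name `repair_fd_majority` is likewise illustrative.

```python
from collections import Counter, defaultdict
from typing import Dict, List

Record = Dict[str, str]


def repair_fd_majority(records: List[Record], lhs: str, rhs: str) -> List[Record]:
    """Hypothetical repair of violations of the functional dependency lhs -> rhs.

    Records sharing the same lhs value must agree on rhs. Where a group
    disagrees, every record in it gets the group's most frequent rhs value.
    Returns repaired copies and leaves the input records unchanged.
    """
    # Detection: count the rhs values observed for each lhs value.
    groups: Dict[str, Counter] = defaultdict(Counter)
    for r in records:
        groups[r[lhs]][r[rhs]] += 1

    # Repair: overwrite conflicting rhs values with the majority value.
    repaired: List[Record] = []
    for r in records:
        fixed = dict(r)  # copy so the original record is not mutated
        counts = groups[r[lhs]]
        if len(counts) > 1:  # the dependency is violated in this group
            fixed[rhs] = counts.most_common(1)[0][0]
        repaired.append(fixed)
    return repaired


# Usage with made-up EHR-like records (field names are assumptions).
records = [
    {"patient_id": "P1", "dob": "1980-02-01"},
    {"patient_id": "P1", "dob": "1980-02-01"},
    {"patient_id": "P1", "dob": "1980-01-02"},  # conflicting value, likely a day/month swap
    {"patient_id": "P2", "dob": "1975-07-15"},
]
repaired = repair_fd_majority(records, "patient_id", "dob")
print(repaired[2]["dob"])  # prints 1980-02-01
```

Majority voting is only one heuristic. It assumes the correct value is the most common one. It also holds every record in memory, so it would not scale directly to Big Data volumes without a distributed or incremental design.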