The ethics of data mining in healthcare: challenges, frameworks, and future directions.

IF 6.1 · Zone 3, Biology · Q1, MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biodata Mining · Pub Date: 2025-07-11 · DOI: 10.1186/s13040-025-00461-w

Mohamed Mustaf Ahmed, Olalekan John Okesanya, Majd Oweidat, Zhinya Kawa Othman, Shuaibu Saidu Musa, Don Eliseo Lucero-Prisno III

Biodata Mining 18(1): 47. Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12255135/pdf/

Citations: 0

Abstract

Data mining in healthcare offers transformative insights yet surfaces multilayered ethical and governance challenges that extend beyond privacy alone. Privacy and consent concerns remain paramount when handling sensitive medical data, particularly as healthcare organizations increasingly share patient information with large digital platforms. The risks of data breaches and unauthorized access are stark: 725 reportable incidents in 2023 alone exposed more than 133 million patient records, and hacking-related breaches have surged by 239% since 2018. Algorithmic bias further threatens equity; models trained on historically prejudiced data can reinforce health disparities across protected groups. Transparency must therefore span three levels (dataset documentation, model interpretability, and post-deployment audit logging) to make algorithmic reasoning and failures traceable. Security vulnerabilities in the Internet of Medical Things (IoMT) and cloud-based health platforms amplify these risks, while corporate data-sharing deals complicate questions of data ownership and patient autonomy. A comprehensive response requires (i) dataset-level artifacts such as "datasheets," (ii) model cards that disclose fairness metrics, and (iii) continuous logging of predictions and LIME/SHAP explanations for independent audits. Technical safeguards must blend differential privacy (with empirically validated noise budgets), homomorphic encryption for high-value queries, and federated learning to maintain the locality of raw data. Governance frameworks must also mandate routine bias and robustness audits and harmonized penalties for non-compliance. Regular reassessments, thorough documentation, and active engagement with clinicians, patients, and regulators are critical to accountability.
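
To make the differential-privacy safeguard mentioned above more concrete, the following is a minimal sketch of the standard Laplace mechanism applied to a counting query. It is not taken from the paper: the function name `dp_count`, the record layout, and the choice of `epsilon` are hypothetical, and a real deployment would also need to track the cumulative privacy budget across queries.

```python
import random


def dp_count(records, predicate, epsilon, rng=None):
    """Return a differentially private count of records matching `predicate`.

    A counting query has L1 sensitivity 1 (adding or removing one patient
    changes the count by at most 1), so Laplace noise with scale 1/epsilon
    gives epsilon-differential privacy for this single query.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    scale = 1.0 / epsilon
    # The difference of two independent Exp(1) draws is Laplace(0, 1).
    noise = scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
    return true_count + noise


# Hypothetical toy data: 100 patients aged 0..99 (made up for illustration).
records = [{"age": a} for a in range(100)]
noisy = dp_count(records, lambda r: r["age"] >= 65, epsilon=1.0)
print(f"noisy count of patients aged 65+: {noisy:.2f}")
```

A smaller `epsilon` means more noise and stronger privacy. The "empirically validated noise budgets" in the abstract presumably refer to checking that such noise levels still leave the released statistics useful.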
This paper synthesizes current evidence, from a 2019 European re-identification study demonstrating 99.98% uniqueness with 15 quasi-identifiers to recent clinical audits that trimmed false-negative rates via threshold recalibration, and proposes an integrated set of fairness, privacy, and security controls aligned with SPIRIT-AI, CONSORT-AI, and emerging PROBAST-AI guidelines. Implementing these solutions will help healthcare systems harness the benefits of data mining while safeguarding patient rights and sustaining public trust.
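
The re-identification figure cited above rests on a simple idea: the more quasi-identifiers are combined, the more records become unique. The sketch below illustrates that measurement on made-up data. The helper `uniqueness_rate` and the toy `patients` records are hypothetical and do not reproduce the cited study's method, which estimates uniqueness at population scale.

```python
from collections import Counter


def uniqueness_rate(records, quasi_identifiers):
    """Fraction of records whose quasi-identifier combination is unique."""
    if not records:
        return 0.0
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(records)


# Toy, made-up records for illustration only.
patients = [
    {"zip": "02139", "birth_year": 1980, "sex": "F"},
    {"zip": "02139", "birth_year": 1980, "sex": "M"},
    {"zip": "02139", "birth_year": 1975, "sex": "F"},
    {"zip": "94110", "birth_year": 1980, "sex": "F"},
    {"zip": "94110", "birth_year": 1980, "sex": "F"},
]

print(uniqueness_rate(patients, ["zip"]))                       # 0.0
print(uniqueness_rate(patients, ["zip", "birth_year", "sex"]))  # 0.6
```

Even in this tiny example, moving from one attribute to three raises the share of uniquely identifiable records from none to three in five. This is why removing direct identifiers alone is not treated as sufficient anonymization.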

Source journal

BioData Mining (MATHEMATICAL & COMPUTATIONAL BIOLOGY)

CiteScore: 7.90

Self-citation rate: 0.00%

Articles published: not listed

Review time: 23 weeks

About the journal: BioData Mining is an open access, open peer-reviewed journal encompassing research on all aspects of data mining applied to high-dimensional biological and biomedical data, focusing on computational aspects of knowledge discovery from large-scale genetic, transcriptomic, genomic, proteomic, and metabolomic data. Topical areas include, but are not limited to:
- Development, evaluation, and application of novel data mining and machine learning algorithms.
- Adaptation, evaluation, and application of traditional data mining and machine learning algorithms.
- Open-source software for the application of data mining and machine learning algorithms.
- Design, development, and integration of databases, software, and web services for the storage, management, retrieval, and analysis of data from large-scale studies.
- Pre-processing, post-processing, modeling, and interpretation of data mining and machine learning results for biological interpretation and knowledge discovery.