Prabodi Senevirathna, Douglas E.V. Pires, Daniel Capurro
Journal of Biomedical Informatics, Volume 169, Article 104876. Published 2025-07-08. DOI: 10.1016/j.jbi.2025.104876
https://www.sciencedirect.com/science/article/pii/S1532046425001054
Uncovering digital overdiagnosis – Quantification and mitigation using clinical trajectories: Heparin-induced thrombocytopenia use case
Objective
Overdiagnosis occurs when abnormalities meeting diagnostic criteria would remain asymptomatic if undiagnosed. Cases initially identified through digital diagnostic tools but later recognised as overdiagnosis are referred to as ‘digital overdiagnosis’. Data-driven frameworks to quantify and mitigate overdiagnosis remain limited. This study introduces a framework that integrates clinical trajectories to train a machine learning (ML)-based disease classifier, enabling the quantification and mitigation of digital overdiagnosis, using Heparin-Induced Thrombocytopenia (HIT) as a case study.
Methods
A pre-existing HIT classifier identified HIT-positive and HIT-negative cases, with ground truth based on HIT diagnostic criteria. Clinical trajectories for True Positive (TP) and True Negative (TN) patients were clustered using a novel process-models-based approach. Overdiagnosis was detected when TP cases clustered with predominantly TN cases. The classifier was then retrained with an ‘updated label’ integrating both HIT criteria and the concordant trajectory, to reduce overdiagnosis while maintaining accuracy.
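The detection and relabelling steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the cluster assignments are taken as given (the process-models-based clustering is not reproduced here), and the function names, the `tn_majority` cut-off of 0.5 for "predominantly TN", and the toy data are all hypothetical.

```python
from collections import Counter


def flag_overdiagnosed(cluster_ids, groups, tn_majority=0.5):
    """Return indices of TP cases that sit in a predominantly TN cluster.

    cluster_ids: cluster label per patient, from any trajectory clustering.
    groups: "TP" or "TN" per patient, from the criteria-based ground truth.
    tn_majority: assumed cut-off for "predominantly TN" (hypothetical value).
    """
    totals = Counter(cluster_ids)
    tn_counts = Counter(c for c, g in zip(cluster_ids, groups) if g == "TN")
    flagged = []
    for i, (c, g) in enumerate(zip(cluster_ids, groups)):
        # A TP case is flagged when TN cases make up most of its cluster.
        if g == "TP" and tn_counts[c] / totals[c] > tn_majority:
            flagged.append(i)
    return flagged


def updated_labels(groups, flagged):
    """Build retraining labels: 1 for TP, 0 for TN, and 0 for flagged TP cases."""
    flagged_set = set(flagged)
    return [0 if (g == "TN" or i in flagged_set) else 1
            for i, g in enumerate(groups)]


# Toy data (invented): cluster 0 is mostly TP, cluster 1 is mostly TN.
cluster_ids = [0, 0, 0, 1, 1, 1, 1]
groups = ["TP", "TP", "TN", "TN", "TN", "TN", "TP"]

flagged = flag_overdiagnosed(cluster_ids, groups)
labels = updated_labels(groups, flagged)
print(flagged, labels)
```

In this toy example only the last patient is flagged, because it is the single TP case in a cluster where three of four members are TN. The cut-off is a design choice; a stricter value would flag fewer cases.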
Results
7.2% of TP cases were identified as overdiagnosed. Retraining with the updated labels successfully reclassified 89.5% of overdiagnosed cases as TN, with only a minimal reduction in performance (MCC decreased by 0.03, positive likelihood ratio decreased by 0.49, and negative likelihood ratio increased by 0.05). Clinical outcomes—length of stay, thrombotic events, and mortality—differed significantly between non-overdiagnosed and overdiagnosed cases, and between non-overdiagnosed and TN cases, but not between overdiagnosed and TN cases, confirming that overdiagnosed patients resemble TN patients.
Conclusion
Incorporating clinical trajectories into ML-based diagnosis enables the quantification of digital overdiagnosis. This approach could refine ML algorithms by prompting a reassessment of criteria-based disease labels in supervised learning.
About the journal:
The Journal of Biomedical Informatics reflects a commitment to high-quality original research papers, reviews, and commentaries in the area of biomedical informatics methodology. Although we publish articles motivated by applications in the biomedical sciences (for example, clinical medicine, health care, population health, and translational bioinformatics), the journal emphasizes reports of new methodologies and techniques that have general applicability and that form the basis for the evolving science of biomedical informatics. Articles on medical devices; evaluations of implemented systems (including clinical trials of information technologies); or papers that provide insight into a biological process, a specific disease, or treatment options would generally be more suitable for publication in other venues. Papers on applications of signal processing and image analysis are often more suitable for biomedical engineering journals or other informatics journals, although we do publish papers that emphasize the information management and knowledge representation/modeling issues that arise in the storage and use of biological signals and images. System descriptions are welcome if they illustrate and substantiate the underlying methodology that is the principal focus of the report and an effort is made to address the generalizability and/or range of application of that methodology. Note also that, given the international nature of JBI, papers that deal with specific languages other than English, or with country-specific health systems or approaches, are acceptable for JBI only if they offer generalizable lessons that are relevant to the broad JBI readership, regardless of their country, language, culture, or health system.