{"title":"Explainable machine learning model for assessing health status in patients with comorbid coronary heart disease and depression: Development and validation study","authors":"Jiqing Li, Shuo Wu, Jianhua Gu","doi":"10.1016/j.ijmedinf.2025.105808","DOIUrl":"10.1016/j.ijmedinf.2025.105808","url":null,"abstract":"<div><h3>Background</h3><div>Coronary heart disease (CHD) and depression frequently co-occur, significantly impacting patient outcomes. However, comprehensive health status assessment tools for this complex population are lacking. This study aimed to develop and validate an explainable machine learning model to evaluate overall health status in patients with comorbid CHD and depression.</div></div><div><h3>Methods</h3><div>Utilizing data from the 2021–2022 Behavioral Risk Factor Surveillance System, we developed and externally validated machine learning models to predict overall health status, defined as having both poor physical and mental health for ≥ 14 days in the past 30 days. Eleven machine learning algorithms were evaluated, including artificial neural networks, support vector machines, and ensemble methods. The SHapley Additive exPlanations (SHAP) method was employed to enhance model interpretability. Model performance was assessed using discrimination, calibration, and decision curve analysis.</div></div><div><h3>Results</h3><div>The study included 9,747 participants in the derivation cohort and 8,394 in the external validation cohort. Among the eleven algorithms evaluated, an optimized XGBoost model with eight key features demonstrated balanced performance. SHAP analysis revealed that employment status, physical activity, income, and age were the most influential predictors. 
The model maintained good discrimination (AUC 0.712, 95% CI 0.703–0.721 in derivation; AUC 0.711, 95% CI 0.701–0.721 in validation), calibration and clinical utility across both cohorts.</div></div><div><h3>Conclusion</h3><div>Our explainable machine learning model provides a novel, comprehensive approach to assessing health status in patients with comorbid CHD and depression, offering valuable insights for personalized management strategies.</div></div>","PeriodicalId":54950,"journal":{"name":"International Journal of Medical Informatics","volume":"196 ","pages":"Article 105808"},"PeriodicalIF":3.7,"publicationDate":"2025-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143061533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hip prosthesis failure prediction through radiological deep sequence learning","authors":"Francesco Masciulli , Anna Corti , Alessia Lindemann , Katia Chiappetta , Mattia Loppini , Valentina D.A. Corino","doi":"10.1016/j.ijmedinf.2025.105802","DOIUrl":"10.1016/j.ijmedinf.2025.105802","url":null,"abstract":"<div><h3>Background</h3><div>Existing deep learning studies for the automated detection of hip prosthesis failure only consider the last available radiographic image. However, using longitudinal data is thought to improve the prediction, by combining temporal and spatial components. The aim of this study is to develop artificial intelligence models for predicting hip implant failure from multiple subsequent plain radiographs.</div></div><div><h3>Methods</h3><div>A cohort of 224 patients was considered for model development and a balanced cohort of 14 patients was used for external validation. A sequence of two or three anteroposterior radiographic images per patient was considered to track the prosthesis over time. A combination of a convolutional neural network (CNN) and a recurrent section was used. For the CNN, a pretrained autoencoder, a pretrained RadImageNet DenseNet and a pretrained custom DenseNet were considered. The recurrent section was implemented using either a single Gated Recurrent Unit (GRU) layer or a Long Short-Term Memory block.</div></div><div><h3>Results</h3><div>Considering 3 images as input provided a positive predictive value (PPV) of 0.966 and an f1 score of 0.933 on the validation set. Regarding the 2-image models, using the postoperative and the last image resulted in PPV of 0.933 and f1 score of 0.918, whereas using the second-to-last image with the post-operative one reached a PPV of 0.882 and f1 score of 0.923. 
On the external validation set, the 3-image model reached an accuracy of 0.786.</div></div><div><h3>Conclusion</h3><div>This study demonstrated the potential of the developed models, based on a series of plain radiographs, to predict hip prosthesis failure.</div></div>","PeriodicalId":54950,"journal":{"name":"International Journal of Medical Informatics","volume":"196 ","pages":"Article 105802"},"PeriodicalIF":3.7,"publicationDate":"2025-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143069648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep learning based prediction of depression and anxiety in patients with type 2 diabetes mellitus using regional electronic health records","authors":"Wei Feng , Honghan Wu , Hui Ma , Yuechuchu Yin , Zhenhuan Tao , Shan Lu , Xin Zhang , Yun Yu , Cheng Wan , Yun Liu","doi":"10.1016/j.ijmedinf.2025.105801","DOIUrl":"10.1016/j.ijmedinf.2025.105801","url":null,"abstract":"<div><h3>Background</h3><div>Depression and anxiety are prevalent mental health conditions among individuals with type 2 diabetes mellitus (T2DM), who exhibit unique vulnerabilities and etiologies. However, existing approaches fail to fully utilize regional heterogeneous electronic health record (EHR) data. Integrating this data can provide a more comprehensive understanding of depression and anxiety in T2DM patients, leading to more personalized treatment strategies.</div></div><div><h3>Objective</h3><div>This study aims to develop and validate a deep learning model, the Regional EHR for Depression and Anxiety Prediction Model (REDAPM), using regional EHR data to predict depression and anxiety in patients with T2DM.</div></div><div><h3>Methods</h3><div>A case-control development and validation study was conducted using regional EHR data from the Nanjing Health Information Center (NHIC). Two retrospective, matched (1:3) datasets were constructed from the full cohort for the model's internal and external validation. These two datasets were selected from the NHIC data of 2020 and 2022, respectively. The REDAPM incorporates both structured and unstructured EHR data, capturing the temporal dependency of clinical events. The performance of REDAPM was compared to a set of baseline models, evaluated using the area under the receiver operating characteristic curve (ROC-AUC) and the area under the precision-recall curve (PR-AUC). 
Subgroup, ablation, and interpretation analyses were conducted to identify relevant clinical features available from EHRs.</div></div><div><h3>Results</h3><div>The internal and external validation datasets comprised 24,724 and 34,340 patients, respectively. The REDAPM outperformed baseline models in both datasets, achieving ROC-AUC scores of 0.9029±0.008 and 0.7360±0.005, and PR-AUC scores of 0.8124±0.011 and 0.5504±0.009. Ablation and subgroup experiments confirmed the significant contribution of patients' medical history text to the model's performance. Integrated gradient score analysis identified the predictive importance of other mental disorders.</div></div><div><h3>Conclusion</h3><div>The REDAPM effectively leverages the heterogeneous characteristics of regional EHR data, demonstrating strong predictive performance for depression onset in diabetic patients. It also shows potential for discovering significant clinical features, indicating considerable promise for clinical utility.</div></div>","PeriodicalId":54950,"journal":{"name":"International Journal of Medical Informatics","volume":"196 ","pages":"Article 105801"},"PeriodicalIF":3.7,"publicationDate":"2025-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143076333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An interpretable hybrid machine learning approach for predicting three-month unfavorable outcomes in patients with acute ischemic stroke","authors":"Chen Chen , Wenkang Zhang , Yang Pan , Zhen Li","doi":"10.1016/j.ijmedinf.2025.105807","DOIUrl":"10.1016/j.ijmedinf.2025.105807","url":null,"abstract":"<div><h3>Background</h3><div>Acute ischemic stroke (AIS) is a clinical disorder caused by nontraumatic cerebrovascular disease with a high incidence, mortality, and disability rate. Most stroke survivors are left with speech and physical impairments, and emotional problems. Despite technological advances and improved treatment options, death and disability after stroke remain a major problem. Our research aims to develop interpretable hybrid machine learning (ML) models to accurately predict three-month unfavorable outcomes in patients with AIS.</div></div><div><h3>Methods</h3><div>Within the framework of this analysis, the model was trained using data from 731 cases in the dataset and subsequently validated using data from both internal and external validation datasets. A total of 25 models (including ML and deep learning models) were initially employed, along with 14 evaluation metrics, and the results were subjected to cluster analysis to objectively validate the model’s effectiveness and assess the similarity of evaluation metrics. For the final model evaluation, 10 metrics selected after metric screening and calibration analysis were utilized to evaluate model performance, while clinical decision analysis, cost curve analysis, and model fairness analysis were applied to assess the clinical applicability of the model. Nested cross-validation and optimal hyperparameter search were employed to determine the best hyperparameter for the ML models. 
The SHAP diagram is utilized to provide further visual explanations regarding the importance of features and their interaction effects, ultimately leading to the establishment of a practical AIS three-month prognostic prediction platform.</div></div><div><h3>Results</h3><div>The frequencies of unfavorable outcomes in the internal dataset and external validation dataset were 389 / 1045 (37.2 %) and 161 / 411 (39.2 %), respectively. Through cluster analysis of the results of 14 evaluation metrics across 25 models and a comparison of clinical applicability, 12 ML models were ultimately selected for further analysis. The findings revealed that XGBoost and CatBoost performed the best. Further ensemble modeling of these two models and adjustment of decision thresholds using cost curves resulted in the final model performing as follows on the internal validation set: PRAUC of 0.856 (0.801, 0.902), ROCAUC of 0.856 (0.801, 0.901), specificity of 0.879 (0.797, 0.953), balanced accuracy of 0.840 (0.763, 0.912) and MCC of 0.678 (0.591, 0.760). Similarly, the model exhibited excellent performance on the external validation set, with a PRAUC of 0.823 (0.775, 0.872), ROCAUC of 0.842 (0.801, 0.890), specificity of 0.888 (0.822, 0.920), balanced accuracy of 0.814 (0.751, 0.869) and MCC of 0.639 (0.546, 0.721). In terms of the important features of AIS three-month outcomes, albumin ranked highest, followed by FBG, BMI, Scr, WBC, and age, while gender exhibited significant interactions with other indicators. 
Ultimately, b","PeriodicalId":54950,"journal":{"name":"International Journal of Medical Informatics","volume":"196 ","pages":"Article 105807"},"PeriodicalIF":3.7,"publicationDate":"2025-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143369806","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Public value and digital health: The example of guiding values in the national digital health strategy of France","authors":"Simon Lewerenz , Anne Moen , Henrique Martins","doi":"10.1016/j.ijmedinf.2025.105794","DOIUrl":"10.1016/j.ijmedinf.2025.105794","url":null,"abstract":"<div><h3>Introduction</h3><div>In the WHO European Region, 44 of 53 reporting Member States (MS) have a national digital health strategy (NDHS) or policy. Their formulation is heterogeneous and evolving and should best reflect public common interest. This research aims to explore how a public value approach improves the relevance of digital health policies and services, increasing their capacity to better serve the diverse range of societal interests. It utilises the guiding values within the French NDHS as an example before discussing other digital health policies such as the European Health Data Space.</div></div><div><h3>Methods</h3><div>Three homogeneous focus group discussions were conducted in November and December 2023. Each focus group separately gathered distinct stakeholders: public clients, health professionals, and the private sector. 19 participants were included in the study. Data collection comprised live polling and semi-structured discussion. Results were analysed considering the pre-defined stakeholder groups and the values discussed during the study.</div></div><div><h3>Results</h3><div>Findings reveal both technical and cultural challenges in digital health that highlight the need for adaptable frameworks across different contexts. Stakeholder insights informed a framework classifying public values into democratic and managerial categories, suggesting themes that may be relevant to digital health strategies in other national and regional settings.</div></div><div><h3>Discussion</h3><div>Public value is discussed as a multidimensional concept, and the plurality of its perceptions gives a basis for tailored approaches to serve different value-beneficiaries comprehensively. 
We propose this values-based approach as a systematic model for supra-, sub-, and national scales and additional policy topics, beyond digital health strategies.</div></div><div><h3>Conclusion</h3><div>The study suggests that using a public value lens considering multiple perceptions is valuable for advancing digital health policy in a responsible and ethical manner. Such an approach could promote wider governance of and adoption of digital health. To evolve the framework, application in multiple and large ecosystems at different levels should be considered.</div></div>","PeriodicalId":54950,"journal":{"name":"International Journal of Medical Informatics","volume":"196 ","pages":"Article 105794"},"PeriodicalIF":3.7,"publicationDate":"2025-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143043394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Behind the software: The impact of Unobtrusiveness, Goal Setting and persuasive features on BMI","authors":"Renata Savian Colvero de Oliveira , Sharon Nabwire , Heta Merikallio , Markku Savolainen , Janne Hukkanen , Harri Oinas-Kukkonen","doi":"10.1016/j.ijmedinf.2025.105795","DOIUrl":"10.1016/j.ijmedinf.2025.105795","url":null,"abstract":"<div><h3>Background</h3><div>Studies have demonstrated that interventions targeting weight loss and body mass index (BMI) reduction can be successful, although the specific factors that influence their effectiveness are still unclear. Behavior change support systems (BCSS) are an approach that aims to help users in their efforts to modify their behavior. A useful tool for assessing BCSS is the Persuasive Systems Design model (PSD), where different features and postulates can be employed. However, it is unknown whether the grouping of software features and design principles, along with behavioral traits, provides a better combination to achieve effective BMI reduction.</div></div><div><h3>Objective</h3><div>This study investigates the impact of PSD features, postulates behind the design, and behavioral traits on BMI reduction after six months of utilizing a mobile health behavior change support system (mHBCSS).</div></div><div><h3>Methods</h3><div>We examined a subset of 96 individuals from a randomized controlled trial using a mHBCSS for a period of six months. Data was analyzed using Partial Least Squares Structural Equation Modeling (PLS-SEM).</div></div><div><h3>Results</h3><div>We found that 15.3 % of the variance in BMI reduction was explained by the ability to set goals. Furthermore, users who perceive a system as highly persuasive are more likely to establish goals (R<sup>2</sup> = 0.207). Among PSD features, Dialogue Support and Primary Task Support explained 54.9 % of the variance in Perceived Persuasiveness. 
In addition, both Dialogue Support and Credibility Support have a mutual effect on Primary Task Support (R<sup>2</sup> = 0.685). Finally, the system’s unobtrusiveness explained 41.1 % of the variance in Dialogue Support.</div></div><div><h3>Conclusion</h3><div>The PSD framework and behavior change theories have a significant influence on BMI reduction. Setting a clear and organized objective assists individuals in successfully pursuing their intended results. The findings of this study can help developers and health professionals decide which PSD features and postulates to include to make mHBCSS interventions targeting BMI reduction more effective.</div></div>","PeriodicalId":54950,"journal":{"name":"International Journal of Medical Informatics","volume":"196 ","pages":"Article 105795"},"PeriodicalIF":3.7,"publicationDate":"2025-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143043382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Large language models vs human for classifying clinical documents","authors":"Akram Mustafa , Usman Naseem , Mostafa Rahimi Azghadi","doi":"10.1016/j.ijmedinf.2025.105800","DOIUrl":"10.1016/j.ijmedinf.2025.105800","url":null,"abstract":"<div><h3>Background</h3><div>Accurate classification of medical records is crucial for clinical documentation, particularly when using the 10th revision of the International Classification of Diseases (ICD-10) coding system. The use of machine learning algorithms and Systematized Nomenclature of Medicine (SNOMED) mapping has shown promise in performing these classifications. However, challenges remain, particularly in reducing false negatives, where certain diagnoses are not correctly identified by either approach.</div></div><div><h3>Objective</h3><div>This study explores the potential of leveraging advanced large language models to improve the accuracy of ICD-10 classifications in challenging cases of medical records where machine learning and SNOMED mapping fail.</div></div><div><h3>Methods</h3><div>We evaluated the performance of ChatGPT 3.5 and ChatGPT 4 in classifying ICD-10 codes from discharge summaries within selected records of the Medical Information Mart for Intensive Care (MIMIC) IV dataset. These records comprised 802 discharge summaries identified as false negatives by both machine learning and SNOMED mapping methods, indicating that they are challenging cases. Each summary was assessed by ChatGPT 3.5 and 4 using a classification prompt, and the results were compared to human coder evaluations. Five human coders, with a combined experience of over 30 years, independently classified a stratified sample of 100 summaries to validate ChatGPT's performance.</div></div><div><h3>Results</h3><div>ChatGPT 4 demonstrated significantly improved consistency over ChatGPT 3.5, with matching results between runs ranging from 86% to 89%, compared to 57% to 67% for ChatGPT 3.5. 
The classification accuracy of ChatGPT 4 was variable across different ICD-10 codes. Overall, human coders performed better than ChatGPT. However, ChatGPT matched the median performance of human coders, achieving an accuracy rate of 22%.</div></div><div><h3>Conclusion</h3><div>This study underscores the potential of integrating advanced language models with clinical coding processes to improve documentation accuracy. ChatGPT 4 demonstrated improved consistency and comparable performance to median human coders, achieving 22% accuracy in challenging cases. Combining ChatGPT with methods like SNOMED mapping could further enhance clinical coding accuracy, particularly for complex scenarios.</div></div>","PeriodicalId":54950,"journal":{"name":"International Journal of Medical Informatics","volume":"195 ","pages":"Article 105800"},"PeriodicalIF":3.7,"publicationDate":"2025-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143030402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Vision transformer-based multimodal fusion network for classification of tumor malignancy on breast ultrasound: A retrospective multicenter study","authors":"Mengying Li , Yin Fang , Jiong Shao , Yan Jiang , Guoping Xu , Xin-wu Cui , Xinglong Wu","doi":"10.1016/j.ijmedinf.2025.105793","DOIUrl":"10.1016/j.ijmedinf.2025.105793","url":null,"abstract":"<div><h3>Background</h3><div>In the context of routine breast cancer diagnosis, the precise discrimination between benign and malignant breast masses holds utmost significance. Notably, few prior investigations have concurrently explored the integration of imaging histology features, deep learning characteristics, and clinical parameters. The primary objective of this retrospective study was to pioneer a multimodal feature fusion model tailored for the prediction of breast tumor malignancy, harnessing the potential of ultrasound images.</div></div><div><h3>Method</h3><div>We compiled a dataset that included clinical features from 1065 patients and 3315 image datasets. Specifically, we selected data from 603 patients for training our multimodal model. The comprehensive experimental workflow involves identifying the optimal unimodal model, extracting unimodal features, fusing multimodal features, gaining insights from these fused features, and ultimately generating prediction results using a classifier.</div></div><div><h3>Results</h3><div>Our multimodal feature fusion model demonstrates outstanding performance, achieving an AUC of 0.994 (95 % CI: 0.988–0.999) and an F1 score of 0.971 on the primary multicenter dataset. In the evaluation on two independent testing cohorts (TCs), it maintains strong performance, with AUCs of 0.942 (95 % CI: 0.854–0.994) for TC1 and 0.945 (95 % CI: 0.857–1.000) for TC2, accompanied by corresponding F1 scores of 0.872 and 0.857, respectively. 
Notably, the decision curve analysis reveals that our model achieves higher accuracy within the threshold probability range of approximately [0.210, 0.890] (TC1) and [0.000, 0.850] (TC2) compared to alternative methods. This capability enhances its utility in clinical decision-making, providing substantial benefits.</div></div><div><h3>Conclusion</h3><div>The multimodal model proposed in this paper can comprehensively evaluate patients’ multifaceted clinical information, achieve the prediction of benign and malignant breast ultrasound tumors, and obtain high performance indexes.</div></div>","PeriodicalId":54950,"journal":{"name":"International Journal of Medical Informatics","volume":"196 ","pages":"Article 105793"},"PeriodicalIF":3.7,"publicationDate":"2025-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143043395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic (near-) duplicate content document detection in a cancer registry","authors":"Tapio Niemi, Jean Pierre Ghobril, Gautier Defossez, Simon Germann, Eloïse Martin, Jean-Luc Bulliard","doi":"10.1016/j.ijmedinf.2025.105799","DOIUrl":"10.1016/j.ijmedinf.2025.105799","url":null,"abstract":"<div><h3>Background</h3><div>Duplicate and near-duplicate medical documents are problematic in document management, clinical use, and medical research. In this study, we focus on multisourced medical documents in the context of a population-based cancer registry in Switzerland. Although the data collection process is well-regulated, the volume of transmitted documents steadily increases and the presence of full or near-duplicates slows down and complicates document processing. Identifying near-duplicates is particularly challenging because the large number of documents makes pairwise comparison infeasible.</div></div><div><h3>Methods</h3><div>We implemented a system based on normal hash functions, Simhash (Locality Sensitive Hashing), and Smith-Waterman text alignment similarity. Simhash offers good performance, and confirming its results with the Smith-Waterman algorithm at a selected similarity threshold reduces the false positive rate to near zero without lowering sensitivity. Extracted differences in near-duplicate content documents are shown by highlighting differences in original PDF documents.</div><div>We validated the method using 3042 manually verified document pairs containing 1252 full-duplicate and 398 near-duplicate pairs. The area under the curve (AUC) was 0.96, sensitivity 0.92, specificity 1.00, PPV 1.00, and NPV 0.91. For the same size simulated data, corresponding values were 0.86, 0.72, 1.00, 1.00, and 0.77, respectively.</div></div><div><h3>Results</h3><div>We applied the method against 224,398 medical documents in the cancer registry. 
We found that 5.5% of documents were duplicates at the text level, and 0.17–0.24% were near-duplicates, depending on the parameters and threshold values used. Most near-duplicates related to the same patient and originated from the same transmitter. Manual evaluation showed that only 2% of differences were in medical contents and 83% in administrative data (21% in patient, 11% in doctor, and 51% in other administrative data). Many near-duplicates looked strikingly similar from a human perspective.</div></div><div><h3>Conclusions</h3><div>We demonstrated that our method can efficiently find all full-duplicates and most near-duplicates in a large set of multisourced medical documents. Potential ways to further improve this method are discussed. The method can be applied to documents in all domains.</div></div>","PeriodicalId":54950,"journal":{"name":"International Journal of Medical Informatics","volume":"195 ","pages":"Article 105799"},"PeriodicalIF":3.7,"publicationDate":"2025-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143025602","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identification of an ANCA-associated vasculitis cohort using deep learning and electronic health records","authors":"Liqin Wang , John Novoa-Laurentiev , Claire Cook , Shruthi Srivatsan , Yining Hua , Jie Yang , Eli Miloslavsky , Hyon K. Choi , Li Zhou , Zachary S. Wallace","doi":"10.1016/j.ijmedinf.2025.105797","DOIUrl":"10.1016/j.ijmedinf.2025.105797","url":null,"abstract":"<div><h3>Background</h3><div>ANCA-associated vasculitis (AAV) is a rare but serious disease. Traditional case-identification methods using claims data can be time-intensive and may miss important subgroups. We hypothesized that a deep learning model analyzing electronic health records (EHR) can more accurately identify AAV cases.</div></div><div><h3>Methods</h3><div>We examined the Mass General Brigham (MGB) repository of clinical documentation from 12/1/1979 to 5/11/2021, using expert-curated keywords and ICD codes to identify a large cohort of potential AAV cases. Three labeled datasets (I, II, III) were created, each containing note sections. We trained and evaluated a range of machine learning and deep learning algorithms for note-level classification, using metrics such as positive predictive value (PPV), sensitivity, F-score, area under the receiver operating characteristic curve (AUROC), and area under the precision-recall curve (AUPRC). The hierarchical attention network (HAN) was further evaluated for its ability to classify AAV cases at the patient level, compared with rule-based algorithms in 2000 randomly chosen samples.</div></div><div><h3>Results</h3><div>Datasets I, II, and III comprised 6000, 3008, and 7500 note sections, respectively. HAN achieved the highest AUROC in all three datasets, with scores of 0.983, 0.991, and 0.991. The deep learning approach also had among the highest PPVs across the three datasets (0.941, 0.954, and 0.800, respectively). In a test cohort of 2000 cases, the HAN model achieved a PPV of 0.262 and an estimated sensitivity of 0.975. 
Compared to the best rule-based algorithm, HAN identified six additional AAV cases, representing 13% of the total.</div></div><div><h3>Conclusion</h3><div>The deep learning model effectively classifies clinical note sections for AAV diagnosis. Its application to EHR notes can potentially uncover additional cases missed by traditional rule-based methods.</div></div>","PeriodicalId":54950,"journal":{"name":"International Journal of Medical Informatics","volume":"196 ","pages":"Article 105797"},"PeriodicalIF":3.7,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143048871","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}