Yifei Wang , Liqin Wang , Zhengyang Zhou , John Laurentiev , Joshua R. Lakin , Li Zhou , Pengyu Hong
{"title":"Assessing fairness in machine learning models: A study of racial bias using matched counterparts in mortality prediction for patients with chronic diseases","authors":"Yifei Wang , Liqin Wang , Zhengyang Zhou , John Laurentiev , Joshua R. Lakin , Li Zhou , Pengyu Hong","doi":"10.1016/j.jbi.2024.104677","DOIUrl":"10.1016/j.jbi.2024.104677","url":null,"abstract":"<div><h3>Objective</h3><p>Existing approaches to fairness evaluation often overlook systematic differences in the social determinants of health, like demographics and socioeconomics, among comparison groups, potentially leading to inaccurate or even contradictory conclusions. This study aims to evaluate racial disparities in predicting mortality among patients with chronic diseases using a fairness detection method that considers systematic differences.</p></div><div><h3>Methods</h3><p>We created five datasets from Mass General Brigham’s electronic health records (EHR), each focusing on a different chronic condition: congestive heart failure (CHF), chronic kidney disease (CKD), chronic obstructive pulmonary disease (COPD), chronic liver disease (CLD), and dementia. For each dataset, we developed separate machine learning models to predict 1-year mortality and examined racial disparities by comparing prediction performances between Black and White individuals. We compared racial fairness evaluation between the overall Black and White individuals versus their counterparts who were Black and matched White individuals identified by propensity score matching, where the systematic differences were mitigated.</p></div><div><h3>Results</h3><p>We identified significant differences between Black and White individuals in age, gender, marital status, education level, smoking status, health insurance type, body mass index, and Charlson comorbidity index (<em>p</em>-value < 0.001). 
When examining matched Black and White subpopulations identified through propensity score matching, significant differences between particular covariates existed. We observed weaker significance levels in the CHF cohort for insurance type (<em>p</em> = 0.043), in the CKD cohort for insurance type (<em>p</em> = 0.005) and education level (<em>p</em> = 0.016), and in the dementia cohort for body mass index (<em>p</em> = 0.041); with no significant differences for other covariates. When examining mortality prediction models across the five study cohorts, we conducted a comparison of fairness evaluations before and after mitigating systematic differences. We revealed significant differences in the CHF cohort with <em>p</em>-values of 0.021 and 0.001 in terms of F1 measure and Sensitivity for the AdaBoost model, and <em>p</em>-values of 0.014 and 0.003 in terms of F1 measure and Sensitivity for the MLP model, respectively.</p></div><div><h3>Discussion and conclusion</h3><p>This study contributes to research on fairness assessment by focusing on the examination of systematic disparities and underscores the potential for revealing racial bias in machine learning models used in clinical settings.</p></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"156 ","pages":"Article 104677"},"PeriodicalIF":4.0,"publicationDate":"2024-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141320963","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sirui Ding , Shenghan Zhang , Xia Hu , Na Zou
{"title":"Identify and mitigate bias in electronic phenotyping: A comprehensive study from computational perspective","authors":"Sirui Ding , Shenghan Zhang , Xia Hu , Na Zou","doi":"10.1016/j.jbi.2024.104671","DOIUrl":"10.1016/j.jbi.2024.104671","url":null,"abstract":"<div><p>Electronic phenotyping is a fundamental task that identifies specific groups of patients and plays an important role in precision medicine in the era of digital health. Phenotyping provides real-world evidence for other related biomedical research and clinical tasks, e.g., disease diagnosis, drug development, and clinical trials. With the development of electronic health records, the performance of electronic phenotyping has been significantly boosted by advanced machine learning techniques. In the healthcare domain, precision and fairness are both essential aspects that should be taken into consideration. However, most related efforts are put into designing phenotyping models with higher accuracy. Little attention is paid to the fairness perspective of phenotyping. Neglecting bias in phenotyping leaves subgroups of patients underrepresented, which in turn affects downstream healthcare activities such as patient recruitment in clinical trials. In this work, we are motivated to bridge this gap through a comprehensive experimental study to identify the bias existing in electronic phenotyping models and evaluate the widely-used debiasing methods’ performance on these models. We choose pneumonia and sepsis as our phenotyping target diseases. We benchmark 9 kinds of electronic phenotyping methods spanning from rule-based to data-driven methods. Meanwhile, we evaluate the performance of 5 bias mitigation strategies covering pre-processing, in-processing, and post-processing. 
Through the extensive experiments, we summarize several insightful findings from the bias identified in the phenotyping and key points of the bias mitigation strategies in phenotyping.</p></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"156 ","pages":"Article 104671"},"PeriodicalIF":4.0,"publicationDate":"2024-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141320964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yan Zhang, Zhihao Yang, Yumeng Yang, Hongfei Lin, Jian Wang
{"title":"Location-enhanced syntactic knowledge for biomedical relation extraction","authors":"Yan Zhang, Zhihao Yang, Yumeng Yang, Hongfei Lin, Jian Wang","doi":"10.1016/j.jbi.2024.104676","DOIUrl":"10.1016/j.jbi.2024.104676","url":null,"abstract":"<div><p>Biomedical relation extraction has long been considered a challenging task due to the specialization and complexity of biomedical texts. Syntactic knowledge has been widely employed in existing research to enhance relation extraction, providing guidance for the semantic understanding and text representation of models. However, the utilization of syntactic knowledge in most studies is not exhaustive, and there is often a lack of fine-grained noise reduction, leading to confusion in relation classification. In this paper, we propose an attention generator that comprehensively considers both syntactic dependency type information and syntactic position information to distinguish the importance of different dependency connections. Additionally, we integrate positional information, dependency type information, and word representations together to introduce location-enhanced syntactic knowledge for guiding our biomedical relation extraction. 
On three widely used English benchmark datasets in the biomedical domain, our approach consistently outperforms a range of baseline models, demonstrating that it not only makes full use of syntactic knowledge but also effectively reduces the impact of noisy words.</p></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"156 ","pages":"Article 104676"},"PeriodicalIF":4.5,"publicationDate":"2024-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141320965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bui Duc Tho , Minh-Tien Nguyen , Dung Tien Le , Lin-Lung Ying , Shumpei Inoue , Tri-Thanh Nguyen
{"title":"Improving biomedical Named Entity Recognition with additional external contexts","authors":"Bui Duc Tho , Minh-Tien Nguyen , Dung Tien Le , Lin-Lung Ying , Shumpei Inoue , Tri-Thanh Nguyen","doi":"10.1016/j.jbi.2024.104674","DOIUrl":"10.1016/j.jbi.2024.104674","url":null,"abstract":"<div><h3>Objective:</h3><p>Biomedical Named Entity Recognition (bio NER) is the task of recognizing named entities in biomedical texts. This paper introduces a new model that addresses bio NER by considering additional external contexts. Different from prior methods that mainly use original input sequences for sequence labeling, the model takes into account additional contexts to enhance the representation of entities in the original sequences, since additional contexts can provide enhanced information for the concept explanation of biomedical entities.</p></div><div><h3>Methods:</h3><p>To exploit an additional context, given an original input sequence, the model first retrieves the relevant sentences from PubMed and then ranks the retrieved sentences to form the contexts. It next combines the context with the original input sequence to form a new enhanced sequence. The original and new enhanced sequences are fed into PubMedBERT for learning feature representation. To obtain more fine-grained features, the model stacks a BiLSTM layer on top of PubMedBERT. The final named entity label prediction is done by using a CRF layer. The model is jointly trained in an end-to-end manner to take advantage of the additional context for NER of the original sequence.</p></div><div><h3>Results:</h3><p>Experimental results on six biomedical datasets show that the proposed model achieves promising performance compared to strong baselines and confirms the contribution of additional contexts for bio NER.</p></div><div><h3>Conclusion:</h3><p>The promising results confirm three important points. 
First, the additional context from PubMed helps to improve the quality of the recognition of biomedical entities. Second, PubMed is more appropriate than the Google search engine for providing relevant information of bio NER. Finally, more relevant sentences from the context are more beneficial than irrelevant ones to provide enhanced information for the original input sequences. The model is flexible to integrate any additional context types for the NER task.</p></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"156 ","pages":"Article 104674"},"PeriodicalIF":4.5,"publicationDate":"2024-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141317373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dongjiang Niu, Lianwei Zhang, Beiyi Zhang, Qiang Zhang, Zhen Li
{"title":"DAS-DDI: A dual-view framework with drug association and drug structure for drug–drug interaction prediction","authors":"Dongjiang Niu, Lianwei Zhang, Beiyi Zhang, Qiang Zhang, Zhen Li","doi":"10.1016/j.jbi.2024.104672","DOIUrl":"10.1016/j.jbi.2024.104672","url":null,"abstract":"<div><p>In drug development and clinical application, drug–drug interaction (DDI) prediction is crucial for patient safety and therapeutic efficacy. However, traditional methods for DDI prediction often overlook the structural features of drugs and the complex interrelationships between them, which affect the accuracy and interpretability of the model. In this paper, a novel dual-view DDI prediction framework, DAS-DDI is proposed. Firstly, a drug association network is constructed based on similarity information among drugs, which could provide rich context information for DDI prediction. Subsequently, a novel drug substructure extraction method is proposed, which could update the features of nodes and chemical bonds simultaneously, improving the comprehensiveness of the feature. Furthermore, an attention mechanism is employed to fuse multiple drug embeddings from different views dynamically, enhancing the discriminative ability of the model in handling multi-view data. 
Comparative experiments on three public datasets demonstrate the superiority of DAS-DDI compared with other state-of-the-art models under two scenarios.</p></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"156 ","pages":"Article 104672"},"PeriodicalIF":4.5,"publicationDate":"2024-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141300768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Han Yuan , Chuan Hong , Peng-Tao Jiang , Gangming Zhao , Nguyen Tuan Anh Tran , Xinxing Xu , Yet Yen Yan , Nan Liu
{"title":"Clinical domain knowledge-derived template improves post hoc AI explanations in pneumothorax classification","authors":"Han Yuan , Chuan Hong , Peng-Tao Jiang , Gangming Zhao , Nguyen Tuan Anh Tran , Xinxing Xu , Yet Yen Yan , Nan Liu","doi":"10.1016/j.jbi.2024.104673","DOIUrl":"10.1016/j.jbi.2024.104673","url":null,"abstract":"<div><h3>Objective</h3><p>Pneumothorax is an acute thoracic disease caused by abnormal air collection between the lungs and chest wall. Recently, artificial intelligence (AI), especially deep learning (DL), has been increasingly employed for automating the diagnostic process of pneumothorax. To address the opaqueness often associated with DL models, explainable artificial intelligence (XAI) methods have been introduced to outline regions related to pneumothorax. However, these explanations sometimes diverge from actual lesion areas, highlighting the need for further improvement.</p></div><div><h3>Method</h3><p>We propose a template-guided approach to incorporate the clinical knowledge of pneumothorax into model explanations generated by XAI methods, thereby enhancing the quality of the explanations. Utilizing one lesion delineation created by radiologists, our approach first generates a template that represents potential areas of pneumothorax occurrence. This template is then superimposed on model explanations to filter out extraneous explanations that fall outside the template’s boundaries. To validate its efficacy, we carried out a comparative analysis of three XAI methods (Saliency Map, Grad-CAM, and Integrated Gradients) with and without our template guidance when explaining two DL models (VGG-19 and ResNet-50) in two real-world datasets (SIIM-ACR and ChestX-Det).</p></div><div><h3>Results</h3><p>The proposed approach consistently improved baseline XAI methods across twelve benchmark scenarios built on three XAI methods, two DL models, and two datasets. 
The average incremental percentages, calculated by the performance improvements over the baseline performance, were 97.8% in Intersection over Union (IoU) and 94.1% in Dice Similarity Coefficient (DSC) when comparing model explanations and ground-truth lesion areas. We further visualized baseline and template-guided model explanations on radiographs to showcase the performance of our approach.</p></div><div><h3>Conclusions</h3><p>In the context of pneumothorax diagnoses, we proposed a template-guided approach for improving model explanations. Our approach not only aligns model explanations more closely with clinical insights but also exhibits extensibility to other thoracic diseases. We anticipate that our template guidance will forge a novel approach to elucidating AI models by integrating clinical domain expertise.</p></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"156 ","pages":"Article 104673"},"PeriodicalIF":4.5,"publicationDate":"2024-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141306036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
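The template-guided filtering and the overlap metrics described in the abstract above can be illustrated with a minimal sketch. It assumes that explanations, the template, and the ground-truth lesion are already binary masks of equal size (how saliency maps are thresholded is not stated in the abstract), and the function names `apply_template`, `iou`, and `dice` are hypothetical, not taken from the paper. The toy masks are made up for illustration only.

```python
from typing import List, Tuple

Mask = List[List[int]]  # binary mask: 1 = highlighted pixel, 0 = background


def apply_template(explanation: Mask, template: Mask) -> Mask:
    """Keep only the explanation pixels that fall inside the template."""
    return [[e & t for e, t in zip(e_row, t_row)]
            for e_row, t_row in zip(explanation, template)]


def _overlap(a: Mask, b: Mask) -> Tuple[int, int, int]:
    """Return (intersection size, size of a, size of b) for two binary masks."""
    inter = sum(x & y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return inter, sum(map(sum, a)), sum(map(sum, b))


def iou(pred: Mask, truth: Mask) -> float:
    """Intersection over Union: |A and B| / |A or B|."""
    inter, n_pred, n_truth = _overlap(pred, truth)
    union = n_pred + n_truth - inter
    return inter / union if union else 0.0


def dice(pred: Mask, truth: Mask) -> float:
    """Dice Similarity Coefficient: 2 |A and B| / (|A| + |B|)."""
    inter, n_pred, n_truth = _overlap(pred, truth)
    total = n_pred + n_truth
    return 2 * inter / total if total else 0.0


# Toy 3x4 masks (illustrative values, not data from the study).
explanation = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1]]
template = [[0, 1, 1, 1], [0, 1, 1, 1], [0, 0, 0, 0]]
truth = [[0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]

guided = apply_template(explanation, template)
baseline_iou, guided_iou = iou(explanation, truth), iou(guided, truth)
# Improvement expressed relative to the baseline score.
relative_gain = (guided_iou - baseline_iou) / baseline_iou
print(baseline_iou, guided_iou, relative_gain)
```

In this toy case the template removes explanation pixels outside the plausible lesion region, so the intersection with the ground truth is unchanged while the union shrinks, which is why both IoU and Dice rise.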
Shiwei Jiang , Qingxiao Zheng , Taiyong Li , Shuanghong Luo
{"title":"Clinical research text summarization method based on fusion of domain knowledge","authors":"Shiwei Jiang , Qingxiao Zheng , Taiyong Li , Shuanghong Luo","doi":"10.1016/j.jbi.2024.104668","DOIUrl":"10.1016/j.jbi.2024.104668","url":null,"abstract":"<div><h3>Objective</h3><p>The objective of this study is to integrate PICO knowledge into the clinical research text summarization process, aiming to enhance the model’s comprehension of biomedical texts while capturing crucial content from the perspective of summary readers, ultimately improving the quality of summaries.</p></div><div><h3>Methods</h3><p>We propose a clinical research text summarization method called DKGE-PEGASUS (Domain-Knowledge and Graph Convolutional Enhanced PEGASUS), which is based on integrating domain knowledge. The model mainly consists of three components: a PICO label prediction module, a text information re-mining unit based on Graph Convolutional Neural Networks (GCN), and a pre-trained summarization model. First, the PICO label prediction module is used to identify PICO elements in clinical research texts while obtaining word embeddings enriched with PICO knowledge. Then, we use GCN to reinforce the encoder of the pre-trained summarization model to achieve deeper text information mining while explicitly injecting PICO knowledge. Finally, the outputs of the PICO label prediction module, the GCN text information re-mining unit, and the encoder of the pre-trained model are fused to produce the final coding results, which are then decoded by the decoder to generate summaries.</p></div><div><h3>Results</h3><p>Experiments conducted on two datasets, PubMed and CDSR, demonstrated the effectiveness of our method. The Rouge-1 scores achieved were 42.64 and 38.57, respectively. 
Furthermore, in a comparison of summaries generated for a segment of biomedical text, the summaries produced by our method were of significantly higher quality than those of the baseline model.</p></div><div><h3>Conclusion</h3><p>The method proposed in this paper is better equipped to identify critical elements in clinical research texts and produce a higher-quality summary.</p></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"156 ","pages":"Article 104668"},"PeriodicalIF":4.5,"publicationDate":"2024-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141300767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nevo Itzhak , Szymon Jaroszewicz , Robert Moskovitch
{"title":"Event prediction by estimating continuously the completion of a single temporal pattern’s instances","authors":"Nevo Itzhak , Szymon Jaroszewicz , Robert Moskovitch","doi":"10.1016/j.jbi.2024.104665","DOIUrl":"10.1016/j.jbi.2024.104665","url":null,"abstract":"<div><h3>Objective:</h3><p>Develop a new method for continuous prediction that utilizes a single temporal pattern ending with an event of interest and its multiple instances detected in the temporal data.</p></div><div><h3>Methods:</h3><p>Use temporal abstraction to transform time series, instantaneous events, and time intervals into a uniform representation using symbolic time intervals (STIs). Introduce a new approach to event prediction using a single time intervals-related pattern (TIRP), which can learn models to predict whether and when an event of interest will occur, based on multiple instances of a pattern that end with the event.</p></div><div><h3>Results:</h3><p>The proposed methods achieved an average improvement of 5% in AUROC over LSTM-FCN, the best-performing of the evaluated baseline models (RawXGB, Resnet, LSTM-FCN, and ROCKET), on real-life datasets.</p></div><div><h3>Conclusion:</h3><p>The proposed methods for predicting events continuously have the potential to be used in a wide range of real-world and real-time applications in diverse domains with heterogeneous multivariate temporal data. 
For example, it could be used to predict panic attacks early using wearable devices or to predict complications early in intensive care unit patients.</p></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"156 ","pages":"Article 104665"},"PeriodicalIF":4.0,"publicationDate":"2024-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141296135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Esther L. Meerwijk , Duncan C. McElfresh , Susana Martins , Suzanne R. Tamang
{"title":"Evaluating accuracy and fairness of clinical decision support algorithms when health care resources are limited","authors":"Esther L. Meerwijk , Duncan C. McElfresh , Susana Martins , Suzanne R. Tamang","doi":"10.1016/j.jbi.2024.104664","DOIUrl":"10.1016/j.jbi.2024.104664","url":null,"abstract":"<div><h3>Objective</h3><p>Guidance on how to evaluate accuracy and algorithmic fairness across subgroups is missing for clinical models that flag patients for an intervention when the health care resources to administer that intervention are limited. We aimed to propose a framework of metrics that would fit this specific use case.</p></div><div><h3>Methods</h3><p>We evaluated the following metrics and applied them to a Veterans Health Administration clinical model that flags patients for intervention who are at risk of overdose or a suicidal event among outpatients who were prescribed opioids (N = 405,817): Receiver-Operating Characteristic and area under the curve, precision-recall curve, calibration-reliability curve, false positive rate, false negative rate, and false omission rate. In addition, we developed a new approach to visualize false positives and false negatives that we named ‘per true positive bars.’ We demonstrate the utility of these metrics to our use case for three cohorts of patients at the highest risk (top 0.5 %, 1.0 %, and 5.0 %) by evaluating algorithmic fairness across the following age groups: <=30, 31–50, 51–65, and >65 years old.</p></div><div><h3>Results</h3><p>Metrics that allowed us to assess group differences more clearly were the false positive rate, false negative rate, false omission rate, and the new ‘per true positive bars’. 
Metrics with limited utility to our use case were the Receiver-Operating Characteristic and area under the curve, the calibration-reliability curve, and the precision-recall curve.</p></div><div><h3>Conclusion</h3><p>There is no “one size fits all” approach to model performance monitoring and bias analysis. Our work informs future researchers and clinicians who seek to evaluate accuracy and fairness of predictive models that identify patients to intervene on in the context of limited health care resources. In terms of ease of interpretation and utility for our use case, the new ‘per true positive bars’ may be the most intuitive to a range of stakeholders and facilitate choosing a threshold that allows weighing false positives against false negatives, which is especially important when predicting severe adverse events.</p></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"156 ","pages":"Article 104664"},"PeriodicalIF":4.5,"publicationDate":"2024-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141293450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
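The subgroup metrics named in the abstract above can be sketched as follows for a model that flags only the highest-risk fraction of patients. This is a rough illustration under stated assumptions: the helper names `flag_top_fraction` and `group_rates` are hypothetical, the toy scores, labels, and age groups are invented, and the `fp_per_tp` and `fn_per_tp` ratios are only an assumed reading of the name "per true positive bars", not the paper's definition.

```python
from collections import defaultdict


def flag_top_fraction(scores, fraction):
    """Flag the highest-scoring fraction of patients (at least one patient)."""
    k = max(1, int(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:k])
    return [i in flagged for i in range(len(scores))]


def _ratio(numerator, denominator):
    """Return numerator / denominator, or None when the rate is undefined."""
    return numerator / denominator if denominator else None


def group_rates(flags, labels, groups):
    """Compute per-group error rates from flags, true labels, and group ids."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
    for flagged, label, group in zip(flags, labels, groups):
        if flagged:
            counts[group]["tp" if label else "fp"] += 1
        else:
            counts[group]["fn" if label else "tn"] += 1
    rates = {}
    for group, c in counts.items():
        rates[group] = {
            "fpr": _ratio(c["fp"], c["fp"] + c["tn"]),  # false positive rate
            "fnr": _ratio(c["fn"], c["fn"] + c["tp"]),  # false negative rate
            "for": _ratio(c["fn"], c["fn"] + c["tn"]),  # false omission rate
            # Assumed interpretation: errors counted per true positive.
            "fp_per_tp": _ratio(c["fp"], c["tp"]),
            "fn_per_tp": _ratio(c["fn"], c["tp"]),
        }
    return rates


# Toy data (illustrative only): risk scores, observed events, age groups.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 0, 1, 1, 0, 1, 0, 0]
groups = ["<=30", "<=30", ">65", ">65", "<=30", ">65", "<=30", ">65"]

flags = flag_top_fraction(scores, fraction=0.5)
rates = group_rates(flags, labels, groups)
for group, values in rates.items():
    print(group, values)
```

Because the flagged cohort has a fixed size, error rates that condition on the flag decision (such as the false omission rate among unflagged patients) can differ sharply between groups even when a threshold-free summary looks similar, which is the kind of difference the toy output shows.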
Marco Piccininni , Maximilian Wechsung , Ben Van Calster , Jessica L. Rohmann , Stefan Konigorski , Maarten van Smeden
{"title":"Understanding random resampling techniques for class imbalance correction and their consequences on calibration and discrimination of clinical risk prediction models","authors":"Marco Piccininni , Maximilian Wechsung , Ben Van Calster , Jessica L. Rohmann , Stefan Konigorski , Maarten van Smeden","doi":"10.1016/j.jbi.2024.104666","DOIUrl":"10.1016/j.jbi.2024.104666","url":null,"abstract":"<div><h3>Objective</h3><p>Class imbalance is sometimes considered a problem when developing clinical prediction models and assessing their performance. To address it, correction strategies involving manipulations of the training dataset, such as random undersampling or oversampling, are frequently used. The aim of this article is to illustrate the consequences of these class imbalance correction strategies on clinical prediction models’ internal validity in terms of calibration and discrimination performances.</p></div><div><h3>Methods</h3><p>We used both heuristic intuition and formal mathematical reasoning to characterize the relations between conditional probabilities of interest and probabilities targeted when using random undersampling or oversampling. We propose a plug-in estimator that represents a natural correction for predictions obtained from models that have been trained on artificially balanced datasets (“naïve” models). We conducted a Monte Carlo simulation with two different data generation processes and present a real-world example using data from the International Stroke Trial database to empirically demonstrate the consequences of applying random resampling techniques for class imbalance correction on calibration and discrimination (in terms of Area Under the ROC, AUC) for logistic regression and tree-based prediction models.</p></div><div><h3>Results</h3><p>Across our simulations and in the real-world example, calibration of the naïve models was very poor. 
The models using the plug-in estimator generally outperformed the models relying on class imbalance correction in terms of calibration while achieving the same discrimination performance.</p></div><div><h3>Conclusion</h3><p>Random resampling techniques for class imbalance correction do not generally improve discrimination performance (i.e., AUC), and their use is hard to justify when aiming at providing calibrated predictions. Improper use of such class imbalance correction techniques can lead to suboptimal data usage and less valid risk prediction models.</p></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"155 ","pages":"Article 104666"},"PeriodicalIF":4.5,"publicationDate":"2024-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141288117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
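To make the calibration issue in the abstract above concrete, here is a minimal sketch of the standard Bayes odds rescaling that maps a probability learned on a resampled dataset back to the original outcome prevalence. It is not necessarily the paper's plug-in estimator; it assumes that random resampling changed only the class prevalence, and the function name `correct_probability` and the example prevalences are hypothetical.

```python
def correct_probability(p_resampled: float,
                        prevalence_original: float,
                        prevalence_resampled: float) -> float:
    """Rescale predicted odds from the resampled prevalence to the original one.

    Assumes resampling altered only the class prior, so that
    odds_original = odds_resampled * prior_odds_original / prior_odds_resampled.
    """
    if p_resampled <= 0.0 or p_resampled >= 1.0:
        return p_resampled  # probabilities of exactly 0 or 1 are left unchanged
    odds = p_resampled / (1.0 - p_resampled)
    prior_odds_original = prevalence_original / (1.0 - prevalence_original)
    prior_odds_resampled = prevalence_resampled / (1.0 - prevalence_resampled)
    corrected_odds = odds * prior_odds_original / prior_odds_resampled
    return corrected_odds / (1.0 + corrected_odds)


# Example: outcome prevalence of 10% in the original data, 50% after balancing.
naive_predictions = [0.2, 0.5, 0.9]
corrected = [correct_probability(p, 0.10, 0.50) for p in naive_predictions]
print(corrected)
```

The rescaling is strictly increasing in the naive prediction, so it leaves the ranking of patients, and therefore the AUC, unchanged while shifting predicted risks toward the original prevalence. This is consistent with the abstract's observation that correction improves calibration without changing discrimination.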