Charlotte Zelin , Wendy K. Chung , Mederic Jeanne , Gongbo Zhang , Chunhua Weng
{"title":"Rare disease diagnosis using knowledge guided retrieval augmentation for ChatGPT","authors":"Charlotte Zelin , Wendy K. Chung , Mederic Jeanne , Gongbo Zhang , Chunhua Weng","doi":"10.1016/j.jbi.2024.104702","DOIUrl":"10.1016/j.jbi.2024.104702","url":null,"abstract":"<div><p>Although rare diseases individually have a low prevalence, they collectively affect nearly 400 million individuals around the world. On average, it takes five years for an accurate rare disease diagnosis, but many patients remain undiagnosed or misdiagnosed. As machine learning technologies have been used to aid diagnostics in the past, this study aims to test ChatGPT’s suitability for rare disease diagnostic support with the enhancement provided by Retrieval Augmented Generation (RAG). RareDxGPT, our enhanced ChatGPT model, supplies ChatGPT with information about 717 rare diseases from an external knowledge resource, the RareDis Corpus, through RAG. In RareDxGPT, when a query is entered, the three documents most relevant to the query in the RareDis Corpus are retrieved. Along with the query, they are returned to ChatGPT to provide a diagnosis. Additionally, phenotypes for thirty different diseases were extracted from free text from PubMed’s Case Reports. They were each entered with three different prompt types: “prompt”, “prompt + explanation” and “prompt + role play.” The accuracy of ChatGPT and RareDxGPT with each prompt was then measured. With “Prompt”, RareDxGPT had a 40 % accuracy, while ChatGPT 3.5 got 37 % of the cases correct. With “Prompt + Explanation”, RareDxGPT had a 43 % accuracy, while ChatGPT 3.5 got 23 % of the cases correct. With “Prompt + Role Play”, RareDxGPT had a 40 % accuracy, while ChatGPT 3.5 got 23 % of the cases correct. 
To conclude, ChatGPT, especially when supplied with extra domain-specific knowledge, demonstrates early potential for rare disease diagnosis with adjustments.</p></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"157 ","pages":"Article 104702"},"PeriodicalIF":4.0,"publicationDate":"2024-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141859869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Celia Alvarez-Romero , Máximo Bernabeu-Wittel , Carlos Luis Parra-Calderón , Silvia Rodríguez Mejías , Alicia Martínez-García
{"title":"Desiderata for discoverability and FAIR adoption of health data hubs","authors":"Celia Alvarez-Romero , Máximo Bernabeu-Wittel , Carlos Luis Parra-Calderón , Silvia Rodríguez Mejías , Alicia Martínez-García","doi":"10.1016/j.jbi.2024.104700","DOIUrl":"10.1016/j.jbi.2024.104700","url":null,"abstract":"<div><h3>Background</h3><p>The future European Health Research and Innovation Cloud (HRIC), as a fundamental part of the European Health Data Space (EHDS), will promote the secondary use of data and the capabilities to push the boundaries of health research within an ethical and legally compliant framework that reinforces the trust of patients and citizens.</p></div><div><h3>Objective</h3><p>This study aimed to analyse health data management mechanisms in Europe to determine their alignment with FAIR principles and data discovery, generating best practices for new data hubs joining the HRIC ecosystem. In this line, the compliance of health data hubs with FAIR principles and data discovery was assessed, and a set of best practices for health data hubs was derived.</p></div><div><h3>Methods</h3><p>A survey was conducted in January 2022, involving 99 representative health data hubs from multiple countries, and 42 responses were obtained in June 2022. Stratification methods were employed to cover different levels of granularity. The survey data was analysed to assess compliance with FAIR and data discovery principles. The study started with a general analysis of survey responses, followed by the creation of specific profiles based on three categories: organization type, function, and level of data aggregation.</p></div><div><h3>Results</h3><p>The study produced specific best practices for data hubs regarding the adoption of FAIR principles and data discoverability. 
It also provided an overview of the survey study and specific profiles derived from category analysis, considering different types of data hubs.</p></div><div><h3>Conclusions</h3><p>The study concluded that a significant number of health data hubs in Europe did not fully comply with FAIR and data discovery principles. However, the study identified specific best practices that can guide new data hubs in adhering to these principles. The study highlighted the importance of aligning health data management mechanisms with FAIR principles to enhance interoperability and reusability in the future HRIC.</p></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"157 ","pages":"Article 104700"},"PeriodicalIF":4.0,"publicationDate":"2024-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1532046424001187/pdfft?md5=8528674c63bb931855f719c8a92b3d67&pid=1-s2.0-S1532046424001187-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141848605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jingdong Yang , Han Wang , Peng Liu , Yuhang Lu , Minghui Yao , Haixia Yan
{"title":"Prediction of hypertension risk based on multiple feature fusion","authors":"Jingdong Yang , Han Wang , Peng Liu , Yuhang Lu , Minghui Yao , Haixia Yan","doi":"10.1016/j.jbi.2024.104701","DOIUrl":"10.1016/j.jbi.2024.104701","url":null,"abstract":"<div><h3>Objective</h3><p>In the application of machine learning to the prediction of hypertension, many factors have seriously affected the classification accuracy and generalization performance. We propose a pulse wave classification model based on multi-feature fusion for accurate prediction of hypertension.</p></div><div><h3>Methods and Materials</h3><p>We propose an ensemble under-sampling model with dynamic weights to decrease the influence of class imbalance on classification, and to automatically classify hypertension from inquiry diagnosis. We also build a deep learning model based on a hybrid attention mechanism, which transforms pulse waves to feature maps for extraction of in-depth features, so as to automatically classify hypertension from pulse diagnosis. We build the multi-feature fusion model based on dynamic Dempster/Shafer (DS) theory combining inquiry diagnosis and pulse diagnosis to enhance fault tolerance of prediction for multiple classifiers. In addition, this study calculates feature importance ranking of scale features on inquiry diagnosis and temporal and frequency-domain features on pulse diagnosis.</p></div><div><h3>Results</h3><p>The accuracy, sensitivity, specificity, F1-score and G-mean after 5-fold cross-validation were 94.08%, 93.43%, 96.86%, 93.45% and 95.12%, respectively, based on the hypertensive samples of 409 cases from Longhua Hospital affiliated to Shanghai University of Traditional Chinese Medicine and Hospital of Integrated Traditional Chinese and Western Medicine. 
We find the key factors influencing hypertensive classification accuracy, so as to assist in the prevention and clinical diagnosis of hypertension.</p></div><div><h3>Conclusion</h3><p>Compared with the state-of-the-art models, the multi-feature fusion model effectively utilizes the patients’ correlated multimodal features, and has higher classification accuracy and generalization performance.</p></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"157 ","pages":"Article 104701"},"PeriodicalIF":4.0,"publicationDate":"2024-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141758975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sumei Yao , Yan Zhang , Jing Chen , Quan Lu , Zhiguang Zhao
{"title":"Enhancing identification performance of cognitive impairment high-risk based on a semi-supervised learning method","authors":"Sumei Yao , Yan Zhang , Jing Chen , Quan Lu , Zhiguang Zhao","doi":"10.1016/j.jbi.2024.104699","DOIUrl":"10.1016/j.jbi.2024.104699","url":null,"abstract":"<div><h3>Background</h3><p>Cognitive assessment plays a pivotal role in the early detection of cognitive impairment, particularly in the prevention and management of cognitive diseases such as Alzheimer’s and Lewy body dementia. Large-scale screening relies heavily on cognitive assessment scales as primary tools, some of which have low sensitivity while others are expensive. Despite significant progress in machine learning for cognitive function assessment, its application in this particular screening domain remains underexplored, often requiring labor-intensive expert annotations.</p></div><div><h3>Aims</h3><p>This paper introduces a semi-supervised learning algorithm based on pseudo-label with putback (SS-PP), aiming to enhance model efficiency in predicting the high risk of cognitive impairment (HR-CI) by utilizing the distribution of unlabeled samples.</p></div><div><h3>Data</h3><p>The study involved 189 labeled samples and 215,078 unlabeled samples from the real world. A semi-supervised classification algorithm was designed and evaluated by comparison with supervised methods comprising 14 traditional machine-learning methods and with other advanced semi-supervised algorithms.</p></div><div><h3>Results</h3><p>The optimal SS-PP model, based on GBDT, achieved an AUC of 0.947. 
Comparative analysis with supervised learning models and semi-supervised methods demonstrated an average AUC improvement of 8% and state-of-the-art performance, respectively.</p></div><div><h3>Conclusion</h3><p>This study pioneers the exploration of utilizing limited labeled data for HR-CI predictions and evaluates the benefits of incorporating physical examination data, holding significant implications for the development of cost-effective strategies in relevant healthcare domains.</p></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"157 ","pages":"Article 104699"},"PeriodicalIF":4.0,"publicationDate":"2024-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141734164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sahar Hassanzadeh Mostafaei , Jafar Tanha , Amir Sharafkhaneh
{"title":"A novel deep learning model based on transformer and cross modality attention for classification of sleep stages","authors":"Sahar Hassanzadeh Mostafaei , Jafar Tanha , Amir Sharafkhaneh","doi":"10.1016/j.jbi.2024.104689","DOIUrl":"10.1016/j.jbi.2024.104689","url":null,"abstract":"<div><p>The classification of sleep stages is crucial for gaining insights into an individual’s sleep patterns and identifying potential health issues. Employing several important physiological channels in different views, each providing a distinct perspective on sleep patterns, can have a great impact on the efficiency of the classification models. In the context of neural networks and deep learning models, transformers are very effective, especially when dealing with time series data, and have shown remarkable compatibility with sequential data analysis as physiological channels. On the other hand, cross-modality attention, by integrating information from multiple views of the data, makes it possible to capture relationships among different modalities, allowing models to selectively focus on relevant information from each modality. In this paper, we introduce a novel deep-learning model based on transformer encoder-decoder and cross-modal attention for sleep stage classification. The proposed model processes information from various physiological channels with different modalities using the Sleep Heart Health Study Dataset (SHHS) data and leverages transformer encoders for feature extraction and cross-modal attention for effective integration to feed into the transformer decoder. The combination of these elements increased the accuracy of the model up to 91.33% in classifying five classes of sleep stages. 
Empirical evaluations demonstrated the model’s superior performance compared to standalone approaches and other state-of-the-art techniques, showcasing the potential of combining transformer and cross-modal attention for improved sleep stage classification.</p></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"157 ","pages":"Article 104689"},"PeriodicalIF":4.0,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141727297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Annie T. Chen , Claire E. Child , Mary Grace Asirot , Kimiko Domoto-Reilly , Anne M. Turner
{"title":"A visual approach to facilitating conversations about supportive care options in the context of cognitive impairment","authors":"Annie T. Chen , Claire E. Child , Mary Grace Asirot , Kimiko Domoto-Reilly , Anne M. Turner","doi":"10.1016/j.jbi.2024.104691","DOIUrl":"10.1016/j.jbi.2024.104691","url":null,"abstract":"<div><h3>Background</h3><p>Persons with cognitive impairment may experience difficulties with language and cognition that interfere with their ability to communicate about health-related decision making.</p></div><div><h3>Objective</h3><p>We developed a visual elicitation technique to facilitate conversations about preferences concerning potential future supportive care needs and explored the utility of this technique in a qualitative interview study.</p></div><div><h3>Methods</h3><p>We conducted 15 online interviews with persons with mild cognitive impairment and mild to moderate dementia, using storytelling and a virtual tool designed to facilitate discussion about preferences for supportive care. Interviews were transcribed verbatim and analyzed using an inductive qualitative data analysis method. We report our findings with respect to several main themes. First, we considered participants’ perspectives on supportive care. Next, we examined the utility of the tool for engaging participants in conversation through two themes: cognitive and communicative processes exhibited by participants; and dialogic interactions between the interviewer and the participant.</p></div><div><h3>Results</h3><p>With respect to participants’ perspectives on supportive care, common themes included considerations relating to informal caregivers such as availability and burden, and the quality of care options such as paid caregivers. Other themes, such as the importance of making decisions as a family, considerations related to facing these challenges on one’s own, and the fluid nature of decision making, also emerged. 
Common communicative processes included not being responsive to the question and unclear responses. Common cognitive processes included uncertainty and introspection, or self-awareness, of one's cognitive abilities. Last, we examined dialogic interactions between the participant and the interviewer to better understand engagement with the tool. The interviewer was active in using the visualization tool to facilitate the conversation, and participants engaged with the interface to varying degrees. Some participants expressed greater agency and involvement through suggesting images, elaborating on their or the interviewer’s comments, and suggesting icon labels.</p></div><div><h3>Conclusion</h3><p>This article presents a visual method to engage older adults with cognitive impairment in active dialogue about complex decisions. Though designed for a research setting, the diverse communication and participant-interviewer interaction patterns observed in this study suggest that the tool might be adapted for use in clinical or community settings.</p></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"157 ","pages":"Article 104691"},"PeriodicalIF":4.0,"publicationDate":"2024-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141633643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ashley C. Griffin , Karen H. Wang , Tiffany I. Leung , Julio C. Facelli
{"title":"Recommendations to promote fairness and inclusion in biomedical AI research and clinical use","authors":"Ashley C. Griffin , Karen H. Wang , Tiffany I. Leung , Julio C. Facelli","doi":"10.1016/j.jbi.2024.104693","DOIUrl":"10.1016/j.jbi.2024.104693","url":null,"abstract":"<div><h3>Objective</h3><p>Understanding and quantifying biases when designing and implementing actionable approaches to increase fairness and inclusion is critical for artificial intelligence (AI) in biomedical applications.</p></div><div><h3>Methods</h3><p>In this Special Communication, we discuss how bias is introduced at different stages of the development and use of AI applications in biomedical sciences and health care. We describe various AI applications and their implications for fairness and inclusion in sections on 1) Bias in Data Source Landscapes, 2) Algorithmic Fairness, 3) Uncertainty in AI Predictions, 4) Explainable AI for Fairness and Equity, and 5) Sociological/Ethnographic Issues in Data and Results Representation.</p></div><div><h3>Results</h3><p>We provide recommendations to address biases when developing and using AI in clinical applications.</p></div><div><h3>Conclusion</h3><p>These recommendations can be applied to informatics research and practice to foster more equitable and inclusive health care systems and research discoveries.</p></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"157 ","pages":"Article 104693"},"PeriodicalIF":4.0,"publicationDate":"2024-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141633644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Isabelle-Emmanuella Nogues , Jun Wen , Yihan Zhao , Clara-Lea Bonzel , Victor M. Castro , Yucong Lin , Shike Xu , Jue Hou , Tianxi Cai
{"title":"Semi-supervised Double Deep Learning Temporal Risk Prediction (SeDDLeR) with Electronic Health Records","authors":"Isabelle-Emmanuella Nogues , Jun Wen , Yihan Zhao , Clara-Lea Bonzel , Victor M. Castro , Yucong Lin , Shike Xu , Jue Hou , Tianxi Cai","doi":"10.1016/j.jbi.2024.104685","DOIUrl":"10.1016/j.jbi.2024.104685","url":null,"abstract":"<div><h3>Background:</h3><p>Risk prediction plays a crucial role in planning for prevention, monitoring, and treatment. Electronic Health Records (EHRs) offer an expansive repository of temporal medical data encompassing both risk factors and outcome indicators essential for effective risk prediction. However, challenges emerge due to the lack of readily available gold-standard outcomes and the complex effects of various risk factors. Compounding these challenges are the false positives in diagnosis codes and the formidable task of pinpointing the onset timing in annotations.</p></div><div><h3>Objective:</h3><p>We develop a <strong>Se</strong>mi-supervised <strong>D</strong>ouble <strong>D</strong>eep <strong>Le</strong>arning Temporal <strong>R</strong>isk Prediction (SeDDLeR) algorithm based on extensive unlabeled longitudinal Electronic Health Records (EHR) data augmented by a limited set of gold standard labels on the binary status information indicating whether the clinical event of interest occurred during the follow-up period.</p></div><div><h3>Methods:</h3><p>The SeDDLeR algorithm calculates an individualized risk of developing future clinical events over time using each patient’s baseline EHR features via the following steps: (1) construction of an initial EHR-derived surrogate as a proxy for the onset status; (2) deep learning calibration of the surrogate along gold-standard onset status; and (3) semi-supervised deep learning for risk prediction combining calibrated surrogates and gold-standard onset status. To account for missing onset time and heterogeneous follow-up, we introduce temporal kernel weighting. 
We devise a Gated Recurrent Units (GRUs) module to capture temporal characteristics. We subsequently assess our proposed SeDDLeR method in simulation studies and apply the method to the Massachusetts General Brigham (MGB) Biobank to predict type 2 diabetes (T2D) risk.</p></div><div><h3>Results:</h3><p>SeDDLeR outperforms benchmark risk prediction methods, including Semi-parametric Transformation Model (STM) and DeepHit, with consistently best accuracy across experiments. SeDDLeR achieved the best C-statistic (0.815, SE 0.023; vs STM +.084, SE 0.030, <span><math><mi>P</mi></math></span>-value .004; vs DeepHit +.055, SE 0.027, <span><math><mi>P</mi></math></span>-value .024) and best average time-specific AUC (0.778, SE 0.022; vs STM + 0.059, SE 0.039, <span><math><mi>P</mi></math></span>-value .067; vs DeepHit + 0.168, SE 0.032, <span><math><mi>P</mi></math></span>-value <span><math><mo>&lt;</mo></math></span>0.001) in the MGB T2D study.</p></div><div><h3>Conclusion:</h3><p>SeDDLeR can train robust risk prediction models in both real-world EHR and synthetic datasets with minimal requirements of labeling event times. It holds the potential to be incorporated for future clinical trial recruitment or clinical decision-making.</p></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"157 ","pages":"Article 104685"},"PeriodicalIF":4.0,"publicationDate":"2024-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141616530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yiwen Lu , Jiayi Tong , Jessica Chubak , Thomas Lumley , Rebecca A Hubbard , Hua Xu , Yong Chen
{"title":"Leveraging error-prone algorithm-derived phenotypes: Enhancing association studies for risk factors in EHR data","authors":"Yiwen Lu , Jiayi Tong , Jessica Chubak , Thomas Lumley , Rebecca A Hubbard , Hua Xu , Yong Chen","doi":"10.1016/j.jbi.2024.104690","DOIUrl":"10.1016/j.jbi.2024.104690","url":null,"abstract":"<div><h3>Objectives</h3><p>It has become increasingly common for multiple computable phenotypes from electronic health records (EHR) to be developed for a given phenotype. However, EHR-based association studies often focus on a single phenotype. In this paper, we develop a method aiming to simultaneously make use of multiple EHR-derived phenotypes for reduction of bias due to phenotyping error and improved efficiency of phenotype/exposure associations.</p></div><div><h3>Materials and Methods</h3><p>The proposed method combines multiple algorithm-derived phenotypes with a small set of validated outcomes to reduce bias and improve estimation accuracy and efficiency. The performance of our method was evaluated through simulation studies and real-world application to an analysis of colon cancer recurrence using EHR data from Kaiser Permanente Washington.</p></div><div><h3>Results</h3><p>In settings where there was no single surrogate performing uniformly better than all others in terms of both sensitivity and specificity, our method achieved substantial bias reduction compared to using a single algorithm-derived phenotype. 
Our method also led to higher estimation efficiency by up to 30% compared to an estimator that used only one algorithm-derived phenotype.</p></div><div><h3>Discussion</h3><p>Simulation studies and application to real-world data demonstrated the effectiveness of our method in integrating multiple phenotypes, thereby enhancing bias reduction, statistical accuracy and efficiency.</p></div><div><h3>Conclusions</h3><p>Our method combines information across multiple surrogates using a statistically efficient seemingly unrelated regression framework. Our method provides a robust alternative to single-surrogate-based bias correction, especially in contexts lacking information on which surrogate is superior.</p></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"157 ","pages":"Article 104690"},"PeriodicalIF":4.0,"publicationDate":"2024-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141616529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Patricia Cabanillas Silva , Hong Sun , Pablo Rodriguez-Brazzarola , Mohamed Rezk , Xianchao Zhang , Janis Fliegenschmidt , Nikolai Hulde , Vera von Dossow , Laurent Meesseman , Kristof Depraetere , Ralph Szymanowsky , Jörg Stieg , Fried-Michael Dahlweid
{"title":"Evaluating gender bias in ML-based clinical risk prediction models: A study on multiple use cases at different hospitals","authors":"Patricia Cabanillas Silva , Hong Sun , Pablo Rodriguez-Brazzarola , Mohamed Rezk , Xianchao Zhang , Janis Fliegenschmidt , Nikolai Hulde , Vera von Dossow , Laurent Meesseman , Kristof Depraetere , Ralph Szymanowsky , Jörg Stieg , Fried-Michael Dahlweid","doi":"10.1016/j.jbi.2024.104692","DOIUrl":"10.1016/j.jbi.2024.104692","url":null,"abstract":"<div><h3>Background</h3><p>An inherent difference exists between male and female bodies, and the historical under-representation of females in clinical trials has widened this gap in existing healthcare data. The fairness of clinical decision-support tools is at risk when developed based on biased data. This paper aims to quantitatively assess the gender bias in risk prediction models. We aim to generalize our findings by performing this investigation on multiple use cases at different hospitals.</p></div><div><h3>Methods</h3><p>First, we conduct a thorough analysis of the source data to find gender-based disparities. Secondly, we assess the model performance on different gender groups at different hospitals and on different use cases. Performance evaluation is quantified using the area under the receiver-operating characteristic curve (AUROC). Lastly, we investigate the clinical implications of these biases by analyzing the underdiagnosis and overdiagnosis rate, and the decision curve analysis (DCA). We also investigate the influence of model calibration on mitigating gender-related disparities in decision-making processes.</p></div><div><h3>Results</h3><p>Our data analysis reveals notable variations in incidence rates, AUROC, and over-diagnosis rates across different genders, hospitals and clinical use cases. However, it is also observed that the underdiagnosis rate is consistently higher in the female population. 
In general, the female population exhibits lower incidence rates and the models perform worse when applied to this group. Furthermore, the decision curve analysis demonstrates that there is no statistically significant difference between the model’s clinical utility across gender groups within the threshold range of interest.</p></div><div><h3>Conclusion</h3><p>The presence of gender bias within risk prediction models varies across different clinical use cases and healthcare institutions. Although an inherent difference is observed between male and female populations at the data source level, this variance does not affect the parity of clinical utility. In conclusion, the evaluations conducted in this study highlight the significance of continuous monitoring of gender-based disparities in various perspectives for clinical risk prediction models.</p></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"157 ","pages":"Article 104692"},"PeriodicalIF":4.0,"publicationDate":"2024-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141620028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}