medRxiv - Health Informatics: Latest Literature

Fine-tuning large language models for effective nutrition support in residential aged care: a domain expertise approach
medRxiv - Health Informatics Pub Date : 2024-07-21 DOI: 10.1101/2024.07.21.24310775
Mohammad Alkhalaf, Chao Deng, Jun Shen, Hui-Chen (Rita) Chang, Ping Yu
Abstract:
Purpose: Malnutrition is a serious health concern, particularly among older people living in residential aged care (RAC) facilities. An automated and efficient method is required to identify individuals afflicted with malnutrition in this setting. Recent advancements in transformer-based large language models (LLMs) equipped with sophisticated context-aware embeddings, such as RoBERTa, have significantly improved machine learning performance, particularly in predictive modelling. Enhancing the embeddings of these models on domain-specific corpora, such as clinical notes, is essential for elevating their performance in clinical tasks. Therefore, our study introduces a novel approach that trains a foundational RoBERTa model on nursing progress notes to develop a RAC domain-specific LLM. The model is further fine-tuned on nursing progress notes to enhance malnutrition identification and prediction in the residential aged care setting.
Methods: We developed our domain-specific model by training the RoBERTa LLM on 500,000 nursing progress notes from residential aged care electronic health records (EHRs). The model embeddings were used for two downstream tasks: malnutrition note identification and malnutrition prediction. Its performance was compared against baseline RoBERTa and BioClinicalBERT. Furthermore, we truncated long sequence text to fit into RoBERTa's 512-token sequence length limitation, enabling our model to handle sequences up to 1536 tokens.
Results: Utilizing 5-fold cross-validation for both tasks, our RAC domain-specific LLM demonstrated significantly better performance over other models. In malnutrition note identification, it achieved a slightly higher F1-score of 0.966 compared to other LLMs. In prediction, it achieved a significantly higher F1-score of 0.655. We enhanced our model's predictive capability by integrating the risk factors extracted from each client's notes, creating a combined data layer of structured risk factors and free-text notes. This integration improved the prediction performance, evidenced by an increased F1-score of 0.687.
Conclusion: Our findings suggest that further fine-tuning a large language model on a domain-specific clinical corpus can improve the foundational model's performance in clinical tasks. This specialized adaptation significantly improves our domain-specific model's performance in tasks such as malnutrition risk identification and malnutrition prediction, making it useful for identifying and predicting malnutrition among older people living in residential aged care or long-term care facilities.
Citations: 0
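The abstract above mentions fitting long notes into RoBERTa's 512-token limit while handling sequences of up to 1536 tokens, but it does not spell out the mechanism. One plausible reading, offered only as a sketch and not as the authors' implementation, is to keep the first 1536 tokens and cut them into at most three 512-token windows that are encoded separately. The function name `split_into_windows` and its parameters are hypothetical.

```python
from typing import List


def split_into_windows(token_ids: List[int],
                       window: int = 512,
                       max_tokens: int = 1536) -> List[List[int]]:
    """Split a token sequence into consecutive, non-overlapping windows.

    Assumption (not stated in the abstract): tokens beyond `max_tokens`
    are dropped, and the remainder is cut into windows of at most
    `window` tokens each.
    """
    kept = token_ids[:max_tokens]
    return [kept[i:i + window] for i in range(0, len(kept), window)]


# A hypothetical 2000-token note: only the first 1536 tokens survive.
windows = split_into_windows(list(range(2000)))
print([len(w) for w in windows])  # [512, 512, 512]
```

How the per-window embeddings are then combined (for example by averaging or concatenation) is a separate design choice that the abstract does not describe.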
Exposomics and Cardiovascular Diseases: A Scoping Review of Machine Learning Approaches
medRxiv - Health Informatics Pub Date : 2024-07-19 DOI: 10.1101/2024.07.19.24310695
Katerina D. Argyri, Ioannis K. Gallos, Angelos Amditis, Dimitra D. Dionysiou
Abstract:
Cardiovascular disease has been established as the world's number one killer, causing over 20 million deaths per year. This fact, along with the growing awareness of the impact of exposomic risk factors on cardiovascular diseases, has led the scientific community to leverage machine learning strategies as a complementary approach to traditional statistical epidemiological studies that are challenged by the highly heterogeneous and dynamic nature of exposomics data. The principal objective served by this work is to identify key pertinent literature and provide an overview of the breadth of research in the field of machine learning applications on exposomics data with a focus on cardiovascular diseases. Secondarily, we aimed at identifying common limitations and meaningful directives to be addressed in the future. Overall, this work shows that, despite the fact that machine learning on exposomics data is under-researched compared to its application on other members of the -omics family, it is increasingly adopted to investigate different aspects of cardiovascular diseases.
Citations: 0
Radiotherapy continuity for cancer treatment: lessons learned from natural disasters
medRxiv - Health Informatics Pub Date : 2024-07-19 DOI: 10.1101/2024.07.18.24310636
Ralf Müller-Polyzou, Melanie Reuter-Oppermann
Abstract:
Background: The contemporary world is challenged by natural disasters accelerated by climate change, affecting a growing world population. Simultaneously, cancer remains a persistent threat as a leading cause of death, killing 10 million people annually. The efficacy of radiotherapy, a cornerstone in cancer treatment worldwide, depends on an uninterrupted course of therapy. However, natural disasters cause significant disruptions to the continuity of radiotherapy services, posing a critical challenge to cancer treatment. This paper explores how natural disasters impact radiotherapy practice, compares them to man-made disasters, and outlines strategies to mitigate adverse effects of natural disasters. Through this analysis, the study seeks to contribute to developing resilient healthcare frameworks capable of sustaining essential cancer treatment amidst the challenges posed by natural disasters.
Method: We conducted a Structured Literature Review to investigate this matter comprehensively, gathering and evaluating relevant academic publications. We explored how natural disasters affected radiotherapy practice and examined the experience of radiotherapy centres worldwide in resuming operations after such events. Subsequently, we validated and extended our research findings through a global online survey involving radiotherapy professionals.
Results: The Structured Literature Review identified twelve academic publications describing hurricanes, floods, and earthquakes as the primary disruptors of radiotherapy practice. The analysis confirms and complements risk mitigation themes identified in our previous research, which focused on the continuity of radiotherapy practice during the COVID-19 pandemic. Our work describes nine overarching themes, forming the basis for a taxonomy of 36 distinct groups. The subsequent confirmative online survey supported and solidified our findings and served as a basis for developing a conceptual framework for natural disaster-resilient radiotherapy.
Discussion: The growing threat posed by natural disasters underscores the need to develop business continuity programs and define risk mitigation measures to ensure the uninterrupted provision of radiotherapy services. By drawing lessons from past disasters, we can better prepare for future hazards, supporting disaster management and planning efforts, particularly enhancing the resilience of radiotherapy practice. Additionally, our study can serve as a resource for shaping policy initiatives aimed at mitigating the impact of natural hazards.
Citations: 0
Impact of Ambient Artificial Intelligence Notes on Provider Burnout
medRxiv - Health Informatics Pub Date : 2024-07-19 DOI: 10.1101/2024.07.18.24310656
Jason Misurac, Lindsey A Knake, James M Blum
Abstract:
Background: Healthcare provider burnout is a critical issue with significant implications for individual well-being, patient care, and healthcare system efficiency. Addressing burnout is essential for improving both provider well-being and the quality of patient care. Ambient artificial intelligence (AI) offers a novel approach to mitigating burnout by reducing the documentation burden through advanced speech recognition and natural language processing technologies that summarize the patient encounter into a clinical note to be reviewed by clinicians.
Objective: To assess provider burnout and professional fulfillment associated with ambient AI technology during a pilot study, using the Stanford Professional Fulfillment Index (PFI).
Methods: A pre-post observational study was conducted at University of Iowa Health Care with 38 volunteer physicians and advanced practice providers. Participants used a commercial ambient AI tool over a 5-week trial in ambulatory environments. The AI tool transcribed patient-clinician conversations and generated preliminary clinical notes for review and entry into the electronic medical record. Burnout and professional fulfillment were assessed using the Stanford PFI at baseline and post-intervention.
Results: Pre-test and post-test surveys were completed by 35/38 participants (92% survey completion rate). Results showed a significant reduction in burnout scores, with the median burnout score improving from 4.16 to 3.16 (p=0.005), with a validated Stanford PFI cutoff for overall burnout of 3.33. Burnout rates decreased from 69% to 43%. There was a notable improvement in interpersonal disengagement scores (3.6 vs. 2.5, p<0.001), although work exhaustion scores did not significantly change. Professional fulfillment showed a modest, non-significant increase (6.1 vs. 6.5, p=0.10).
Conclusions: Ambient AI significantly reduces healthcare provider burnout and modestly enhances professional fulfillment. By alleviating documentation burdens, ambient AI improves operational efficiency and provider well-being. These findings suggest that broader implementation of ambient AI could be a strategic intervention to combat burnout in healthcare settings.
Citations: 0
Protocol for: A Simple, Accessible, Literature-based Drug Repurposing Pipeline
medRxiv - Health Informatics Pub Date : 2024-07-19 DOI: 10.1101/2024.07.18.24310641
Maximin Lange, Eoin Gogarty, Meredith Martyn, Philip Braude, Feras Fayez, Ben Carter
Abstract:
We will develop a novel approach to drug repurposing, utilising Natural Language Processing (NLP) and Literature Based Discovery (LBD) techniques. This will present a simplified, accessible drug repurposing pipeline using Word2Vec embeddings trained on PubMed abstracts to identify potential new medications to be repurposed. We present this approach in the context of antipsychotics, but it could be repeated for any available medication. The research is structured in three stages:
1. Identification of candidate medications using Word2Vec algorithm trained on scientific literature.
2. Empirical testing of identified candidates using a large hospital dataset to explore protective effects against disease onset.
3. Validation of findings using a second, independent dataset to assess generalizability.
This method addresses limitations in current machine learning-based drug repurposing approaches, including lack of external validation and limited accessibility. By leveraging Word2Vec's ability to capture semantic relationships between words, the study aims to uncover hidden connections in medical literature that may lead to novel therapeutic discoveries. The protocol emphasizes transparency and reproducibility, utilizing publicly available electronic health record (EHR) databases for validation. This approach allows for tangible results even for researchers with limited machine learning expertise, bridging the gap between biomedical and information systems communities.
Citations: 0
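The first stage of the protocol above ranks candidate medications by how close their Word2Vec embeddings sit to a query term. A minimal sketch of that ranking step follows, assuming cosine similarity as the closeness measure (the abstract does not name one). The three-dimensional vectors and the names `drug_a`, `drug_b`, and `drug_c` are toy placeholders, not output of a model trained on PubMed abstracts.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(query: List[float],
                    embeddings: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Sort candidates from most to least similar to the query vector."""
    scored = [(name, cosine_similarity(query, vec))
              for name, vec in embeddings.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy embeddings; real ones would come from a trained model.
toy_embeddings = {
    "drug_a": [0.9, 0.1, 0.0],
    "drug_b": [0.0, 1.0, 0.0],
    "drug_c": [0.7, 0.7, 0.0],
}
query_vector = [1.0, 0.0, 0.0]  # stands in for the embedding of a query term

ranking = rank_candidates(query_vector, toy_embeddings)
print([name for name, _ in ranking])  # ['drug_a', 'drug_c', 'drug_b']
```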
Interpretable Machine Learning for Predicting Multiple Sclerosis Conversion from Clinically Isolated Syndrome
medRxiv - Health Informatics Pub Date : 2024-07-19 DOI: 10.1101/2024.07.18.24310578
Eden Caroline Daniel, Santosh Tirunagari, Karan Batth, David Windridge, Yashaswini Balla
Abstract:
Background: Machine learning (ML) prediction of clinically isolated syndrome (CIS) conversion to multiple sclerosis (MS) could be used as a remote, preliminary tool by clinicians to identify high-risk patients that would benefit from early treatment.
Objective: This study evaluates ML models to predict CIS to MS conversion and identifies key predictors.
Methods: Five supervised learning techniques (Naive Bayes, Logistic Regression, Decision Trees, Random Forests and Support Vector Machines) were applied to clinical data from 138 Lithuanian and 273 Mexican CIS patients. Seven different feature combinations were evaluated to determine the most effective models and predictors.
Results: Key predictors common to both datasets included sex, presence of oligoclonal bands in CSF, MRI spinal lesions, abnormal visual evoked potentials and brainstem auditory evoked potentials. The Lithuanian dataset confirmed predictors identified by previous clinical research, while the Mexican dataset partially validated them. The highest F1 score of 1.0 was achieved using Random Forests on all features for the Mexican dataset and Logistic Regression with SMOTE Upsampling on all features for the Lithuanian dataset.
Conclusion: Applying the identified high-performing ML models to the CIS patient datasets shows potential in assisting clinicians to identify high-risk patients.
Citations: 0
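For readers weighing the F1 score of 1.0 reported above: F1 is the harmonic mean of precision and recall, so it reaches 1.0 only when the evaluated split contains no false positives and no false negatives. The short sketch below shows the arithmetic; the counts are hypothetical and are not taken from the study.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2 * precision * recall / (precision + recall)
          = 2 * TP / (2 * TP + FP + FN)."""
    denominator = 2 * tp + fp + fn
    if denominator == 0:
        return 0.0  # no positives predicted or present; defined as 0 here
    return 2 * tp / denominator


# Hypothetical confusion counts for illustration only.
print(f1_score(tp=30, fp=0, fn=0))    # 1.0  (no errors on the positive class)
print(f1_score(tp=30, fp=10, fn=10))  # 0.75
```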
His-MMDM: Multi-domain and Multi-omics Translation of Histopathology Images with Diffusion Models
medRxiv - Health Informatics Pub Date : 2024-07-12 DOI: 10.1101/2024.07.11.24310294
Zhongxiao Li, Tianqi Su, Bin Zhang, Wenkai Han, Sibin Zhang, Guiyin Sun, Yuwei Cong, Xin Chen, Jiping Qi, Yujie Wang, Shiguang Zhao, Hongxue Meng, Peng Liang, Xin Gao
Abstract:
Generative AI (GenAI) has advanced computational pathology through various image translation models. These models synthesize histopathological images from existing ones, facilitating tasks such as color normalization and virtual staining. Current models, while effective, are mostly dedicated to specific source-target domain pairs and lack scalability for multi-domain translations. Here we introduce His-MMDM, a diffusion model-based framework enabling multi-domain and multi-omics histopathological image translation. His-MMDM can translate images across an unlimited number of categorical domains, enabling new applications like the translation of tumor images across various tumor types, while performing comparably to dedicated models on previous tasks such as transforming cryosectioned images to formalin-fixed paraffin-embedded (FFPE) ones. Additionally, it can perform genomics- and/or transcriptomics-guided editing of histopathological images, illustrating the impact of driver mutations and oncogenic pathway alterations on tissue histopathology. These capabilities position His-MMDM as a versatile tool in the GenAI toolkit for future pathologists.
Citations: 0
Pretrained Language Models for Semantics-Aware Data Harmonisation of Observational Clinical Studies in the Era of Big Data
medRxiv - Health Informatics Pub Date : 2024-07-12 DOI: 10.1101/2024.07.12.24310136
Jakub Jan Dylag, Zlatko Zlatev, Michael Boniface
Abstract:
In clinical research, there is a strong drive to leverage big data from population cohort studies and routine electronic healthcare records to design new interventions, improve health outcomes and increase efficiency of healthcare delivery. Yet, realising this potential requires substantial efforts in harmonising source datasets and curating study data, which currently relies on costly, time-consuming and labour-intensive manual methods. We evaluate the applicability of AI methods for natural language processing (NLP) and unsupervised machine learning (ML) to the challenges of big data semantic harmonisation and curation. Our aim is to establish an efficient and robust technological foundation for the development of automated tools supporting data curation of large clinical datasets. We assess NLP and unsupervised ML algorithms and propose two pipelines for automated semantic harmonisation: a pipeline for semantics-aware search for domain relevant variables and a pipeline for clustering of semantically similar variables. We evaluate pipeline performance using 94,037 textual variable descriptions from the English Longitudinal Study of Ageing (ELSA) database. We observe high accuracy of our Semantic Search pipeline with an AUC of 0.899 (SD=0.056). Our Semantic Clustering pipeline achieves a V-measure of 0.237 (SD=0.157), which is on par with leading implementations in other relevant domains. Automation can significantly accelerate the process of dataset harmonisation. Manual labelling was performed at a speed of 2.1 descriptions per minute, with our automated labelling increasing speed to 245 descriptions per minute. Our study findings underscore the potential of AI technologies, such as NLP and unsupervised ML, in automating the harmonisation and curation of big data for clinical research. By establishing a robust technological foundation, we pave the way for the development of automated tools that streamline the process, enabling health data scientists to leverage big data more efficiently and effectively in their studies, accelerating insights from data for clinical benefit.
Citations: 0
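The two labelling speeds quoted above imply a speed-up of roughly 117 times. The arithmetic is sketched below. The projected hours for the full set of 94,037 descriptions are an illustrative extrapolation that assumes both rates stay constant; they are not figures reported in the abstract.

```python
MANUAL_RATE = 2.1         # descriptions per minute (from the abstract)
AUTOMATED_RATE = 245.0    # descriptions per minute (from the abstract)
N_DESCRIPTIONS = 94_037   # ELSA variable descriptions (from the abstract)

speedup = AUTOMATED_RATE / MANUAL_RATE

# Extrapolation (assumption): both rates hold over the whole dataset.
manual_hours = N_DESCRIPTIONS / MANUAL_RATE / 60
automated_hours = N_DESCRIPTIONS / AUTOMATED_RATE / 60

print(f"speed-up: {speedup:.1f}x")                                         # speed-up: 116.7x
print(f"manual: {manual_hours:.0f} h, automated: {automated_hours:.1f} h")  # manual: 746 h, automated: 6.4 h
```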
Analysis of Race, Sex, and Language Proficiency Disparities in Documented Medical Decisions
medRxiv - Health Informatics Pub Date : 2024-07-12 DOI: 10.1101/2024.07.11.24310289
Hadi Amiri, Nidhi Vakil, Mohamed Elgaar, Jiali Cheng, Mitra Mohtarami, Adrian Wong, Mehrnaz Sadrolashrafi, Leo Anthony G. Celi
Abstract:
Importance: Detecting potential disparities in documented medical decisions is a crucial step toward achieving more equitable practices and care, informing healthcare policy making, and preventing computational models from learning and perpetuating such biases.
Objective: To identify disparities associated with race, sex and language proficiency of patients in the documentation of medical decisions.
Design: This cross-sectional study included 451 discharge summaries from MIMIC-III, with all medical decisions annotated by domain experts according to the 10 medical decision categories defined in the Decision Identification and Classification Taxonomy for Use in Medicine. Annotated discharge summaries were stratified by race, sex, language proficiency, diagnosis codes, type of ICU, patient status code, and patient comorbidities (quantified by Elixhauser Comorbidity Index) to account for potential confounding factors. Welch's t-test with Bonferroni correction was used to identify significant disparities in the frequency of medical decisions.
Setting: The study used the MIMIC-III data set, which contains de-identified health data for patients admitted to the critical care units at the Beth Israel Deaconess Medical Center.
Participants: The population reflects the race, sex, and clinical conditions of patients in a data set developed by previous work for patient phenotyping.
Main Outcomes and Measures: The primary outcomes were different types of disparities associated with language proficiency of patients in documented medical decisions within discharge summaries, and the secondary outcome was the prevalence of medical decisions documented in discharge summaries. The data set will be made available at https://physionet.org/
Results: This study analyzed 56,759 medical decision text segments documented in 451 discharge summaries. Analysis across demographic groups revealed a higher documentation frequency for English proficient patients compared to non-English proficient patients in several categories, suggesting potential disparities in documentation or care. Specifically, English proficient patients consistently had more documented decisions in critical decision categories such as "Defining Problem" in conditions related to circulatory system and endocrine, nutritional and metabolic diseases. However, this study found no significant disparities in medical decision documentation based on sex or race.
Conclusions and Relevance: This study illustrates disparities in the documentation of medical decisions, with English proficient patients receiving more comprehensive documentation compared to non-English proficient patients. Conversely, no significant disparity was identified in terms of sex or race. These findings suggest a potential need for targeted interventions to improve the equity of medical documentation practices so that all patients receive the same level of detailed care documentation and prevent computational models from learning and
Citations: 0
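The design above relies on Welch's t-test with Bonferroni correction. As a rough sketch of what that involves, the code below computes Welch's t statistic, the Welch-Satterthwaite degrees of freedom, and a Bonferroni-adjusted significance threshold. The two groups of counts and the choice of 10 tests are hypothetical; the abstract does not report how many comparisons were corrected for.

```python
import math
from statistics import mean, variance
from typing import List, Tuple


def welch_t(a: List[float], b: List[float]) -> Tuple[float, float]:
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t_stat = (mean(a) - mean(b)) / math.sqrt(va + vb)
    dof = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t_stat, dof


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error at `alpha`."""
    return alpha / n_tests


# Hypothetical decision counts per discharge summary for two patient groups.
group_a = [12.0, 15.0, 11.0, 14.0, 13.0]
group_b = [9.0, 10.0, 8.0, 11.0, 7.0]

t_stat, dof = welch_t(group_a, group_b)
threshold = bonferroni_alpha(0.05, n_tests=10)  # assumed number of comparisons
print(f"t = {t_stat:.3f}, dof = {dof:.2f}, per-test alpha = {threshold:.3f}")
```

Turning the statistic into a p-value additionally needs the t-distribution CDF, which the standard library lacks; SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the full Welch test.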
A Machine Learning-Based Prediction of Hospital Mortality in Mechanically Ventilated ICU Patients
medRxiv - Health Informatics Pub Date : 2024-07-12 DOI: 10.1101/2024.07.12.24310325
Hexin Li, Negin Ashrafi, Chris Kang, Guanlan Zhao, Yubing Chen, Maryam Pishgar
Abstract:
Background: Mechanical ventilation (MV) is vital for critically ill ICU patients but carries significant mortality risks. This study aims to develop a predictive model to estimate hospital mortality among MV patients, utilizing comprehensive health data to assist ICU physicians with early-stage alerts.
Methods: We developed a Machine Learning (ML) framework to predict hospital mortality in ICU patients receiving MV. Using the MIMIC-III database, we identified 25,202 eligible patients through ICD-9 codes. We employed backward elimination and the Lasso method, selecting 32 features based on clinical insights and literature. Data preprocessing included eliminating columns with over 90% missing data and using mean imputation for the remaining missing values. To address class imbalance, we used the Synthetic Minority Over-sampling Technique (SMOTE). We evaluated several ML models, including CatBoost, XGBoost, Decision Tree, Random Forest, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Logistic Regression, using a 70/30 train-test split. The CatBoost model was chosen for its superior performance in terms of accuracy, precision, recall, F1-score, AUROC metrics, and calibration plots.
Results: The study involved a cohort of 25,202 patients on MV. The CatBoost model attained an AUROC of 0.862, an increase from an initial AUROC of 0.821, which was the best reported in the literature. It also demonstrated an accuracy of 0.789, an F1-score of 0.747, and better calibration, outperforming other models. These improvements are due to systematic feature selection and the robust gradient boosting architecture of CatBoost.
Conclusion: The preprocessing methodology significantly reduced the number of relevant features, simplifying computational processes, and identified critical features previously overlooked. Integrating these features and tuning the parameters, our model demonstrated strong generalization to unseen data. This highlights the potential of ML as a crucial tool in ICUs, enhancing resource allocation and providing more personalized interventions for MV patients.
Citations: 0
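Two of the preprocessing steps described above, dropping columns with more than 90% missing values and mean-imputing the rest, can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' code: the column names `heart_rate` and `rare_lab` and the toy values are hypothetical, and missing entries are represented as `None`.

```python
from typing import Dict, List, Optional

Column = List[Optional[float]]


def drop_sparse_columns(table: Dict[str, Column],
                        max_missing: float = 0.9) -> Dict[str, Column]:
    """Keep only columns whose missing fraction does not exceed `max_missing`."""
    kept = {}
    for name, values in table.items():
        missing = sum(v is None for v in values) / len(values)
        if missing <= max_missing:
            kept[name] = values
    return kept


def mean_impute(values: Column) -> List[float]:
    """Replace missing entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in values]


# Hypothetical toy table with 20 rows per column.
toy = {
    "heart_rate": [80.0, None, 90.0, None] * 5,  # 50% missing: kept
    "rare_lab": [1.0] + [None] * 19,             # 95% missing: dropped
}

cleaned = {name: mean_impute(col)
           for name, col in drop_sparse_columns(toy).items()}
print(sorted(cleaned))            # ['heart_rate']
print(cleaned["heart_rate"][:4])  # [80.0, 85.0, 90.0, 85.0]
```

In a train/test setting such as the 70/30 split mentioned above, a common precaution is to compute the imputation means on the training portion only, so that no information leaks from the test set.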