Lipi Mishra, Sowmya Muchukunte Ramaswamy, Broderick Ivan McCallum-Hee, Keaton Wright, Riley Croxford, Sunil Belur Nagaraj, Matthew Anstey
BMJ Health & Care Informatics, 32(1). Published 2025-09-14. DOI: https://doi.org/10.1136/bmjhci-2024-101354. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12434735/pdf/
Automated sepsis prediction from unstructured electronic health records using natural language processing: a retrospective cohort study.
Objective: Artificial intelligence (AI) holds promise for predicting sepsis. However, challenges remain in integrating AI, natural language processing (NLP) and free-text data to enhance sepsis diagnosis at emergency department (ED) triage. This study aimed to evaluate the effectiveness of AI in improving sepsis diagnosis.
Methods: This retrospective cohort study analysed data from 134 266 patients admitted to the ED and subsequently hospitalised between 1 January 2016 and 31 December 2021. The data set comprised 10 variables and free-text triage comments; the comments were tokenised and processed using a bag-of-words model. We evaluated four traditional NLP classifier models: logistic regression, LightGBM, random forest and a neural network. We also evaluated the performance of a BERT classifier. We used the area under the precision-recall curve (AUPRC) and the area under the curve (AUC) as performance metrics.
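As an illustration of the bag-of-words step described in the Methods, the following is a minimal sketch in plain Python. It is not the authors' pipeline: the tokenisation rule (lowercasing and splitting on non-alphanumeric characters), the function names and the example comments are all assumptions made for illustration.

```python
import re
from collections import Counter

def tokenise(comment):
    """Lowercase a triage comment and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", comment.lower())

def build_vocabulary(comments):
    """Map each distinct token to a column index, in sorted order."""
    tokens = sorted({tok for c in comments for tok in tokenise(c)})
    return {tok: i for i, tok in enumerate(tokens)}

def bag_of_words(comment, vocabulary):
    """Return a token-count vector; tokens outside the vocabulary are ignored."""
    counts = Counter(tokenise(comment))
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector

# Invented example comments, not taken from the study data.
comments = [
    "fever and rigors, fever since last night",
    "fall at home, hip pain",
]
vocabulary = build_vocabulary(comments)
vectors = [bag_of_words(c, vocabulary) for c in comments]
```

Each comment becomes a fixed-length count vector that discards word order. Vectors of this kind could then be combined with the other variables and passed to any of the classifiers listed above.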
Results: Random forest exhibited superior predictive performance with an AUPRC of 0.789 (95% CI: 0.7668 to 0.8018) and an AUC of 0.80 (95% CI: 0.7842 to 0.8173). Using raw text, the BERT model achieved an AUPRC of 0.7542 (95% CI: 0.7418 to 0.7741) and AUC of 0.7735 (95% CI: 0.7628 to 0.8017) for sepsis prediction. Key variables included ED treatment time, patient age, arrival-to-treatment time, Australasian Triage Scale and visit type.
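For readers unfamiliar with how metrics and intervals of this kind can be computed, here is a minimal sketch in plain Python. It rests on several assumptions: AUC is taken to mean the area under the ROC curve, AUPRC is estimated by average precision, and the 95% CI is a percentile bootstrap. The abstract does not state how the intervals were actually derived, and the labels and scores below are invented.

```python
import random

def roc_auc(labels, scores):
    """Probability that a positive outscores a negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Mean of precision at each positive, ranking by descending score.

    Tied scores keep their input order; needs at least one positive label.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits

def bootstrap_ci(metric, labels, scores, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a metric over resampled cases."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that lack one of the two classes
        stats.append(metric(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Invented labels (1 = sepsis) and model scores, for illustration only.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)
auprc = average_precision(labels, scores)
auc_ci = bootstrap_ci(roc_auc, labels, scores, n_resamples=200)
```

With imbalanced outcomes, AUPRC is generally more sensitive than AUC to how well the positive class is ranked, which is one reason to report both.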
Discussion: This study demonstrates the potential of AI, particularly random forest and BERT classifiers, for early sepsis detection in EDs using free-text patient concerns.
Conclusion: Incorporating free text into machine learning improved diagnosis and identified missed cases, enhancing sepsis prediction in the ED with an AI-powered clinical decision support system. Large, prospective studies are needed to validate these findings.