Automated Evaluation of Antibiotic Prescribing Guideline Concordance in Pediatric Sinusitis Clinical Notes.

Davy Weissenbacher, Lauren Dutcher, Mickael Boustany, Leigh Cressman, Karen O'Connor, Keith W Hamilton, Jeffrey Gerber, Robert Grundmeier, Graciela Gonzalez-Hernandez
{"title":"Automated Evaluation of Antibiotic Prescribing Guideline Concordance in Pediatric Sinusitis Clinical Notes.","authors":"Davy Weissenbacher, Lauren Dutcher, Mickael Boustany, Leigh Cressman, Karen O'Connor, Keith W Hamilton, Jeffrey Gerber, Robert Grundmeier, Graciela Gonzalez-Hernandez","doi":"10.1142/9789819807024_0011","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Ensuring antibiotics are prescribed only when necessary is crucial for maintaining their effectiveness and is a key focus of public health initiatives worldwide. In cases of sinusitis, among the most common reasons for antibiotic prescriptions in children, healthcare providers must distinguish between bacterial and viral causes based on clinical signs and symptoms. However, due to the overlap between symptoms of acute sinusitis and viral upper respiratory infections, antibiotics are often over-prescribed.</p><p><strong>Objectives: </strong>Currently, there are no electronic health record (EHR)-based methods, such as lab tests or ICD-10 codes, to retroactively assess the appropriateness of prescriptions for sinusitis, making manual chart reviews the only available method for evaluation, which is time-intensive and not feasible at a large scale. In this study, we propose using natural language processing to automate this assessment.</p><p><strong>Methods: </strong>We developed, trained, and evaluated generative models to classify the appropriateness of antibiotic prescriptions in 300 clinical notes from pediatric patients with sinusitis seen at a primary care practice in the Children's Hospital of Philadelphia network. We utilized standard prompt engineering techniques, including few-shot learning and chain-of-thought prompting, to refine an initial prompt. Additionally, we employed Parameter-Efficient Fine-Tuning to train a medium-sized generative model Llama 3 70B-instruct.</p><p><strong>Results: </strong>While parameter-efficient fine-tuning did not enhance performance, the combination of few-shot learning and chain-of-thought prompting proved beneficial. Our best results were achieved using the largest generative model publicly available to date, the Llama 3.1 405B-instruct. On our evaluation set, the model correctly identified 94.7% of the 152 notes where antibiotic prescription was appropriate and 66.2% of the 83 notes where it was not appropriate. However, 15 notes that were insufficiently, vaguely, or ambiguously documented by physicians posed a challenge to our model, as none were accurately classified.</p><p><strong>Conclusion: </strong>Our generative model demonstrated good performance in the challenging task of chart review. This level of performance may be sufficient for deploying the model within the EHR, where it can assist physicians in real-time to prescribe antibiotics in concordance with the guidelines, or for monitoring antibiotic stewardship on a large scale.</p>","PeriodicalId":34954,"journal":{"name":"Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing","volume":"30 ","pages":"138-153"},"PeriodicalIF":0.0000,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pacific Symposium on Biocomputing. 
Pacific Symposium on Biocomputing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1142/9789819807024_0011","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"Computer Science","Score":null,"Total":0}
引用次数: 0

Abstract

Background: Ensuring antibiotics are prescribed only when necessary is crucial for maintaining their effectiveness and is a key focus of public health initiatives worldwide. Sinusitis is among the most common reasons for antibiotic prescriptions in children, and healthcare providers must distinguish between bacterial and viral causes based on clinical signs and symptoms. However, because the symptoms of acute sinusitis overlap with those of viral upper respiratory infections, antibiotics are often over-prescribed.

Objectives: Currently, there are no electronic health record (EHR)-based methods, such as lab tests or ICD-10 codes, to retroactively assess the appropriateness of prescriptions for sinusitis, making manual chart review the only available method of evaluation, a process that is time-intensive and not feasible at a large scale. In this study, we propose using natural language processing to automate this assessment.

Methods: We developed, trained, and evaluated generative models to classify the appropriateness of antibiotic prescriptions in 300 clinical notes from pediatric patients with sinusitis seen at a primary care practice in the Children's Hospital of Philadelphia network. We used standard prompt engineering techniques, including few-shot learning and chain-of-thought prompting, to refine an initial prompt. Additionally, we employed parameter-efficient fine-tuning to train a medium-sized generative model, Llama 3 70B-Instruct.
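
The abstract does not reproduce the prompts themselves, so the following is a minimal sketch of how a few-shot, chain-of-thought classification prompt for this task might be assembled. The instructions, exemplar notes, rationales, and label set are hypothetical placeholders, not the study's materials.

```python
# Minimal sketch of a few-shot + chain-of-thought prompt for judging
# guideline concordance of an antibiotic prescription in a clinical note.
# All note text, exemplars, and labels below are hypothetical placeholders.

INSTRUCTIONS = (
    "You are reviewing a pediatric clinical note for acute sinusitis. "
    "Decide whether the antibiotic prescription documented in the note is "
    "APPROPRIATE or NOT_APPROPRIATE according to clinical guidelines. "
    "Reason step by step about the documented signs, symptoms, and their "
    "duration before giving a final label."
)

# Few-shot exemplars: each pairs a fictional note excerpt with a worked
# chain-of-thought rationale and a gold label.
FEW_SHOT_EXAMPLES = [
    {
        "note": "10-year-old with nasal congestion and cough for 12 days, "
                "no improvement. Amoxicillin prescribed.",
        "rationale": "Symptoms have persisted beyond 10 days without "
                     "improvement, which meets a bacterial sinusitis "
                     "criterion, so an antibiotic is justified.",
        "label": "APPROPRIATE",
    },
    {
        "note": "7-year-old with 3 days of clear rhinorrhea and mild cough. "
                "Amoxicillin prescribed.",
        "rationale": "A 3-day course of mild symptoms is consistent with a "
                     "viral upper respiratory infection, so an antibiotic "
                     "is not justified.",
        "label": "NOT_APPROPRIATE",
    },
]


def build_prompt(note_text: str) -> str:
    """Assemble the instructions, few-shot exemplars, and the target note."""
    parts = [INSTRUCTIONS, ""]
    for ex in FEW_SHOT_EXAMPLES:
        parts += [
            f"Note: {ex['note']}",
            f"Reasoning: {ex['rationale']}",
            f"Label: {ex['label']}",
            "",
        ]
    parts += [f"Note: {note_text}", "Reasoning:"]
    return "\n".join(parts)


if __name__ == "__main__":
    # The assembled string would be sent to an instruction-tuned model
    # (e.g., Llama 3.1 405B-Instruct) through whatever inference API is used.
    print(build_prompt("9-year-old with 11 days of congestion and fever. "
                       "Amoxicillin-clavulanate prescribed."))
```

In practice, the model's free-text reasoning would then be parsed for the final label; the study's actual prompt wording, exemplars, and label definitions would differ.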

Results: While parameter-efficient fine-tuning did not enhance performance, the combination of few-shot learning and chain-of-thought prompting proved beneficial. Our best results were achieved with the largest generative model publicly available to date, Llama 3.1 405B-Instruct. On our evaluation set, the model correctly identified 94.7% of the 152 notes where the antibiotic prescription was appropriate and 66.2% of the 83 notes where it was not. However, 15 notes that were insufficiently, vaguely, or ambiguously documented by physicians posed a challenge to our model: none of them were classified accurately.
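
For readers unfamiliar with parameter-efficient fine-tuning, the sketch below shows how a LoRA adapter is commonly attached to an instruction-tuned Llama model with the Hugging Face peft library. The hyperparameters, target modules, and training setup are illustrative assumptions, not the configuration used in the study.

```python
# Illustrative LoRA setup for parameter-efficient fine-tuning of an
# instruction-tuned Llama model; hyperparameters and target modules are
# assumptions for this sketch, not the study's configuration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

MODEL_ID = "meta-llama/Meta-Llama-3-70B-Instruct"  # gated; requires access

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # shard across available GPUs
)

# LoRA trains small low-rank adapter matrices on selected projection layers
# while the 70B base weights stay frozen.
lora_config = LoraConfig(
    r=16,                # adapter rank (illustrative)
    lora_alpha=32,       # scaling factor (illustrative)
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all weights

# The wrapped model can then be trained on prompt/label pairs with a standard
# causal-language-modeling objective (e.g., transformers' Trainer or trl's
# SFTTrainer) and the adapter weights saved separately from the base model.
```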

Conclusion: Our generative model demonstrated good performance on the challenging task of chart review. This level of performance may be sufficient for deploying the model within the EHR, where it could assist physicians in real time in prescribing antibiotics in concordance with the guidelines, or for monitoring antibiotic stewardship on a large scale.
