Authors: Allison Broad, Xiao Luo, Fattah Muhammad Tahabi, Denise Abdoo, Zhan Zhang, Kathleen Adelgais
Journal: Prehospital Emergency Care, pp. 1-11 (Impact Factor 2.1; JCR Q2, Emergency Medicine)
Publication type: Journal Article
Published: 2025-01-17
DOI: https://doi.org/10.1080/10903127.2025.2451209
Factors Associated with Abusive Head Trauma in Young Children Presenting to Emergency Medical Services Using a Large Language Model.
Objectives: Abusive head trauma (AHT) is a leading cause of death in young children. Analyses of the characteristics of patients presenting to Emergency Medical Services (EMS) are often limited to structured data fields. Artificial Intelligence (AI) and Large Language Models (LLMs) may identify rare presentations like AHT through factors not found in structured data. Our goal was to apply AI and an LLM to EMS narrative documentation of young children to detect AHT.
Methods: This is a retrospective cohort study of EMS transports of children <36 months of age with a diagnosis of head injury from the 2018-2019 ESO Research Data Collaborative. Two expert reviewers distinguished non-abusive closed head injury (NA-CHI) from AHT and child maltreatment (AHT-CAN); the kappa statistic (k) assessed inter-rater reliability. A Natural Language Processing (NLP) framework using an LLM augmented with expert-derived n-grams was developed to identify AHT-CAN. We compared test characteristics (sensitivity, specificity, and negative predictive value (NPV)) for detecting AHT-CAN between this NLP framework, a Generative Pretrained Transformer (GPT) model, and an n-grams-only model. Association of specific word tokens with AHT-CAN was analyzed using Pearson's chi-square. Area Under the Receiver Operating Characteristic Curve (AUROC) and Area Under the Precision-Recall Curve (AUPRC) are also reported.
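The token-association step can be illustrated with a minimal sketch, assuming each narrative is reduced to the presence or absence of a unigram or bigram and cross-tabulated against the AHT-CAN label in a 2x2 table. The function names, the whitespace tokenizer, and the toy narratives below are hypothetical and are not taken from the study; the statistic is the standard Pearson chi-square without continuity correction.

```python
def ngrams(text, n):
    """Return the word n-grams of a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:  # an empty row or column carries no information
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def term_association(narratives, labels, term):
    """Chi-square for presence of `term` (unigram or bigram) versus a binary label."""
    a = b = c = d = 0
    for text, is_case in zip(narratives, labels):
        present = term in set(ngrams(text, 1)) | set(ngrams(text, 2))
        if present and is_case:
            a += 1  # term present, AHT-CAN
        elif present:
            b += 1  # term present, NA-CHI
        elif is_case:
            c += 1  # term absent, AHT-CAN
        else:
            d += 1  # term absent, NA-CHI
    return chi_square_2x2(a, b, c, d)


# Hypothetical toy narratives, invented purely for illustration.
narratives = [
    "multiple bruise on cheek",
    "fell from couch no bruise",
    "bruise noted patient did not respond",
    "tripped while walking alert and crying",
]
labels = [1, 0, 1, 0]  # 1 = AHT-CAN, 0 = NA-CHI

print(term_association(narratives, labels, "not respond"))
```

The statistic alone does not give the direction of an association; a term is positively associated with the case label when a*d exceeds b*c. A real pipeline would also need punctuation handling and stemming, which this sketch omits.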
Results: There were 1082 encounters in our cohort: 1030 (95.2%) NA-CHI and 52 (4.8%) AHT-CAN. Inter-rater agreement was substantial (k = 0.71). The augmented NLP framework had a sensitivity of 92.3% and a specificity of 72.4%, with an NPV of 99.5%. In comparison, the GPT model had a sensitivity of 69.2%, specificity of 97.1%, and NPV of 98.4%, and n-grams alone had a sensitivity of 53.8%, specificity of 62.0%, and NPV of 96.4%. AUROC was 0.91 and AUPRC was 0.52. A total of 44 n-grams and bi-grams were positively associated with AHT-CAN, including "domestic," "various," "bruise," "cheek," "multiple," "doa," "not respond," and "see EMS."
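The reported NPVs can be checked against the reported sensitivities, specificities, and class sizes with a short sketch. It assumes the metrics were computed over the full cohort of 52 AHT-CAN and 1030 NA-CHI encounters, which the abstract does not state. The confusion-matrix counts are reconstructed by rounding and are approximations, not figures reported by the authors.

```python
def counts_from_rates(sensitivity, specificity, n_pos, n_neg):
    """Reconstruct approximate (TP, FN, TN, FP) counts from rates and class sizes."""
    tp = round(sensitivity * n_pos)
    tn = round(specificity * n_neg)
    return tp, n_pos - tp, tn, n_neg - tn


def npv(tn, fn):
    """Negative predictive value: share of negative calls that are true negatives."""
    return tn / (tn + fn)


N_POS, N_NEG = 52, 1030  # AHT-CAN and NA-CHI encounters from the Results

# (sensitivity, specificity) as reported in the abstract
reported = {
    "augmented NLP framework": (0.923, 0.724),
    "GPT model": (0.692, 0.971),
    "n-grams only": (0.538, 0.620),
}

for name, (sens, spec) in reported.items():
    tp, fn, tn, fp = counts_from_rates(sens, spec, N_POS, N_NEG)
    print(f"{name}: NPV = {100 * npv(tn, fn):.1f}%")
```

Under that assumption, the recomputed values round to the reported 99.5%, 98.4%, and 96.4%. Because only 4.8% of encounters are AHT-CAN, NPV stays high even for the n-grams-only model, so sensitivity and specificity are the more discriminating comparison between models.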
Conclusions: AI and LLMs can detect AHT-CAN in EMS free-text narratives with high sensitivity and specificity. Words describing physical signs of trauma are strongly associated with AHT-CAN. LLMs augmented with a list of n-grams may help EMS identify signs of trauma that aid in the detection of AHT in young children.
About the journal:
Prehospital Emergency Care publishes peer-reviewed information relevant to the practice, educational advancement, and investigation of prehospital emergency care, including the following types of articles: Special Contributions - Original Articles - Education and Practice - Preliminary Reports - Case Conferences - Position Papers - Collective Reviews - Editorials - Letters to the Editor - Media Reviews.