将经过微调的转换语言模型推广到新的临床环境中。

IF 2.5 Q2 HEALTH CARE SCIENCES & SERVICES

JAMIA Open Pub Date : 2023-08-16 eCollection Date: 2023-10-01 DOI:10.1093/jamiaopen/ooad070

Kevin Xie, Samuel W Terman, Ryan S Gallagher, Chloe E Hill, Kathryn A Davis, Brian Litt, Dan Roth, Colin A Ellis

{"title":"将经过微调的转换语言模型推广到新的临床环境中。","authors":"Kevin Xie, Samuel W Terman, Ryan S Gallagher, Chloe E Hill, Kathryn A Davis, Brian Litt, Dan Roth, Colin A Ellis","doi":"10.1093/jamiaopen/ooad070","DOIUrl":null,"url":null,"abstract":"Objective: We have previously developed a natural language processing pipeline using clinical notes written by epilepsy specialists to extract seizure freedom, seizure frequency text, and date of last seizure text for patients with epilepsy. It is important to understand how our methods generalize to new care contexts.Materials and methods: We evaluated our pipeline on unseen notes from nonepilepsy-specialist neurologists and non-neurologists without any additional algorithm training. We tested the pipeline out-of-institution using epilepsy specialist notes from an outside medical center with only minor preprocessing adaptations. We examined reasons for discrepancies in performance in new contexts by measuring physical and semantic similarities between documents.Results: Our ability to classify patient seizure freedom decreased by at least 0.12 agreement when moving from epilepsy specialists to nonspecialists or other institutions. On notes from our institution, textual overlap between the extracted outcomes and the gold standard annotations attained from manual chart review decreased by at least 0.11 F1 when an answer existed but did not change when no answer existed; here our models generalized on notes from the outside institution, losing at most 0.02 agreement. We analyzed textual differences and found that syntactic and semantic differences in both clinically relevant sentences and surrounding contexts significantly influenced model performance.Discussion and conclusion: Model generalization performance decreased on notes from nonspecialists; out-of-institution generalization on epilepsy specialist notes required small changes to preprocessing but was especially good for seizure frequency text and date of last seizure text, opening opportunities for multicenter collaborations using these outcomes.","PeriodicalId":36278,"journal":{"name":"JAMIA Open","volume":"6 3","pages":"ooad070"},"PeriodicalIF":2.5000,"publicationDate":"2023-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10432353/pdf/","citationCount":"0","resultStr":"{\"title\":\"Generalization of finetuned transformer language models to new clinical contexts.\",\"authors\":\"Kevin Xie, Samuel W Terman, Ryan S Gallagher, Chloe E Hill, Kathryn A Davis, Brian Litt, Dan Roth, Colin A Ellis\",\"doi\":\"10.1093/jamiaopen/ooad070\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Objective: We have previously developed a natural language processing pipeline using clinical notes written by epilepsy specialists to extract seizure freedom, seizure frequency text, and date of last seizure text for patients with epilepsy. It is important to understand how our methods generalize to new care contexts.Materials and methods: We evaluated our pipeline on unseen notes from nonepilepsy-specialist neurologists and non-neurologists without any additional algorithm training. We tested the pipeline out-of-institution using epilepsy specialist notes from an outside medical center with only minor preprocessing adaptations. We examined reasons for discrepancies in performance in new contexts by measuring physical and semantic similarities between documents.Results: Our ability to classify patient seizure freedom decreased by at least 0.12 agreement when moving from epilepsy specialists to nonspecialists or other institutions. On notes from our institution, textual overlap between the extracted outcomes and the gold standard annotations attained from manual chart review decreased by at least 0.11 F1 when an answer existed but did not change when no answer existed; here our models generalized on notes from the outside institution, losing at most 0.02 agreement. We analyzed textual differences and found that syntactic and semantic differences in both clinically relevant sentences and surrounding contexts significantly influenced model performance.Discussion and conclusion: Model generalization performance decreased on notes from nonspecialists; out-of-institution generalization on epilepsy specialist notes required small changes to preprocessing but was especially good for seizure frequency text and date of last seizure text, opening opportunities for multicenter collaborations using these outcomes.\",\"PeriodicalId\":36278,\"journal\":{\"name\":\"JAMIA Open\",\"volume\":\"6 3\",\"pages\":\"ooad070\"},\"PeriodicalIF\":2.5000,\"publicationDate\":\"2023-08-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10432353/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"JAMIA Open\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1093/jamiaopen/ooad070\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2023/10/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q2\",\"JCRName\":\"HEALTH CARE SCIENCES & SERVICES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"JAMIA Open","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1093/jamiaopen/ooad070","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2023/10/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"HEALTH CARE SCIENCES & SERVICES","Score":null,"Total":0}

引用次数: 0

摘要

目的：我们之前开发了一个自然语言处理管道，使用癫痫专家编写的临床笔记来提取癫痫患者的癫痫发作自由度、发作频率文本和上次发作日期文本。了解我们的方法如何推广到新的护理环境中是很重要的。材料和方法：我们在没有任何额外算法训练的情况下，根据非癫痫病专家神经学家和非神经学家的未公开笔记评估了我们的管道。我们使用来自外部医疗中心的癫痫专家笔记测试了机构外的管道，只进行了轻微的预处理调整。我们通过测量文档之间的物理和语义相似性，研究了在新环境中表现差异的原因。结果：当从癫痫专家转到非专家或其他机构时，我们对患者癫痫发作自由度的分类能力至少降低了0.12。根据我们机构的笔记，当有答案时，提取的结果和手动图表审查获得的金标准注释之间的文本重叠至少减少了0.11 F1，但当没有答案时没有变化；在这里，我们的模型在来自外部机构的注释上进行了推广，最多损失0.02的一致性。我们分析了文本差异，发现临床相关句子和周围环境中的句法和语义差异都会显著影响模型的性能。讨论和结论：非专业人员的笔记使模型泛化性能下降；癫痫专家笔记的机构外概括需要对预处理进行小的更改，但对癫痫发作频率文本和上次癫痫发作日期文本尤其有利，为利用这些结果进行多中心合作提供了机会。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

Generalization of finetuned transformer language models to new clinical contexts.

查看原文本刊更多论文

Generalization of finetuned transformer language models to new clinical contexts.

Objective: We have previously developed a natural language processing pipeline using clinical notes written by epilepsy specialists to extract seizure freedom, seizure frequency text, and date of last seizure text for patients with epilepsy. It is important to understand how our methods generalize to new care contexts.

Materials and methods: We evaluated our pipeline on unseen notes from nonepilepsy-specialist neurologists and non-neurologists without any additional algorithm training. We tested the pipeline out-of-institution using epilepsy specialist notes from an outside medical center with only minor preprocessing adaptations. We examined reasons for discrepancies in performance in new contexts by measuring physical and semantic similarities between documents.

Results: Our ability to classify patient seizure freedom decreased by at least 0.12 agreement when moving from epilepsy specialists to nonspecialists or other institutions. On notes from our institution, textual overlap between the extracted outcomes and the gold standard annotations attained from manual chart review decreased by at least 0.11 F₁ when an answer existed but did not change when no answer existed; here our models generalized on notes from the outside institution, losing at most 0.02 agreement. We analyzed textual differences and found that syntactic and semantic differences in both clinically relevant sentences and surrounding contexts significantly influenced model performance.

Discussion and conclusion: Model generalization performance decreased on notes from nonspecialists; out-of-institution generalization on epilepsy specialist notes required small changes to preprocessing but was especially good for seizure frequency text and date of last seizure text, opening opportunities for multicenter collaborations using these outcomes.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

JAMIA Open Medicine-Health Informatics

CiteScore

4.10

自引率

4.80%

发文量

102

审稿时长

16 weeks