将经过微调的转换语言模型推广到新的临床环境中。

IF 2.5 Q2 HEALTH CARE SCIENCES & SERVICES
JAMIA Open Pub Date : 2023-08-16 eCollection Date: 2023-10-01 DOI:10.1093/jamiaopen/ooad070
Kevin Xie, Samuel W Terman, Ryan S Gallagher, Chloe E Hill, Kathryn A Davis, Brian Litt, Dan Roth, Colin A Ellis
{"title":"将经过微调的转换语言模型推广到新的临床环境中。","authors":"Kevin Xie, Samuel W Terman, Ryan S Gallagher, Chloe E Hill, Kathryn A Davis, Brian Litt, Dan Roth, Colin A Ellis","doi":"10.1093/jamiaopen/ooad070","DOIUrl":null,"url":null,"abstract":"<p><strong>Objective: </strong>We have previously developed a natural language processing pipeline using clinical notes written by epilepsy specialists to extract seizure freedom, seizure frequency text, and date of last seizure text for patients with epilepsy. It is important to understand how our methods generalize to new care contexts.</p><p><strong>Materials and methods: </strong>We evaluated our pipeline on unseen notes from nonepilepsy-specialist neurologists and non-neurologists without any additional algorithm training. We tested the pipeline out-of-institution using epilepsy specialist notes from an outside medical center with only minor preprocessing adaptations. We examined reasons for discrepancies in performance in new contexts by measuring physical and semantic similarities between documents.</p><p><strong>Results: </strong>Our ability to classify patient seizure freedom decreased by at least 0.12 agreement when moving from epilepsy specialists to nonspecialists or other institutions. On notes from our institution, textual overlap between the extracted outcomes and the gold standard annotations attained from manual chart review decreased by at least 0.11 F<sub>1</sub> when an answer existed but did not change when no answer existed; here our models generalized on notes from the outside institution, losing at most 0.02 agreement. We analyzed textual differences and found that syntactic and semantic differences in both clinically relevant sentences and surrounding contexts significantly influenced model performance.</p><p><strong>Discussion and conclusion: </strong>Model generalization performance decreased on notes from nonspecialists; out-of-institution generalization on epilepsy specialist notes required small changes to preprocessing but was especially good for seizure frequency text and date of last seizure text, opening opportunities for multicenter collaborations using these outcomes.</p>","PeriodicalId":36278,"journal":{"name":"JAMIA Open","volume":"6 3","pages":"ooad070"},"PeriodicalIF":2.5000,"publicationDate":"2023-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10432353/pdf/","citationCount":"0","resultStr":"{\"title\":\"Generalization of finetuned transformer language models to new clinical contexts.\",\"authors\":\"Kevin Xie, Samuel W Terman, Ryan S Gallagher, Chloe E Hill, Kathryn A Davis, Brian Litt, Dan Roth, Colin A Ellis\",\"doi\":\"10.1093/jamiaopen/ooad070\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Objective: </strong>We have previously developed a natural language processing pipeline using clinical notes written by epilepsy specialists to extract seizure freedom, seizure frequency text, and date of last seizure text for patients with epilepsy. It is important to understand how our methods generalize to new care contexts.</p><p><strong>Materials and methods: </strong>We evaluated our pipeline on unseen notes from nonepilepsy-specialist neurologists and non-neurologists without any additional algorithm training. We tested the pipeline out-of-institution using epilepsy specialist notes from an outside medical center with only minor preprocessing adaptations. We examined reasons for discrepancies in performance in new contexts by measuring physical and semantic similarities between documents.</p><p><strong>Results: </strong>Our ability to classify patient seizure freedom decreased by at least 0.12 agreement when moving from epilepsy specialists to nonspecialists or other institutions. On notes from our institution, textual overlap between the extracted outcomes and the gold standard annotations attained from manual chart review decreased by at least 0.11 F<sub>1</sub> when an answer existed but did not change when no answer existed; here our models generalized on notes from the outside institution, losing at most 0.02 agreement. We analyzed textual differences and found that syntactic and semantic differences in both clinically relevant sentences and surrounding contexts significantly influenced model performance.</p><p><strong>Discussion and conclusion: </strong>Model generalization performance decreased on notes from nonspecialists; out-of-institution generalization on epilepsy specialist notes required small changes to preprocessing but was especially good for seizure frequency text and date of last seizure text, opening opportunities for multicenter collaborations using these outcomes.</p>\",\"PeriodicalId\":36278,\"journal\":{\"name\":\"JAMIA Open\",\"volume\":\"6 3\",\"pages\":\"ooad070\"},\"PeriodicalIF\":2.5000,\"publicationDate\":\"2023-08-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10432353/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"JAMIA Open\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1093/jamiaopen/ooad070\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2023/10/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q2\",\"JCRName\":\"HEALTH CARE SCIENCES & SERVICES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"JAMIA Open","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1093/jamiaopen/ooad070","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2023/10/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"HEALTH CARE SCIENCES & SERVICES","Score":null,"Total":0}
引用次数: 0

摘要

目的:我们之前开发了一个自然语言处理管道,使用癫痫专家编写的临床笔记来提取癫痫患者的癫痫发作自由度、发作频率文本和上次发作日期文本。了解我们的方法如何推广到新的护理环境中是很重要的。材料和方法:我们在没有任何额外算法训练的情况下,根据非癫痫病专家神经学家和非神经学家的未公开笔记评估了我们的管道。我们使用来自外部医疗中心的癫痫专家笔记测试了机构外的管道,只进行了轻微的预处理调整。我们通过测量文档之间的物理和语义相似性,研究了在新环境中表现差异的原因。结果:当从癫痫专家转到非专家或其他机构时,我们对患者癫痫发作自由度的分类能力至少降低了0.12。根据我们机构的笔记,当有答案时,提取的结果和手动图表审查获得的金标准注释之间的文本重叠至少减少了0.11 F1,但当没有答案时没有变化;在这里,我们的模型在来自外部机构的注释上进行了推广,最多损失0.02的一致性。我们分析了文本差异,发现临床相关句子和周围环境中的句法和语义差异都会显著影响模型的性能。讨论和结论:非专业人员的笔记使模型泛化性能下降;癫痫专家笔记的机构外概括需要对预处理进行小的更改,但对癫痫发作频率文本和上次癫痫发作日期文本尤其有利,为利用这些结果进行多中心合作提供了机会。
本文章由计算机程序翻译,如有差异,请以英文原文为准。

Generalization of finetuned transformer language models to new clinical contexts.

Generalization of finetuned transformer language models to new clinical contexts.

Generalization of finetuned transformer language models to new clinical contexts.

Generalization of finetuned transformer language models to new clinical contexts.

Objective: We have previously developed a natural language processing pipeline using clinical notes written by epilepsy specialists to extract seizure freedom, seizure frequency text, and date of last seizure text for patients with epilepsy. It is important to understand how our methods generalize to new care contexts.

Materials and methods: We evaluated our pipeline on unseen notes from nonepilepsy-specialist neurologists and non-neurologists without any additional algorithm training. We tested the pipeline out-of-institution using epilepsy specialist notes from an outside medical center with only minor preprocessing adaptations. We examined reasons for discrepancies in performance in new contexts by measuring physical and semantic similarities between documents.

Results: Our ability to classify patient seizure freedom decreased by at least 0.12 agreement when moving from epilepsy specialists to nonspecialists or other institutions. On notes from our institution, textual overlap between the extracted outcomes and the gold standard annotations attained from manual chart review decreased by at least 0.11 F1 when an answer existed but did not change when no answer existed; here our models generalized on notes from the outside institution, losing at most 0.02 agreement. We analyzed textual differences and found that syntactic and semantic differences in both clinically relevant sentences and surrounding contexts significantly influenced model performance.

Discussion and conclusion: Model generalization performance decreased on notes from nonspecialists; out-of-institution generalization on epilepsy specialist notes required small changes to preprocessing but was especially good for seizure frequency text and date of last seizure text, opening opportunities for multicenter collaborations using these outcomes.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
JAMIA Open
JAMIA Open Medicine-Health Informatics
CiteScore
4.10
自引率
4.80%
发文量
102
审稿时长
16 weeks
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信