A Review of Recent Work in Transfer Learning and Domain Adaptation for Natural Language Processing of Electronic Health Records

Yearbook of Medical Informatics 30(1): 239-244. Published 2021-08-01 (Epub 2021-09-03). DOI: 10.1055/s-0041-1726522
Egoitz Laparra, Aurelie Mascio, Sumithra Velupillai, Timothy Miller
Cited by: 18

Abstract


Objectives: We survey recent work in biomedical NLP on building more adaptable or generalizable models, with a focus on work dealing with electronic health record (EHR) texts, to better understand recent trends in this area and identify opportunities for future research.

Methods: We searched PubMed, the Institute of Electrical and Electronics Engineers (IEEE) digital library, the Association for Computational Linguistics (ACL) Anthology, the Association for the Advancement of Artificial Intelligence (AAAI) proceedings, and Google Scholar for the years 2018-2020. We reviewed abstracts to identify the most relevant and impactful work, and manually extracted data points from each of these papers to characterize the types of methods and tasks that were studied, the clinical domains involved, and current state-of-the-art results.

Results: The ubiquity of pre-trained transformers in clinical NLP research has contributed to an increase in domain adaptation and generalization-focused work that uses these models as the key component. Most recently, work has started to train biomedical transformers and to extend the fine-tuning process with additional domain adaptation techniques. We also highlight recent research in cross-lingual adaptation, as a special case of adaptation.
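The recipe the Results describe, training on broad source-domain data and then continuing training on a small target-domain set, can be sketched with a deliberately tiny, invented example. A nearest-centroid bag-of-words model stands in for a transformer here purely for illustration; the class, the data, and the labels are hypothetical and not drawn from any surveyed paper.

```python
# Toy sketch of the transfer-learning recipe: train on a source domain,
# then continue training ("fine-tune") on a small target-domain set.
# All data is invented; a nearest-centroid classifier stands in for a
# pre-trained transformer purely to make the workflow concrete.
from collections import Counter

class BowCentroid:
    """Nearest-centroid classifier over bag-of-words counts."""
    def __init__(self):
        self.centroids = {}  # label -> Counter of word counts
        self.counts = {}     # label -> number of training documents

    def train(self, docs):
        """Accumulate word counts per label; calling again = continued training."""
        for text, label in docs:
            c = self.centroids.setdefault(label, Counter())
            c.update(text.lower().split())
            self.counts[label] = self.counts.get(label, 0) + 1

    def predict(self, text):
        """Pick the label whose mean word counts best cover the input."""
        words = text.lower().split()
        def score(label):
            c, n = self.centroids[label], self.counts[label]
            return sum(c[w] / n for w in words)
        return max(self.centroids, key=score)

source = [  # general-domain "pre-training" data
    ("no problems reported everything fine", "neg"),
    ("serious problems reported", "pos"),
]
target = [  # small clinical fine-tuning set
    ("denies chest pain", "neg"),
    ("reports chest pain", "pos"),
]

model = BowCentroid()
model.train(source)  # source-domain training
model.train(target)  # continued training on the target domain
print(model.predict("patient denies pain"))  # prints "neg"
```

The point of the sketch is only the two-stage training loop: the second `train` call updates the same statistics the first one built, which is the shape of domain-adaptive fine-tuning regardless of the underlying model family.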

Conclusions: While pre-trained transformer models have led to some large performance improvements, general-domain pre-training does not always transfer adequately to the clinical domain due to its highly specialized language. There is also much work to be done in showing that the gains obtained by pre-trained transformers are beneficial in real-world use cases. The amount of work in domain adaptation and transfer learning is limited by dataset availability, and creating datasets for new domains is challenging. The growing body of research in languages other than English is encouraging, and more collaboration between researchers across the language divide would likely accelerate progress in non-English clinical NLP.
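The conclusion that general-domain pre-training transfers poorly because clinical language is highly specialized can be made concrete with a minimal vocabulary-coverage sketch. The corpora below are tiny invented examples, not data from the survey; real studies measure this on full pre-training vocabularies.

```python
# Toy illustration (invented data): why general-domain vocabularies cover
# clinical text poorly. Build a "general" vocabulary, then measure the
# out-of-vocabulary (OOV) rate of an abbreviation-heavy clinical note.

def build_vocab(corpus):
    """Collect the set of lowercase word tokens seen in a corpus."""
    return {tok.lower() for doc in corpus for tok in doc.split()}

def oov_rate(vocab, text):
    """Fraction of tokens in `text` that are out-of-vocabulary."""
    toks = [t.lower() for t in text.split()]
    if not toks:
        return 0.0
    return sum(t not in vocab for t in toks) / len(toks)

general_corpus = [
    "the patient was happy with the service",
    "he went to the store and bought milk",
]
clinical_note = "pt c/o SOB and DOE s/p CABG on metoprolol"

vocab = build_vocab(general_corpus)
print(f"OOV rate on clinical note: {oov_rate(vocab, clinical_note):.2f}")  # 0.89
```

Shorthand like "c/o" (complains of) and "s/p" (status post) almost never appears in general text, which is one reason continued pre-training on clinical corpora, as surveyed above, tends to help.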

Source journal: Yearbook of Medical Informatics (CiteScore 4.10; self-citation rate 0.00%; about 20 articles per year). Published by the International Medical Informatics Association, this annual publication includes the best papers in medical informatics from around the world.