Learning Embeddings from Free-text Triage Notes using Pretrained Transformer Models

Proceedings of the International Conference on Health Informatics and Medical Application Technology, pp. 835-841. Published 2022-01-01. DOI: 10.5220/0011012800003123

Émilien Arnaud, Mahmoud Elbattah, Maxime Gignon, Gilles Dequen

Citations: 9

Abstract

The advent of transformer models has enabled tremendous progress in the Natural Language Processing (NLP) domain. Pretrained transformers have delivered state-of-the-art performance across a wide range of NLP tasks. This study presents an application of transformers to learn contextual embeddings from free-text triage notes, which are widely recorded in the emergency department. A large-scale retrospective cohort of more than 260K triage notes was provided by the University Hospital of Amiens-Picardy in France. We use a set of Bidirectional Encoder Representations from Transformers (BERT) models for the French language. The quality of the embeddings is examined empirically using a set of clustering models. In this regard, we provide a comparative analysis of popular models, including CamemBERT, FlauBERT, and mBART. The study can be regarded as an addition to ongoing work on applying the BERT approach in the healthcare context.
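The abstract does not state how a note-level embedding is derived from a transformer's token outputs, nor which clustering models or quality measures were used. As a minimal sketch under three illustrative assumptions (mean pooling over non-padding tokens, k-means clustering, and the silhouette coefficient as the quality score; none of these is confirmed by the source), an embedding-evaluation loop could look like the code below. The helpers `mean_pool`, `kmeans`, and `silhouette` are hypothetical names. In practice the token vectors would come from a French model such as CamemBERT or FlauBERT (for example via the Hugging Face `transformers` library); that step is omitted so the sketch stays self-contained.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def mean_pool(token_vectors: Sequence[Vector], attention_mask: Sequence[int]) -> Vector:
    """Average the token vectors whose mask entry is 1, skipping padding.

    Assumption: mean pooling is one common way to get a note-level embedding;
    the paper's actual pooling strategy is not stated in the abstract.
    """
    kept = [v for v, m in zip(token_vectors, attention_mask) if m == 1]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    return [sum(col) / len(kept) for col in zip(*kept)]


def kmeans(points: Sequence[Vector], k: int, iters: int = 50, seed: int = 0) -> List[int]:
    """Plain Lloyd's k-means; returns one cluster label per point."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(list(points), k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(points: Sequence[Vector], labels: Sequence[int]) -> float:
    """Mean silhouette coefficient in [-1, 1]; higher means better-separated clusters."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not own:
            scores.append(0.0)  # convention: a singleton cluster scores 0
            continue
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        # b: mean distance to the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c)
            / sum(1 for lab in labels if lab == c)
            for c in clusters
            if c != labels[i]
        )
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)
```

Under these assumptions, each model (CamemBERT, FlauBERT, mBART) would be compared by pooling every note's token vectors with `mean_pool`, clustering the pooled vectors with `kmeans`, and comparing the resulting `silhouette` values. The pure-Python silhouette is quadratic in the number of notes, so at the scale of 260K records a library routine such as scikit-learn's `silhouette_score` (with its `sample_size` argument) would be a more practical choice.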
