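Enhancing pre-trained contextual embeddings with triplet loss as an effective fine-tuning method for extracting clinical features from electronic health record derived mental health clinical notes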

Natural Language Processing Journal, Volume 6, Article 100045 | Pub Date: 2023-11-30 | DOI: 10.1016/j.nlp.2023.100045

Deepali Kulkarni, Abhijit Ghosh, Amey Girdhari, Shaomin Liu, L. Alexander Vance, Melissa Unruh, Joydeep Sarkar





Enhancing pre-trained contextual embeddings with triplet loss as an effective fine-tuning method for extracting clinical features from electronic health record derived mental health clinical notes

The development and application of real-world evidence in the field of mental health trails other therapeutic areas such as oncology and cardiovascular disease, largely because of the lack of frequent, structured outcome measures in routine clinical care. A wealth of valuable patient-level clinical data resides in unstructured form in the clinical notes documented at each clinical encounter. Manual extraction of this information is not scalable, and heterogeneity in recording patterns and the heavily context-dependent nature of the content render keyword-based automated searches of little practical value. While state-of-the-art natural language processing (NLP) models based on the transformer architecture have been developed for information extraction tasks in the mental health space, they are not trained on unstructured clinical data that capture the nuances of different dimensions of mental health (e.g., symptomology, social history). We have developed a novel transformer architecture-based NLP model to capture core clinical features of patients with major depressive disorder (MDD). We initialized the model with MentalBERT weights, pre-trained it further on clinical notes from routine mental health care, and fine-tuned it using triplet loss, an effective feature embedding regularizer that boosts classification and extraction of 3 specific features in patients with MDD: anhedonia, suicidal ideation with plan or intent (SP), and suicidal ideation without plan or intent or where plan or intent is unknown (SI). Training and testing data were annotated by mental health clinicians. Fine-tuning with triplet loss improved model performance relative to other standard models (MentalBERT and BioClinicalBERT) on the same tasks, achieving F1 scores of 0.99 for anhedonia, 0.94 for SP, and 0.88 for SI. Model robustness was assessed by testing the sensitivity of model predictions to modifications of the test sentences. Such an NLP model can be scaled further to capture clinical features of other disorders as well as other domains such as social history or history of illness.
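
For readers unfamiliar with the term, the triplet loss named in the abstract can be illustrated with a minimal sketch. The abstract does not state the distance metric, the margin, or how triplets are selected, so the Euclidean distance, the `margin=1.0` default, the function names, and the toy 2-dimensional vectors below are all assumptions made for illustration, not the authors' implementation. The idea is that an anchor embedding should sit closer to a positive example (same label) than to a negative example (different label) by at least the margin.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)


# Toy 2-d stand-ins for sentence embeddings (made-up numbers, not from the paper).
anchor = [0.0, 0.0]    # hypothetical sentence carrying a given label
positive = [0.0, 1.0]  # hypothetical sentence with the same label
negative = [3.0, 4.0]  # hypothetical sentence with a different label

# The negative is already well separated, so this triplet contributes no loss.
easy = triplet_loss(anchor, positive, negative)

# Swapping the roles produces a violated triplet with a positive loss.
hard = triplet_loss(anchor, negative, positive)

print(easy, hard)
```

In a fine-tuning setup, a loss of this shape would be computed on embeddings produced by the language model and minimized by gradient descent, typically through a deep learning framework rather than plain Python.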
