从主诉早期预警痛风发作的有效自然语言处理算法

Forecasting Pub Date : 2024-03-10 DOI:10.3390/forecast6010013

Lucas Lopes Oliveira, Xiaorui Jiang, Aryalakshmi Nellippillipathil Babu, Poonam Karajagi, Alireza Daneshkhah

{"title":"从主诉早期预警痛风发作的有效自然语言处理算法","authors":"Lucas Lopes Oliveira, Xiaorui Jiang, Aryalakshmi Nellippillipathil Babu, Poonam Karajagi, Alireza Daneshkhah","doi":"10.3390/forecast6010013","DOIUrl":null,"url":null,"abstract":"Early identification of acute gout is crucial, enabling healthcare professionals to implement targeted interventions for rapid pain relief and preventing disease progression, ensuring improved long-term joint function. In this study, we comprehensively explored the potential early detection of gout flares (GFs) based on nurses’ chief complaint notes in the Emergency Department (ED). Addressing the challenge of identifying GFs prospectively during an ED visit, where documentation is typically minimal, our research focused on employing alternative Natural Language Processing (NLP) techniques to enhance detection accuracy. We investigated GF detection algorithms using both sparse representations by traditional NLP methods and dense encodings by medical domain-specific Large Language Models (LLMs), distinguishing between generative and discriminative models. Three methods were used to alleviate the issue of severe data imbalances, including oversampling, class weights, and focal loss. Extensive empirical studies were performed on the Gout Emergency Department Chief Complaint Corpora. Sparse text representations like tf-idf proved to produce strong performances, achieving F1 scores higher than 0.75. The best deep learning models were RoBERTa-large-PM-M3-Voc and BioGPT, which had the best F1 scores for each dataset, with a 0.8 on the 2019 dataset and a 0.85 F1 score on the 2020 dataset, respectively. We concluded that although discriminative LLMs performed better for this classification task when compared to generative LLMs, a combination of using generative models as feature extractors and employing a support vector machine for classification yielded promising results comparable to those obtained with discriminative models.","PeriodicalId":508737,"journal":{"name":"Forecasting","volume":"52 5","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Effective Natural Language Processing Algorithms for Early Alerts of Gout Flares from Chief Complaints\",\"authors\":\"Lucas Lopes Oliveira, Xiaorui Jiang, Aryalakshmi Nellippillipathil Babu, Poonam Karajagi, Alireza Daneshkhah\",\"doi\":\"10.3390/forecast6010013\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Early identification of acute gout is crucial, enabling healthcare professionals to implement targeted interventions for rapid pain relief and preventing disease progression, ensuring improved long-term joint function. In this study, we comprehensively explored the potential early detection of gout flares (GFs) based on nurses’ chief complaint notes in the Emergency Department (ED). Addressing the challenge of identifying GFs prospectively during an ED visit, where documentation is typically minimal, our research focused on employing alternative Natural Language Processing (NLP) techniques to enhance detection accuracy. We investigated GF detection algorithms using both sparse representations by traditional NLP methods and dense encodings by medical domain-specific Large Language Models (LLMs), distinguishing between generative and discriminative models. Three methods were used to alleviate the issue of severe data imbalances, including oversampling, class weights, and focal loss. Extensive empirical studies were performed on the Gout Emergency Department Chief Complaint Corpora. Sparse text representations like tf-idf proved to produce strong performances, achieving F1 scores higher than 0.75. The best deep learning models were RoBERTa-large-PM-M3-Voc and BioGPT, which had the best F1 scores for each dataset, with a 0.8 on the 2019 dataset and a 0.85 F1 score on the 2020 dataset, respectively. We concluded that although discriminative LLMs performed better for this classification task when compared to generative LLMs, a combination of using generative models as feature extractors and employing a support vector machine for classification yielded promising results comparable to those obtained with discriminative models.\",\"PeriodicalId\":508737,\"journal\":{\"name\":\"Forecasting\",\"volume\":\"52 5\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-03-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Forecasting\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3390/forecast6010013\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Forecasting","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/forecast6010013","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

急性痛风的早期识别至关重要，可使医护人员采取有针对性的干预措施，迅速缓解疼痛并预防疾病进展，从而确保改善长期关节功能。在这项研究中，我们根据急诊科（ED）护士的主诉记录，全面探讨了早期发现痛风发作（GFs）的可能性。在急诊科就诊过程中，由于记录通常很少，因此我们的研究重点是采用其他自然语言处理（NLP）技术来提高检测的准确性，以应对前瞻性地识别痛风发作的挑战。我们研究了使用传统 NLP 方法的稀疏表示法和使用特定医学领域大语言模型 (LLM) 的密集编码法的 GF 检测算法，并对生成模型和判别模型进行了区分。我们采用了三种方法来缓解严重的数据不平衡问题，包括超采样、类权重和焦点丢失。在痛风急诊科主诉语料库中进行了广泛的实证研究。事实证明，tf-idf 等稀疏文本表示法表现出色，F1 分数高于 0.75。最好的深度学习模型是 RoBERTa-large-PM-M3-Voc 和 BioGPT，它们在每个数据集上都有最好的 F1 分数，在 2019 年数据集上的 F1 分数分别为 0.8 和 0.85。我们得出的结论是，虽然与生成式 LLM 相比，判别式 LLM 在这项分类任务中表现更好，但将生成式模型作为特征提取器并采用支持向量机进行分类的组合产生了与判别式模型相当的可喜结果。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Effective Natural Language Processing Algorithms for Early Alerts of Gout Flares from Chief Complaints

Early identification of acute gout is crucial, enabling healthcare professionals to implement targeted interventions for rapid pain relief and preventing disease progression, ensuring improved long-term joint function. In this study, we comprehensively explored the potential early detection of gout flares (GFs) based on nurses’ chief complaint notes in the Emergency Department (ED). Addressing the challenge of identifying GFs prospectively during an ED visit, where documentation is typically minimal, our research focused on employing alternative Natural Language Processing (NLP) techniques to enhance detection accuracy. We investigated GF detection algorithms using both sparse representations by traditional NLP methods and dense encodings by medical domain-specific Large Language Models (LLMs), distinguishing between generative and discriminative models. Three methods were used to alleviate the issue of severe data imbalances, including oversampling, class weights, and focal loss. Extensive empirical studies were performed on the Gout Emergency Department Chief Complaint Corpora. Sparse text representations like tf-idf proved to produce strong performances, achieving F1 scores higher than 0.75. The best deep learning models were RoBERTa-large-PM-M3-Voc and BioGPT, which had the best F1 scores for each dataset, with a 0.8 on the 2019 dataset and a 0.85 F1 score on the 2020 dataset, respectively. We concluded that although discriminative LLMs performed better for this classification task when compared to generative LLMs, a combination of using generative models as feature extractors and employing a support vector machine for classification yielded promising results comparable to those obtained with discriminative models.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Forecasting

CiteScore

5.80

自引率

0.00%

发文量