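Contextual classification of clinical records with bidirectional long short-term memory (Bi-LSTM) and bidirectional encoder representations from transformers (BERT) model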

IF 1.8 · Zone 4, Computer Science · JCR Q3, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Computational Intelligence · Pub Date: 2024-08-21 · DOI: 10.1111/coin.12692

Jaya Zalte, Harshal Shah


Citations: 0

Abstract


Deep learning models have overtaken traditional machine learning techniques for text classification in natural language processing (NLP). NLP, a branch of machine learning used to interpret language and classify text of interest, can also be applied to analyse clinical electronic health records. Medical text contains a lot of rich data, and determining patterns in clinical text can provide good insight. In this paper, bidirectional long short-term memory (Bi-LSTM), Bi-LSTM with attention, and bidirectional encoder representations from transformers (BERT) base models are used to classify text that is of privacy concern to a person, so that it can be extracted and tagged as sensitive. Text that might not seem to be of privacy concern can reveal a lot about the patient's integrity and personal life. Clinical data contain not only patient demographic data but also a lot of hidden data that might go unnoticed and could therefore give rise to privacy issues. About 206,926 sentences are used, of which 80% are used for training and the rest for testing. Bi-LSTM alone achieves an accuracy of approximately 90%. Adding an attention layer on top of the Bi-LSTM, to capture the importance of the critical words for classification, achieves an accuracy of about 92%. The BERT model, evaluated on the same data, achieves an accuracy of approximately 93%.
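The two mechanical steps the abstract mentions, an 80/20 split of about 206,926 sentences and an attention layer that weights critical words, can be illustrated with a minimal sketch. This is not the authors' code. The abstract does not say how the split was drawn or which attention formulation was used, so the seeded shuffle, the dot-product scoring against a single query vector, and all function names here are assumptions. The Bi-LSTM itself is not implemented; its per-token hidden states are stood in for by small hand-written vectors.

```python
import math
import random


def train_test_split_indices(n_items, train_fraction=0.8, seed=0):
    """Shuffle indices 0..n_items-1 and split them into train and test lists.

    Shuffling and the fixed seed are assumptions; the abstract only gives
    the 80% / remainder proportions.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_train = int(n_items * train_fraction)  # floor of the training share
    return indices[:n_train], indices[n_train:]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Dot-product attention pooling over per-token hidden states.

    Each token is scored by the dot product of its hidden state with
    `query` (a hypothetical learned vector); the scores are normalised
    with softmax and used to form a weighted sum of the hidden states.
    """
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    pooled = [
        sum(w * h[d] for w, h in zip(weights, hidden_states))
        for d in range(len(query))
    ]
    return weights, pooled


# Split sizes for the sentence count reported in the abstract.
train_idx, test_idx = train_test_split_indices(206_926)
print(len(train_idx), len(test_idx))

# Toy stand-in for Bi-LSTM outputs: three tokens, two hidden dimensions.
toy_states = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.2]]
toy_query = [1.0, 1.0]
weights, pooled = attention_pool(toy_states, toy_query)
print(weights, pooled)
```

In a full model the pooled vector would feed a classification layer that outputs the sensitive / not-sensitive label, and the attention weights give a rough indication of which tokens drove the decision.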


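Source journal

Computational Intelligence (Engineering & Technology - Computer Science: Artificial Intelligence)

CiteScore: 6.90

Self-citation rate: 3.60%

Articles published:

Review time: >12 weeks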

Journal description: This leading international journal promotes and stimulates research in the field of artificial intelligence (AI). Covering a wide range of issues, from the tools and languages of AI to its philosophical implications, Computational Intelligence provides a vigorous forum for the publication of both experimental and theoretical research, as well as surveys and impact studies. The journal is designed to meet the needs of a wide range of AI workers in academic and industrial research.