Authors: Jaya Zalte, Harshal Shah
DOI: 10.1111/coin.12692
Journal: Computational Intelligence (Q3, Computer Science, Artificial Intelligence; Impact Factor 1.8)
Published: 2024-08-21
URL: https://onlinelibrary.wiley.com/doi/10.1111/coin.12692
Contextual classification of clinical records with bidirectional long short-term memory (Bi-LSTM) and bidirectional encoder representations from transformers (BERT) model
Deep learning models have outperformed traditional machine learning techniques for text classification in the field of natural language processing (NLP). NLP, a branch of machine learning used for interpreting language and classifying text of interest, can likewise be applied to analyse clinical electronic health records. Medical text contains rich data from which patterns can be mined to provide useful insight. In this paper, bidirectional long short-term memory (Bi-LSTM), Bi-LSTM with attention, and bidirectional encoder representations from transformers (BERT) base models are used to classify text that raises privacy concerns for a person and should therefore be extracted and tagged as sensitive. Text data that might seem to pose no privacy concern can in fact reveal a great deal about a patient's integrity and personal life. Clinical data contain not only patient demographics but also much hidden information that might go unseen and thus give rise to privacy issues. An attention layer is added on top of the Bi-LSTM to capture the importance of critical words for classification; with this model we achieve an accuracy of about 92%. About 206,926 sentences are used, of which 80% serve for training and the rest for testing; Bi-LSTM alone yields an accuracy of approximately 90%. The same dataset is used for the BERT model, which achieves an accuracy of approximately 93%.
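The Bi-LSTM-with-attention architecture described above can be sketched as follows. This is a minimal illustrative implementation in PyTorch, not the authors' actual model: the vocabulary size, embedding dimension, hidden dimension, and the additive attention formulation are all assumptions chosen for the sketch, and a real system would add tokenization, padding masks, and a training loop over the 80/20 sentence split.

```python
# Hedged sketch: Bi-LSTM with an attention layer for binary
# sensitive/non-sensitive sentence classification. All hyperparameters
# are illustrative assumptions, not values from the paper.
import torch
import torch.nn as nn

class BiLSTMAttention(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=128, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        # One attention score per time step over the 2*hidden_dim outputs,
        # so critical words can receive higher weight.
        self.attn = nn.Linear(2 * hidden_dim, 1)
        self.fc = nn.Linear(2 * hidden_dim, 2)  # sensitive vs. not sensitive

    def forward(self, token_ids):
        h, _ = self.lstm(self.embed(token_ids))       # (batch, seq, 2*hidden)
        weights = torch.softmax(self.attn(h), dim=1)  # (batch, seq, 1)
        context = (weights * h).sum(dim=1)            # weighted sum over time
        return self.fc(context)                       # logits (batch, 2)

# Smoke test on a random batch of 4 "sentences" of 20 token ids each.
model = BiLSTMAttention()
logits = model(torch.randint(0, 10000, (4, 20)))
print(logits.shape)
```

The attention weights sum to 1 across the sequence, so the context vector is a convex combination of the Bi-LSTM outputs; inspecting those weights is what lets the model surface which words drove a "sensitive" prediction.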
Journal Introduction:
This leading international journal promotes and stimulates research in the field of artificial intelligence (AI). Covering a wide range of issues - from the tools and languages of AI to its philosophical implications - Computational Intelligence provides a vigorous forum for the publication of both experimental and theoretical research, as well as surveys and impact studies. The journal is designed to meet the needs of a wide range of AI workers in academic and industrial research.