C. Rahul, Merin Meleet, G. Srinivasan, Nagaraj G Cholli
Title: Clinical Data Analysis For Recognizing Named Entities
Published in: 2021 IEEE International Conference on Computation System and Information Technology for Sustainable Solutions (CSITSS)
Publication date: 2021-12-16
DOI: 10.1109/CSITSS54238.2021.9683039 (https://doi.org/10.1109/CSITSS54238.2021.9683039)
Citations: 0
Abstract
Since the introduction of the internet, the amount of readily available digital media and documents has grown rapidly. Whatever the language or format of the data, processing it takes time and effort. This is especially true in the medical field, where reports are constantly generated in large volumes. Recent advances in natural language processing (NLP), including efficient language models such as BERT, reduce the time needed to process and analyze such data once the models are trained. The proposed approach uses a BERT model to recognize entities in clinical text: every word or token is assigned an IOB tag that indicates its position within an entity, together with the entity type, such as protein or cell type. The results show that a label is assigned to every word in the clinical text. Metrics computed after every epoch show that the validation accuracy and F1 score increase with each epoch. It is therefore concluded that, for a BERT model, effectiveness increases as the training period increases.
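The IOB labelling scheme mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `spans_to_iob`, the span format, the label names `protein` and `cell_type`, and the example sentence are all assumptions chosen for illustration, and the sketch covers only the tag format, not the BERT model that predicts the tags.

```python
from typing import List, Tuple

# Hypothetical span format: (first_token_index, end_index_exclusive, entity_type)
Span = Tuple[int, int, str]


def spans_to_iob(tokens: List[str], spans: List[Span]) -> List[str]:
    """Return one IOB tag per token.

    B-<type> marks the first token of an entity, I-<type> marks the
    tokens that continue it, and O marks tokens outside any entity.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


# Illustrative sentence and annotations (assumed, not taken from the paper's data).
tokens = ["Interleukin-2", "activates", "T", "cells"]
spans = [(0, 1, "protein"), (2, 4, "cell_type")]
tags = spans_to_iob(tokens, spans)

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

In a tagger of the kind the abstract describes, a token-classification model would be trained to predict one such tag per token, and the accuracy and F1 score would be computed by comparing predicted tags against reference tags like these.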