Clinical Data Analysis For Recognizing Named Entities

2021 IEEE International Conference on Computation System and Information Technology for Sustainable Solutions (CSITSS) Pub Date : 2021-12-16 DOI:10.1109/CSITSS54238.2021.9683039

C. Rahul, Merin Meleet, G. Srinivasan, Nagaraj G Cholli



Abstract

Since the introduction of the internet, the volume of readily available digital media and documents has grown rapidly. Whatever its language or format, processing such data takes time and effort. This is especially true in the medical field, where reports are constantly generated in large volumes. Recent advances in natural language processing (NLP), together with efficient language models such as BERT, show that the time needed to process and analyze such data decreases once the models are trained. The proposed approach uses a BERT model to recognize entities in clinical text by assigning every word an IOB tag that indicates the position of the word or token within an entity and the entity type, here proteins and cell types. The results show that a label is assigned to every word in the clinical text. Metrics calculated after every epoch show that the validation accuracy and F1 score increase with each epoch. It is therefore concluded that, for a BERT model, efficiency increases as the training period lengthens.
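The IOB tagging scheme and the per-epoch metrics mentioned in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the example sentence, the label names `protein` and `cell_type`, the helper functions, and the token-level definition of F1 are all assumptions made for illustration, and no BERT model is involved.

```python
from typing import List, Tuple


def iob_tags(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Assign one IOB tag per token.

    spans holds (start, end_exclusive, entity_type) token ranges and is
    assumed to be non-overlapping. B- marks the first token of an entity,
    I- marks the tokens that continue it, and O marks everything else.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def token_accuracy(gold: List[str], pred: List[str]) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def token_f1(gold: List[str], pred: List[str]) -> float:
    """Token-level F1 over non-O tags (one possible definition, assumed here)."""
    true_pos = sum(g == p and g != "O" for g, p in zip(gold, pred))
    if true_pos == 0:
        return 0.0
    precision = true_pos / sum(p != "O" for p in pred)
    recall = true_pos / sum(g != "O" for g in gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentence and annotations, not taken from the paper's dataset.
tokens = ["IL-2", "activates", "T", "cells"]
gold = iob_tags(tokens, [(0, 1, "protein"), (2, 4, "cell_type")])

# A hypothetical model output that misses the second token of "T cells".
pred = ["B-protein", "O", "B-cell_type", "O"]

for token, tag in zip(tokens, gold):
    print(f"{token}\t{tag}")
print("accuracy:", token_accuracy(gold, pred))
print("F1:", round(token_f1(gold, pred), 3))
```

In a training loop, functions like these would be called on the validation set after each epoch to track the trend the abstract reports. Entity-level F1, which scores whole spans rather than single tokens, is a common alternative definition.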


