Clinical Natural Language Processing Workshop: Latest Papers, Page 2

Harnessing the Power of BERT in the Turkish Clinical Domain: Pretraining Approaches for Limited Data Scenarios

Clinical Natural Language Processing Workshop Pub Date: 2023-05-05 DOI: 10.48550/arXiv.2305.03788

Hazal Türkmen, Oguz Dikenelli, C. Eraslan, Mehmet Cem Çalli, S. Özbek

Abstract: Recent advancements in natural language processing (NLP) have been driven by large language models (LLMs), thereby revolutionizing the field. Our study investigates the impact of diverse pre-training strategies on the performance of Turkish clinical language models in a multi-label classification task involving radiology reports, with a focus on overcoming language resource limitations. Additionally, for the first time, we evaluated the simultaneous pre-training approach by utilizing limited clinical task data. We developed four models: TurkRadBERT-task v1, TurkRadBERT-task v2, TurkRadBERT-sim v1, and TurkRadBERT-sim v2. Our results revealed superior performance from BERTurk and TurkRadBERT-task v1, both of which leverage a broad general-domain corpus. Although task-adaptive pre-training is capable of identifying domain-specific patterns, it may be prone to overfitting because of the constraints of the task-specific corpus. Our findings highlight the importance of domain-specific vocabulary during pre-training to improve performance. They also affirmed that a combination of general domain knowledge and task-specific fine-tuning is crucial for optimal performance across various categories. This study offers key insights for future research on pre-training techniques in the clinical domain, particularly for low-resource languages.

Citations: 1

WangLab at MEDIQA-Chat 2023: Clinical Note Generation from Doctor-Patient Conversations using Large Language Models

Clinical Natural Language Processing Workshop Pub Date: 2023-05-03 DOI: 10.18653/v1/2023.clinicalnlp-1.36

John Giorgi, Augustin Toma, Ronald Xie, Sondra S. Chen, Kevin R. An, Grace X. Zheng, Bo Wang

Citations: 2

Investigating Massive Multilingual Pre-Trained Machine Translation Models for Clinical Domain via Transfer Learning

Clinical Natural Language Processing Workshop Pub Date: 2022-10-12 DOI: 10.18653/v1/2023.clinicalnlp-1.5

Lifeng Han, G. Erofeev, I. Sorokina, S. Gladkoff, G. Nenadic

Abstract: Massively multilingual pre-trained language models (MMPLMs) are developed in recent years demonstrating superpowers and the pre-knowledge they acquire for downstream tasks. This work investigates whether MMPLMs can be applied to clinical domain machine translation (MT) towards entirely unseen languages via transfer learning. We carry out an experimental investigation using Meta-AI’s MMPLMs “wmt21-dense-24-wide-en-X and X-en (WMT21fb)” which were pre-trained on 7 language pairs and 14 translation directions including English to Czech, German, Hausa, Icelandic, Japanese, Russian, and Chinese, and the opposite direction. We fine-tune these MMPLMs towards English-Spanish language pair which did not exist at all in their original pre-trained corpora both implicitly and explicitly. We prepare carefully aligned clinical domain data for this fine-tuning, which is different from their original mixed domain knowledge. Our experimental result shows that the fine-tuning is very successful using just 250k well-aligned in-domain EN-ES segments for three sub-task translation testings: clinical cases, clinical terms, and ontology concepts. It achieves very close evaluation scores to another MMPLM NLLB from Meta-AI, which included Spanish as a high-resource setting in the pre-training. To the best of our knowledge, this is the first work on using MMPLMs towards clinical domain transfer-learning NMT successfully for totally unseen languages during pre-training.

Citations: 4

Pretrained Language Models for Biomedical and Clinical Tasks: Understanding and Extending the State-of-the-Art

Clinical Natural Language Processing Workshop Pub Date: 2020-11-01 DOI: 10.18653/v1/2020.clinicalnlp-1.17

Patrick Lewis, Myle Ott, Jingfei Du, Veselin Stoyanov

Citations: 129

Evaluation of Transfer Learning for Adverse Drug Event (ADE) and Medication Entity Extraction

Clinical Natural Language Processing Workshop Pub Date: 2020-11-01 DOI: 10.18653/v1/2020.clinicalnlp-1.6

S. Narayanan, Kaivalya Mannam, S. Rajan, P. Rangan

Citations: 8

How You Ask Matters: The Effect of Paraphrastic Questions to BERT Performance on a Clinical SQuAD Dataset

Clinical Natural Language Processing Workshop Pub Date: 2020-11-01 DOI: 10.18653/v1/2020.clinicalnlp-1.13

Sungrim Moon, Jungwei Fan

Citations: 7

Automatic recognition of abdominal lymph nodes from clinical text

Clinical Natural Language Processing Workshop Pub Date: 2020-11-01 DOI: 10.18653/v1/2020.clinicalnlp-1.12

Yifan Peng, Sungwon Lee, D. Elton, Thomas C. Shen, Yuxing Tang, Qingyu Chen, Shuai Wang, Yingying Zhu, R. Summers, Zhiyong Lu

Citations: 6

MeDAL: Medical Abbreviation Disambiguation Dataset for Natural Language Understanding Pretraining

Clinical Natural Language Processing Workshop Pub Date: 2020-11-01 DOI: 10.18653/v1/2020.clinicalnlp-1.15

Zhi Wen, Xing Han Lu, Siva Reddy

Citations: 21

Cancer Registry Information Extraction via Transfer Learning

Clinical Natural Language Processing Workshop Pub Date: 2020-11-01 DOI: 10.18653/v1/2020.clinicalnlp-1.22

Yan-Jie Lin, Hong-Jie Dai, You-Chen Zhang, Chung-Yang Wu, Yu-Cheng Chang, Pin-Jou Lu, Chih-Jen Huang, Yu-Tsang Wang, H. Hsieh, K. Chao, T. Liu, I. Chang, Yi-Hsin Connie Yang, Ti-Hao Wang, Ko-Jiunn Liu, Li‐Tzong Chen, Sheau-Fang Yang

Abstract: A cancer registry is a critical and massive database for which various types of domain knowledge are needed and whose maintenance requires labor-intensive data curation. In order to facilitate the curation process for building a high-quality and integrated cancer registry database, we compiled a cross-hospital corpus and applied neural network methods to develop a natural language processing system for extracting cancer registry variables buried in unstructured pathology reports. The performance of the developed networks was compared with various baselines using standard micro-precision, recall and F-measure. Furthermore, we conducted experiments to study the feasibility of applying transfer learning to rapidly develop a well-performing system for processing reports from different sources that might be presented in different writing styles and formats. The results demonstrate that the transfer learning method enables us to develop a satisfactory system for a new hospital with only a few annotations and suggest more opportunities to reduce the burden of cancer registry curation.

Citations: 2

Extracting Semantic Aspects for Structured Representation of Clinical Trial Eligibility Criteria

Clinical Natural Language Processing Workshop Pub Date: 2020-11-01 DOI: 10.18653/v1/2020.clinicalnlp-1.27

Tirthankar Dasgupta, Ishani Mondal, Abir Naskar, Lipika Dey

Abstract: Eligibility criteria in the clinical trials specify the characteristics that a patient must or must not possess in order to be treated according to a standard clinical care guideline. As the process of manual eligibility determination is time-consuming, automatic structuring of the eligibility criteria into various semantic categories or aspects is the need of the hour. Existing methods use hand-crafted rules and feature-based statistical machine learning methods to dynamically induce semantic aspects. However, in order to deal with paucity of aspect-annotated clinical trials data, we propose a novel weakly-supervised co-training based method which can exploit a large pool of unlabeled criteria sentences to augment the limited supervised training data, and consequently enhance the performance. Experiments with 0.2M criteria sentences show that the proposed approach outperforms the competitive supervised baselines by 12% in terms of micro-averaged F1 score for all the aspects. Probing deeper into analysis, we observe domain-specific information boosts up the performance by a significant margin.

Citations: 2