Workshop on Biomedical Natural Language Processing: Latest Publications

BIOptimus: Pre-training an Optimal Biomedical Language Model with Curriculum Learning for Named Entity Recognition

Workshop on Biomedical Natural Language Processing Pub Date : 2023-08-16 DOI: 10.18653/v1/2023.bionlp-1.31

Vera Pavlova, M. Makhlouf

Abstract: Using language models (LMs) pre-trained in a self-supervised setting on large corpora and then fine-tuning them for a downstream task has helped to deal with the problem of limited labeled data for supervised learning tasks such as Named Entity Recognition (NER). Recent research in biomedical language processing has offered a number of biomedical LMs pre-trained using different methods and techniques that advance results on many BioNLP tasks, including NER. However, a comprehensive comparison of pre-training approaches to identify which work best in the biomedical domain is still lacking. This paper aims to investigate different pre-training methods, such as pre-training the biomedical LM from scratch and pre-training it in a continued fashion. We compare existing methods with our proposed pre-training method of initializing weights for new tokens by distilling existing weights from the BERT model inside the context where the tokens were found. The method helps to speed up the pre-training stage and improve performance on NER. In addition, we compare how masking rate, corruption strategy, and masking strategies impact the performance of the biomedical LM. Finally, using the insights from our experiments, we introduce a new biomedical LM (BIOptimus), which is pre-trained using Curriculum Learning (CL) and the contextualized weight distillation method. Our model sets new states of the art on several biomedical NER tasks. We release our code and all pre-trained models.

Citations: 0

Multi-Source (Pre-)Training for Cross-Domain Measurement, Unit and Context Extraction

Workshop on Biomedical Natural Language Processing Pub Date : 2023-08-05 DOI: 10.18653/v1/2023.bionlp-1.1

Yueling Li, Sebastian Martschat, Simone Paolo Ponzetto

Citations: 0

Building a Corpus for Biomedical Relation Extraction of Species Mentions

Workshop on Biomedical Natural Language Processing Pub Date : 2023-06-14 DOI: 10.48550/arXiv.2306.08403

Oumaima El Khettari, Solen Quiniou, Samuel Chaffron

Citations: 0

Good Data, Large Data, or No Data? Comparing Three Approaches in Developing Research Aspect Classifiers for Biomedical Papers

Workshop on Biomedical Natural Language Processing Pub Date : 2023-06-07 DOI: 10.48550/arXiv.2306.04820

S. Chandrasekhar, Chieh-Yang Huang, Ting Huang

Citations: 0

Evaluation of ChatGPT on Biomedical Tasks: A Zero-Shot Comparison with Fine-Tuned Generative Transformers

Workshop on Biomedical Natural Language Processing Pub Date : 2023-06-07 DOI: 10.48550/arXiv.2306.04504

Israt Jahan, Md Tahmid Rahman Laskar, Chun Peng, J. Huang

Citations: 5

shs-nlp at RadSum23: Domain-Adaptive Pre-training of Instruction-tuned LLMs for Radiology Report Impression Generation

Workshop on Biomedical Natural Language Processing Pub Date : 2023-06-05 DOI: 10.48550/arXiv.2306.03264

Sanjeev Kumar Karn, Rikhiya Ghosh, P. Kusuma, Oladimeji Farri

Citations: 4

Team:PULSAR at ProbSum 2023:PULSAR: Pre-training with Extracted Healthcare Terms for Summarising Patients’ Problems and Data Augmentation with Black-box Large Language Models

Workshop on Biomedical Natural Language Processing Pub Date : 2023-06-05 DOI: 10.48550/arXiv.2306.02754

Hao Li, Yuping Wu, Viktor Schlegel, R. Batista-Navarro, Thanh-Tung Nguyen, Abhinav Ramesh Kashyap, Xiaojun Zeng, Daniel Beck, Stefan Winkler, G. Nenadic

Abstract: Medical progress notes play a crucial role in documenting a patient’s hospital journey, including his or her condition, treatment plan, and any updates for healthcare providers. Automatic summarisation of a patient’s problems in the form of a “problem list” can aid stakeholders in understanding a patient’s condition, reducing workload and cognitive bias. BioNLP 2023 Shared Task 1A focusses on generating a list of diagnoses and problems from the provider’s progress notes during hospitalisation. In this paper, we introduce our proposed approach to this task, which integrates two complementary components. One component employs large language models (LLMs) for data augmentation; the other is an abstractive summarisation LLM with a novel pre-training objective for generating the patients’ problems summarised as a list. Our approach was ranked second among all submissions to the shared task. The performance of our model on the development and test datasets shows that our approach is more robust on unknown data, with an improvement of up to 3.1 points over the same size of the larger model.

Citations: 1

Automatic Glossary of Clinical Terminology: a Large-Scale Dictionary of Biomedical Definitions Generated from Ontological Knowledge

Workshop on Biomedical Natural Language Processing Pub Date : 2023-06-01 DOI: 10.48550/arXiv.2306.00665

François Remy, Thomas Demeester

Abstract: Background: More than 400,000 biomedical concepts and some of their relationships are contained in SnomedCT, a comprehensive biomedical ontology. However, their concept names are not always readily interpretable by non-experts, or patients looking at their own electronic health records (EHR). Clear definitions or descriptions in understandable language are often not available. Therefore, generating human-readable definitions for biomedical concepts might help make the information they encode more accessible and understandable to a wider public. Objective: In this article, we introduce the Automatic Glossary of Clinical Terminology (AGCT), a large-scale biomedical dictionary of clinical concepts generated using high-quality information extracted from the biomedical knowledge contained in SnomedCT. Methods: We generate a novel definition for every SnomedCT concept, after prompting the OpenAI Turbo model, a variant of GPT 3.5, using a high-quality verbalization of the SnomedCT relationships of the to-be-defined concept. A significant subset of the generated definitions was subsequently evaluated by NLP researchers with biomedical expertise on 5-point scales along the following three axes: factuality, insight, and fluency. Results: AGCT contains 422,070 computer-generated definitions for SnomedCT concepts, covering various domains such as diseases, procedures, drugs, and anatomy. The average length of the definitions is 49 words. The definitions were assigned average scores of over 4.5 out of 5 on all three axes, indicating a majority of factual, insightful, and fluent definitions. Conclusion: AGCT is a novel and valuable resource for biomedical tasks that require human-readable definitions for SnomedCT concepts. It can also serve as a base for developing robust biomedical retrieval models or other applications that leverage natural language understanding of biomedical knowledge.

Citations: 0

Comparing and combining some popular NER approaches on Biomedical tasks

Workshop on Biomedical Natural Language Processing Pub Date : 2023-05-30 DOI: 10.48550/arXiv.2305.19120

Harsh Verma, S. Bergler, Narjes Tahaei

Abstract: We compare three simple and popular approaches for NER: 1) SEQ (sequence labeling with a linear token classifier), 2) SeqCRF (sequence labeling with Conditional Random Fields), and 3) SpanPred (span prediction with boundary token embeddings). We compare the approaches on 4 biomedical NER tasks: GENIA, NCBI-Disease, LivingNER (Spanish), and SocialDisNER (Spanish). The SpanPred model demonstrates state-of-the-art performance on LivingNER and SocialDisNER, improving F1 by 1.3 and 0.6, respectively. The SeqCRF model also demonstrates state-of-the-art performance on LivingNER and SocialDisNER, improving F1 by 0.2 and 0.7, respectively. The SEQ model is competitive with the state of the art on the LivingNER dataset. We explore some simple ways of combining the three approaches. We find that majority voting consistently gives high precision and high F1 across all 4 datasets. Lastly, we implement a system that learns to combine SEQ’s and SpanPred’s predictions, generating systems that give high recall and high F1 across all 4 datasets. On the GENIA dataset, we find that our learned combiner system significantly boosts F1 (+1.2) and recall (+2.1) over the systems being combined.

Citations: 1

Zero-shot Temporal Relation Extraction with ChatGPT

Workshop on Biomedical Natural Language Processing Pub Date : 2023-04-11 DOI: 10.48550/arXiv.2304.05454

Chenhan Yuan, Qianqian Xie, S. Ananiadou

Citations: 20