Biomedical Natural Language Inference on Clinical trials using the BERT-based Models

Ayesha Seerat, Sarah Nasir, Muhammad Wasim, Nuno M. Garcia

Procedia Computer Science, vol. 241, pp. 576–581, 2024. DOI: 10.1016/j.procs.2024.08.083
Open access: https://www.sciencedirect.com/science/article/pii/S1877050924017940
Clinical trials are central to experimental medicine, as they assess the safety and efficacy of new treatments. Because clinical text is unstructured and written in plain language, it is often difficult to recover the relationships between elements such as disease, symptoms, diagnosis, and treatment. The Multi-evidence Natural Language Inference for Clinical Trial Data (NLI4CT) task is especially challenging because it requires intricate reasoning over both textual and numerical elements: a hypothesis must be validated by integrating information from one or two Clinical Trial Reports (CTRs), demanding a multi-faceted approach. To address these problems, we use BERT-based models to predict entailment or contradiction labels, and we compare transformer-based feature extraction against the direct use of pre-trained models. We evaluate seven pre-trained models, six BERT-based and one T5-based: BERT-base-uncased, BioBERT-base-cased-v1.1-mnli, DeBERTa-v3-base-mnli-fever-anli, DeBERTa-v3-base-mnli-fever-docnli-ling-2c, DeBERTa-large-mnli, BioLinkBERT-base, and Flan-T5-base. We achieve an F1-score of 61% with both the DeBERTa-v3-base-mnli-fever-anli and DeBERTa-large-mnli models, and 95% faithfulness with the BioLinkBERT-base model.
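For readers unfamiliar with the evaluation metric, the F1-score reported above can be sketched as follows. This is a minimal, self-contained illustration of computing F1 for a two-label NLI task (Entailment vs. Contradiction); the gold labels and predictions below are invented for demonstration and are not data from the paper.

```python
# Sketch: F1-score for binary NLI (Entailment vs. Contradiction).
# Labels and predictions are illustrative only, not from the paper's dataset.

def f1_score(gold, pred, positive="Entailment"):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)   # of predicted Entailments, how many were right
    recall = tp / (tp + fn)      # of true Entailments, how many were found
    return 2 * precision * recall / (precision + recall)

gold = ["Entailment", "Contradiction", "Entailment", "Contradiction"]
pred = ["Entailment", "Entailment", "Entailment", "Contradiction"]
print(round(f1_score(gold, pred), 2))  # → 0.8
```

In practice a library implementation such as scikit-learn's `sklearn.metrics.f1_score` would be used; the hand-rolled version here only makes the precision/recall trade-off explicit.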