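An Experimental Evaluation of Deep Neural Network Model Performance for the Recognition of Contradictory Medical Research Claims Using Small and Medium-Sized Corpora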

IF 1.2 · Zone 4, Computer Science · JCR Q4, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Malaysian Journal of Computer Science · Pub Date: 2021-12-31 · DOI: 10.22452/mjcs.sp2021no2.5

Fatin Syafiqah Yazi, Wan-Tze Vong, V. Raman, Patrick Hang Hui Then, Mukulraj J Lunia


Citations: 1

Abstract



Corpora come in various shapes and sizes and play an essential role in facilitating Natural Language Processing (NLP) tasks. However, few corpora are specialized for Evidence-Based Medicine (EBM) related tasks. This study aims to determine how the size of a corpus influences the performance of our Deep Neural Network (DNN) model developed for contradiction detection in medical literature. We explored the potential of the EBM Summarizer corpus by Mollá and Santiago-Martínez, a medium-sized corpus, for use with our contradiction detection model. Dataset preparation involved filtering out open-ended questions, duplicate claims, and vague claims. As a result, two datasets were created, with the claim input represented by sniptext in one dataset and by longtext in the other. Experiments were conducted on the different datasets while varying the number of hidden layers and units in the model. The performance of the DNN model was recorded and compared with the results obtained using a small-sized corpus. The DNN model's performance did not improve even after it was trained with a larger dataset derived from the medium-sized corpus. Possible factors include the limitations of the DNN model itself and the quality of the datasets.
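The dataset preparation step named in the abstract (filtering open-ended questions, duplicate claims, and vague claims) could be sketched roughly as below. This is only an illustration under stated assumptions: the abstract does not give the authors' actual filtering criteria, so the `Record` type, the keyword lists `YES_NO_STARTERS` and `VAGUE_MARKERS`, and the toy records are all hypothetical and are not taken from the EBM Summarizer corpus.

```python
from dataclasses import dataclass

# Hypothetical heuristics: the abstract does not state how open-ended
# questions or vague claims were actually identified.
YES_NO_STARTERS = ("is", "are", "does", "do", "can", "should", "will", "has", "have")
VAGUE_MARKERS = ("unclear", "insufficient evidence", "not known")


@dataclass(frozen=True)
class Record:
    question: str  # clinical question
    claim: str     # claim text (for example a sniptext or a longtext)


def is_open_ended(question: str) -> bool:
    """Treat a question as open-ended unless it starts with a yes/no word."""
    words = question.strip().lower().split()
    return not words or words[0] not in YES_NO_STARTERS


def is_vague(claim: str) -> bool:
    """Flag a claim as vague if it contains one of the marker phrases."""
    text = claim.lower()
    return any(marker in text for marker in VAGUE_MARKERS)


def filter_records(records):
    """Drop open-ended questions, vague claims, and duplicate claims."""
    seen = set()
    kept = []
    for rec in records:
        # Normalize case and whitespace so near-identical claims collide.
        key = " ".join(rec.claim.lower().split())
        if is_open_ended(rec.question) or is_vague(rec.claim) or key in seen:
            continue
        seen.add(key)
        kept.append(rec)
    return kept


# Made-up placeholder records, for illustration only.
records = [
    Record("Does treatment A reduce symptom B?", "Treatment A reduces symptom B."),
    Record("Does treatment A reduce symptom B?", "Treatment A  reduces symptom B."),
    Record("What is the best option for condition C?", "Option D is effective."),
    Record("Is treatment D effective for condition E?", "The evidence is unclear."),
]
kept = filter_records(records)
print(len(kept))
```

Running the same filter once over sniptext claims and once over longtext claims would yield two datasets of the kind the abstract describes, although the real pipeline may differ in every detail.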


Source journal

Malaysian Journal of Computer Science (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, THEORY & METHODS)

CiteScore

2.20

Self-citation rate

33.30%

Articles published

Review time

7.5 months

About the journal: The Malaysian Journal of Computer Science (ISSN 0127-9084) has been published since 1985 by the Faculty of Computer Science and Information Technology, University of Malaya, and appears four times a year, in January, April, July and October. Over the years, the journal has gained popularity and the number of paper submissions has increased steadily. Rigorous reviews from the referees have helped maintain the journal's high standard. Its objectives are to promote the exchange of information and knowledge in research work and in new inventions and developments in Computer Science, to promote the use of Information Technology in building an information-rich society, and to help academic staff from local and foreign universities, the business and industrial sectors, government departments and academic institutions publish research results and studies in Computer Science and Information Technology through a scholarly publication. The journal is indexed and abstracted by Clarivate Analytics' Web of Science and Elsevier's Scopus.