D. Purwitasari, A. Abdillah, Safitri Juanita, M. Purnomo
2021 13th International Conference on Information & Communication Technology and System (ICTS), pp. 348-353. Published 2021-10-20. DOI: 10.1109/ICTS52701.2021.9608496 (https://doi.org/10.1109/ICTS52701.2021.9608496)
Transfer Learning Approaches for Indonesian Biomedical Entity Recognition
Biomedical Named Entity Recognition (BioNER) relies on high-quality annotated biomedical datasets and supports applications such as medical question answering, clinical document classification, and decision support systems. However, the high-quality biomedical documents (e.g., PubMed, MPlus) that serve as the main source of BioNER datasets are only available in English and are lacking in Indonesian. Annotating such documents is also burdensome because it requires extensive work by experts. Transformer-based models such as BERT, together with pretrained multilingual language models, open an opportunity for cross-lingual transfer learning from well-developed English BioNER to Indonesian. This paper investigates XLM-RoBERTa and M-BERT as pretrained multilingual models for BioNER on Indonesian biomedical corpora. The models are fine-tuned on English documents and then evaluated on Indonesian biomedical test data. XLM-RoBERTa outperforms M-BERT on all evaluation metrics. The investigation also compares multilingual and monolingual language models on the BioNER task and finds no significant difference in results between the two.
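The abstract does not name the evaluation metrics. As a minimal sketch, assuming the usual entity-level precision, recall, and F1 over BIO-tagged tokens, the comparison of a model's predictions against Indonesian gold annotations could look like the following. The tag set (`Disease`, `Chemical`), the example sentence, and the function names `extract_spans` and `entity_prf` are hypothetical illustrations, not taken from the paper.

```python
from typing import List, Set, Tuple

# (entity type, start token index, end token index exclusive)
Span = Tuple[str, int, int]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO tag sequence.

    A span starts at a B- tag and extends over following I- tags of the
    same type. An I- tag with no matching open span is ignored (a strict
    reading of BIO; other conventions exist).
    """
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        if tag.startswith("I-") and etype == tag[2:]:
            continue  # still inside the current span
        if etype is not None:
            spans.add((etype, start, i))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def entity_prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged entity-level precision, recall, and F1 (exact span match)."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gold_spans, pred_spans = extract_spans(g), extract_spans(p)
        tp += len(gold_spans & pred_spans)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical Indonesian test sentence (not from the paper's data):
# "Pasien menderita demam berdarah dan diberi parasetamol"
gold_tags = ["O", "O", "B-Disease", "I-Disease", "O", "O", "B-Chemical"]
# Hypothetical system output that truncates the disease mention to one token.
pred_tags = ["O", "O", "B-Disease", "O", "O", "O", "B-Chemical"]

print(entity_prf([gold_tags], [pred_tags]))  # (0.5, 0.5, 0.5)
```

Exact span matching is deliberately strict here: a prediction that covers only part of a multi-token entity counts as both a false positive and a false negative, which is why the truncated disease mention lowers precision and recall together.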