Domain-Specific Contextualized Embedding: A Systematic Literature Review

Ide Yunianto, A. E. Permanasari, Widyawan Widyawan
DOI: 10.1109/ICITEE49829.2020.9271752
Published in: 2020 12th International Conference on Information Technology and Electrical Engineering (ICITEE), 2020-10-06
Citations: 2

Abstract

Word embedding has successfully resolved various Natural Language Processing (NLP) problems. Unfortunately, the method is weak at handling polysemy and homonymy. These issues led to the emergence of a new approach called contextualized embedding. Many researchers have examined such embeddings to solve problems in particular areas, but the resulting studies are scattered and heterogeneous. To provide a more comprehensive overview of contextualized embedding research in specific domains, a Systematic Literature Review (SLR) was conducted. The SLR results show that research on domain-specific contextualized embedding concentrates on NLP problems in the Healthcare domain (more than 65% of the reviewed papers), followed by the Academic & Research field and other areas. The popularity of the Healthcare domain is associated with the availability of abundant datasets, mostly in English. BERT is the most widely used contextualized embedding model for domain-specific tasks, followed by ELMo, and finally GPT-1 and XLNet. Almost all reviewed papers reported performance improvements from using domain-specific contextualized embedding in their proposed models. Contextualized embedding can resolve polysemy and reduce overfitting, and many downstream tasks have demonstrated that the embeddings are easy to implement. Its shortcomings are high computational resource requirements, long execution times, and computational complexity. Domain-specific contextualized embedding has been applied to many problems, mostly classification tasks (e.g., Question Answering) and tagging tasks (e.g., Named Entity Recognition). The two evaluation methods for measuring its performance are intrinsic evaluation and extrinsic evaluation.
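As a minimal illustration of the intrinsic evaluation mentioned above, an embedding can be scored by correlating its cosine similarities for word pairs with human similarity judgments. The sketch below uses toy vectors and hypothetical human ratings (none of these values come from the reviewed papers); real intrinsic evaluation would use vectors from a trained model and a benchmark dataset such as a word-similarity test set:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def spearman(xs, ys):
    """Spearman rank correlation between two score lists (assumes no ties)."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=vals.__getitem__)
        r = [0] * len(vals)
        for rank, idx in enumerate(order):
            r[idx] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical embedding vectors for a few words (illustrative only).
emb = {
    "doctor":    [0.90, 0.10, 0.30],
    "physician": [0.85, 0.15, 0.35],
    "hospital":  [0.60, 0.40, 0.50],
    "banana":    [0.10, 0.90, 0.20],
}
pairs = [("doctor", "physician"), ("doctor", "hospital"), ("doctor", "banana")]
human = [9.5, 6.0, 0.5]  # assumed human similarity ratings, 0-10 scale

# Intrinsic score: how well model similarities track the human ranking.
model_scores = [cosine(emb[a], emb[b]) for a, b in pairs]
rho = spearman(model_scores, human)
print(round(rho, 2))
```

Extrinsic evaluation, by contrast, measures the embedding indirectly through a downstream task (e.g., F1 on Named Entity Recognition), so it requires a full task pipeline rather than a self-contained score like the one above.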