{"title":"Automatic Term Extraction with Joint Multilingual Learning","authors":"Ipek Nur Karaman, I. Çiçekli, Gonenc Ercan","doi":"10.1109/UBMK55850.2022.9919455","DOIUrl":null,"url":null,"abstract":"Automatic term extraction using deep learning achieves promising results if sufficient training data exists. Unfortunately, some languages may lack these resources in some domains causing poor performance due to under-fitting. In this study, we propose a joint multilingual deep learning model with sequence labeling to extract terms, trained on multilingual data and aligned word embeddings to tackle this problem. Our evaluation results demonstrate that the multilingual model provides an improvement for automatic term extraction task when it is compared with a monolingual model trained with limited training data. Although the improvement rate varies according to domain and the size of the data, our evaluation shows that the highest improvement in F1-score is 10.1 % in the domain of Computer Science, the least improvement is 7.6% in the domain of Electronic Engineering. Our multilingual model also achieves competitive results when it is compared with a monolingual model trained with sufficient training data.","PeriodicalId":417604,"journal":{"name":"2022 7th International Conference on Computer Science and Engineering (UBMK)","volume":"105 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 7th International Conference on Computer Science and Engineering (UBMK)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/UBMK55850.2022.9919455","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
Automatic term extraction using deep learning achieves promising results when sufficient training data exists. Unfortunately, some languages lack these resources in certain domains, causing poor performance due to under-fitting. In this study, we propose a joint multilingual deep learning model that extracts terms through sequence labeling, trained on multilingual data with aligned word embeddings, to tackle this problem. Our evaluation demonstrates that the multilingual model improves the automatic term extraction task when compared with a monolingual model trained on limited data. Although the improvement rate varies with the domain and the size of the data, the largest gain in F1-score is 10.1% in the Computer Science domain, while the smallest is 7.6% in the Electronic Engineering domain. Our multilingual model also achieves competitive results when compared with a monolingual model trained on sufficient training data.
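The paper does not publish its architecture details here, but the abstract describes term extraction as sequence labeling over aligned multilingual word embeddings. The following is a minimal sketch of that general idea, not the authors' implementation: a BiLSTM tagger that assigns BIO-style term tags to tokens, where the same encoder can be trained jointly on sentences from several languages because their embeddings live in one shared space. All names, dimensions, and the toy data are illustrative assumptions.

```python
# Hypothetical sketch: sequence labeling for term extraction over pre-aligned
# multilingual embeddings (not the authors' code or hyperparameters).
import torch
import torch.nn as nn

EMB_DIM, HIDDEN, NUM_TAGS = 300, 128, 3   # tags: O=0, B-TERM=1, I-TERM=2


class TermTagger(nn.Module):
    def __init__(self):
        super().__init__()
        # A BiLSTM encoder; because the input embeddings are aligned into a
        # shared cross-lingual space, one model can be trained jointly on
        # data from several languages.
        self.encoder = nn.LSTM(EMB_DIM, HIDDEN, batch_first=True,
                               bidirectional=True)
        self.classifier = nn.Linear(2 * HIDDEN, NUM_TAGS)

    def forward(self, embeddings):          # (batch, seq_len, EMB_DIM)
        hidden, _ = self.encoder(embeddings)
        return self.classifier(hidden)      # (batch, seq_len, NUM_TAGS)


# Toy joint batch: random vectors stand in for the aligned embeddings of two
# sentences (e.g., one English, one Turkish), each of length 6.
x = torch.randn(2, 6, EMB_DIM)
y = torch.randint(0, NUM_TAGS, (2, 6))

model = TermTagger()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

logits = model(x)                                    # (2, 6, NUM_TAGS)
loss = loss_fn(logits.view(-1, NUM_TAGS), y.view(-1))
loss.backward()
optimizer.step()
print("predicted tags:", logits.argmax(-1).tolist())
```

Tokens tagged B-TERM/I-TERM would then be merged into candidate term spans; the multilingual setting only changes where the training sentences come from, not the tagging objective itself.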