Zero-Shot Cross-Lingual Transfer is a Hard Baseline to Beat in German Fine-Grained Entity Typing

Sabine Weber, Mark Steedman
{"title":"零距离跨语言迁移是德语细粒度实体分型中难以突破的基线","authors":"Sabine Weber, Mark Steedman","doi":"10.18653/v1/2021.insights-1.7","DOIUrl":null,"url":null,"abstract":"The training of NLP models often requires large amounts of labelled training data, which makes it difficult to expand existing models to new languages. While zero-shot cross-lingual transfer relies on multilingual word embeddings to apply a model trained on one language to another, Yarowski and Ngai (2001) propose the method of annotation projection to generate training data without manual annotation. This method was successfully used for the tasks of named entity recognition and coarse-grained entity typing, but we show that it is outperformed by zero-shot cross-lingual transfer when applied to the similar task of fine-grained entity typing. In our study of fine-grained entity typing with the FIGER type ontology for German, we show that annotation projection amplifies the English model’s tendency to underpredict level 2 labels and is beaten by zero-shot cross-lingual transfer on three novel test sets.","PeriodicalId":166055,"journal":{"name":"Proceedings of the Second Workshop on Insights from Negative Results in NLP","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Zero-Shot Cross-Lingual Transfer is a Hard Baseline to Beat in German Fine-Grained Entity Typing\",\"authors\":\"Sabine Weber, Mark Steedman\",\"doi\":\"10.18653/v1/2021.insights-1.7\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The training of NLP models often requires large amounts of labelled training data, which makes it difficult to expand existing models to new languages. While zero-shot cross-lingual transfer relies on multilingual word embeddings to apply a model trained on one language to another, Yarowski and Ngai (2001) propose the method of annotation projection to generate training data without manual annotation. This method was successfully used for the tasks of named entity recognition and coarse-grained entity typing, but we show that it is outperformed by zero-shot cross-lingual transfer when applied to the similar task of fine-grained entity typing. 
In our study of fine-grained entity typing with the FIGER type ontology for German, we show that annotation projection amplifies the English model’s tendency to underpredict level 2 labels and is beaten by zero-shot cross-lingual transfer on three novel test sets.\",\"PeriodicalId\":166055,\"journal\":{\"name\":\"Proceedings of the Second Workshop on Insights from Negative Results in NLP\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1900-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the Second Workshop on Insights from Negative Results in NLP\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.18653/v1/2021.insights-1.7\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Second Workshop on Insights from Negative Results in NLP","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18653/v1/2021.insights-1.7","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1

Abstract

The training of NLP models often requires large amounts of labelled training data, which makes it difficult to expand existing models to new languages. While zero-shot cross-lingual transfer relies on multilingual word embeddings to apply a model trained on one language to another, Yarowsky and Ngai (2001) propose the method of annotation projection to generate training data without manual annotation. This method was successfully used for the tasks of named entity recognition and coarse-grained entity typing, but we show that it is outperformed by zero-shot cross-lingual transfer when applied to the similar task of fine-grained entity typing. In our study of fine-grained entity typing with the FIGER type ontology for German, we show that annotation projection amplifies the English model's tendency to underpredict level 2 labels and is beaten by zero-shot cross-lingual transfer on three novel test sets.
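To make the contrast between the two transfer strategies concrete, here is a minimal sketch. This is not the authors' code: the function names, the classifier and embedding interfaces, and the toy sentence with its word alignment are all illustrative assumptions. Zero-shot transfer reuses an English-trained typer directly through a shared multilingual embedding space, while annotation projection copies English labels across word alignments in a parallel corpus to produce silver German training data.

```python
# Illustrative sketch of the two strategies compared in the paper.
# All names below (zero_shot_predict, project_annotations, the classifier
# and embedding arguments) are placeholders, not the authors' implementation.

from typing import Dict, List, Tuple

# --- Zero-shot cross-lingual transfer --------------------------------------
# A typer is trained on English mentions represented in a multilingual
# embedding space; German mentions map into the same space, so the trained
# model applies to German text directly, with no German labels at all.

def zero_shot_predict(classifier, multilingual_embed, german_mention: str) -> str:
    """Apply an English-trained entity typer to a German mention."""
    vector = multilingual_embed(german_mention)  # shared embedding space
    return classifier.predict(vector)            # e.g. a FIGER label like "/person/politician"

# --- Annotation projection (Yarowsky and Ngai, 2001) -----------------------
# Labels assigned on the English side of a parallel corpus are copied onto
# the aligned German tokens, yielding silver German training data.

def project_annotations(
    english_labels: List[str],          # one label per English token, "O" = no entity
    alignment: List[Tuple[int, int]],   # (english_index, german_index) word alignments
    german_tokens: List[str],
) -> List[Tuple[str, str]]:
    """Project English token labels across word alignments to German tokens."""
    silver: Dict[int, str] = {}
    for en_idx, de_idx in alignment:
        if english_labels[en_idx] != "O":   # only project entity labels
            silver[de_idx] = english_labels[en_idx]
    return [(tok, silver.get(i, "O")) for i, tok in enumerate(german_tokens)]

# Toy parallel example: "Angela Merkel visited Paris" / "Angela Merkel besuchte Paris"
labels = ["/person/politician", "/person/politician", "O", "/location/city"]
align = [(0, 0), (1, 1), (2, 2), (3, 3)]
tokens = ["Angela", "Merkel", "besuchte", "Paris"]
print(project_annotations(labels, align, tokens))
# [('Angela', '/person/politician'), ('Merkel', '/person/politician'),
#  ('besuchte', 'O'), ('Paris', '/location/city')]
```

A German typer would then be trained on the projected silver data. The paper's negative result is that this extra pipeline step amplifies the English model's underprediction of fine-grained (level 2) labels, so the simpler zero-shot route remains the stronger baseline.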