Towards Contradiction Detection in German: a Translation-Driven Approach

2019 IEEE Symposium Series on Computational Intelligence (SSCI). Pub Date: 2019-12-01. DOI: 10.1109/SSCI44817.2019.9003090

R. Sifa, Maren Pielka, Rajkumar Ramamurthy, Anna Ladi, L. Hillebrand, C. Bauckhage


Citations: 12

Abstract

Despite recent advances in machine-learning-based Natural Language Processing (NLP), language dependency remains a limiting factor for the majority of NLP applications. Models are typically trained for English because very large labeled and unlabeled datasets are available, which also makes it possible to fine-tune models for that language. Contradiction Detection is one such problem: it has found many practical applications in NLP, but up to this point it has only been studied in the context of the English language. This paper examines a set of baseline methods for the Contradiction Detection task on German text. For this purpose, the well-known Stanford Natural Language Inference (SNLI) data set (110,000 sentence pairs) is machine-translated from English to German. We train and evaluate four classifiers on both the original and the translated data, using state-of-the-art textual data representations. Our main contributions are the first large-scale assessment of this problem in German and a validation of machine translation as a data generation method. We also present a novel approach to learning sentence embeddings that exploits the hidden states of an encoder-decoder Sequence-To-Sequence RNN trained for autoencoding or translation.
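The abstract does not give the architecture, training setup, or classifier inputs, so the following is only a minimal sketch of the general idea of using an RNN encoder's final hidden state as a sentence embedding. Everything in it is an assumption made for illustration: the `TinyRNNEncoder` class, its dimensions, the toy vocabulary, and the `pair_features` combination (concatenation, absolute difference, and element-wise product, a common choice in NLI work, not necessarily the paper's). The weights are random and untrained; in the setting the abstract describes, the encoder would first be trained inside an encoder-decoder model for autoencoding or translation.

```python
import math
import random


def init_matrix(rows, cols, rng, scale=0.1):
    """Return a rows x cols matrix of small random weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class TinyRNNEncoder:
    """Hypothetical Elman-style RNN encoder with untrained weights (illustration only)."""

    def __init__(self, vocab, embed_dim=8, hidden_dim=16, seed=0):
        rng = random.Random(seed)
        self.vocab = {tok: i for i, tok in enumerate(vocab)}
        self.hidden_dim = hidden_dim
        # One embedding row per known token, plus a final row for unknown tokens.
        self.E = init_matrix(len(vocab) + 1, embed_dim, rng)
        self.W_xh = init_matrix(hidden_dim, embed_dim, rng)
        self.W_hh = init_matrix(hidden_dim, hidden_dim, rng)

    def encode(self, sentence):
        """Run the RNN over the tokens and return the final hidden state."""
        h = [0.0] * self.hidden_dim
        for tok in sentence.lower().split():
            x = self.E[self.vocab.get(tok, len(self.vocab))]
            h = [
                math.tanh(
                    sum(w * xi for w, xi in zip(self.W_xh[j], x))
                    + sum(w * hi for w, hi in zip(self.W_hh[j], h))
                )
                for j in range(self.hidden_dim)
            ]
        return h  # the final hidden state serves as the sentence embedding


def pair_features(u, v):
    """Combine premise and hypothesis embeddings into one classifier input vector."""
    diff = [abs(a - b) for a, b in zip(u, v)]
    prod = [a * b for a, b in zip(u, v)]
    return u + v + diff + prod  # list concatenation: [u; v; |u - v|; u * v]


vocab = ["ein", "mann", "schläft", "läuft", "im", "park"]
enc = TinyRNNEncoder(vocab)
premise = enc.encode("Ein Mann schläft im Park")
hypothesis = enc.encode("Ein Mann läuft im Park")
features = pair_features(premise, hypothesis)
print(len(features))  # 64 = 4 * hidden_dim
```

The resulting feature vector is what a downstream contradiction classifier would consume; which classifiers and representations the authors actually used is not specified in the abstract.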

