{"title":"The Roles of Language Models and Hierarchical Models in Neural Sequence-to-Sequence Prediction","authors":"Felix Stahlberg","doi":"10.17863/CAM.49422","DOIUrl":null,"url":null,"abstract":"With the advent of deep learning, research in many areas of machine learning is converging towards the same set of methods and models. For example, long short-term memory networks (Hochreiter and Schmidhuber, 1997) are not only popular for various tasks in natural language processing (NLP) such as speech recognition, machine translation, handwriting recognition, syntactic parsing, etc., but they are also applicable to seemingly unrelated fields such as bioinformatics (Min et al., 2016). Recent advances in contextual word embeddings like BERT (Devlin et al., 2019) boast with achieving state-of-the-art results on 11 NLP tasks with the same model. Before deep learning, a speech recognizer and a syntactic parser used to have little in common as systems were much more tailored towards the task at hand. At the core of this development is the tendency to view each task as yet another data mapping problem, neglecting the particular characteristics and (soft) requirements that tasks often have in practice. This often goes along with a sharp break of deep learning methods with previous research in the specific area. This thesis can be understood as an antithesis to the prevailing paradigm. We show how traditional symbolic statistical machine translation (Koehn, 2009) models can still improve neural machine translation (Kalchbrenner and Blunsom, 2013; Sutskever et al., 2014; Bahdanau et al., 2015, NMT) while reducing the risk of common pathologies of NMT such as hallucinations and neologisms. Other external symbolic models such as spell checkers and morphology databases help neural models to correct grammatical errors in text.","PeriodicalId":137211,"journal":{"name":"European Association for Machine Translation Conferences/Workshops","volume":"93 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"European Association for Machine Translation Conferences/Workshops","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.17863/CAM.49422","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 4
Abstract
With the advent of deep learning, research in many areas of machine learning is converging towards the same set of methods and models. For example, long short-term memory networks (Hochreiter and Schmidhuber, 1997) are popular not only for various tasks in natural language processing (NLP) such as speech recognition, machine translation, handwriting recognition, and syntactic parsing, but are also applicable to seemingly unrelated fields such as bioinformatics (Min et al., 2016). Recent contextual word embedding models such as BERT (Devlin et al., 2019) report state-of-the-art results on eleven NLP tasks with a single model. Before deep learning, a speech recognizer and a syntactic parser had little in common, as systems were tailored much more closely to the task at hand. At the core of this development is the tendency to view each task as yet another data mapping problem, neglecting the particular characteristics and (soft) requirements that tasks often have in practice. This often goes along with a sharp break between deep learning methods and previous research in the specific area. This thesis can be understood as an antithesis to that prevailing paradigm. We show how traditional symbolic statistical machine translation (SMT; Koehn, 2009) models can still improve neural machine translation (NMT; Kalchbrenner and Blunsom, 2013; Sutskever et al., 2014; Bahdanau et al., 2015) while reducing the risk of common NMT pathologies such as hallucinations and neologisms. Other external symbolic models, such as spell checkers and morphology databases, help neural models correct grammatical errors in text.
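The abstract does not specify how the symbolic and neural scores are combined; a common scheme in this line of work is log-linear interpolation of per-token scores during decoding. The sketch below illustrates that idea under that assumption only: the scoring callables `nmt_logprob` and `smt_logprob`, the weight `lam`, and the greedy search loop are hypothetical placeholders, not the thesis' actual method.

```python
# Minimal sketch: log-linear combination of an NMT model's score with a
# symbolic model's score during greedy decoding. All names here are
# hypothetical placeholders, not an implementation from the thesis.
import math
from typing import Callable, List

# Each scorer maps (source tokens, target prefix, candidate token)
# to a log-probability-like score.
Scorer = Callable[[List[str], List[str], str], float]

def combined_decode(
    src: List[str],
    nmt_logprob: Scorer,   # log score from the neural model (assumed)
    smt_logprob: Scorer,   # log score from the symbolic model (assumed)
    vocab: List[str],
    lam: float = 0.3,      # interpolation weight (assumed hyperparameter)
    max_len: int = 50,
) -> List[str]:
    """Greedy search over a log-linear combination of the two models."""
    hyp: List[str] = []
    for _ in range(max_len):
        # Score every candidate token under both models and combine.
        best_tok, best_score = None, -math.inf
        for tok in vocab:
            score = nmt_logprob(src, hyp, tok) + lam * smt_logprob(src, hyp, tok)
            if score > best_score:
                best_tok, best_score = tok, score
        hyp.append(best_tok)
        if best_tok == "</s>":  # end-of-sentence marker
            break
    return hyp
```

Raising `lam` penalizes tokens the symbolic model finds implausible, which is one way such a combination can discourage hallucinated output or invented words.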