Behavioral Analysis of Transformer Models on Complex Grammatical Structures

Kanyanut Kriengket, Kanchana Saengthongpattana, Peerachet Porkaew, Vorapon Luantangsrisuk, P. Boonkwan, T. Supnithi

2020 15th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP)
DOI: 10.1109/iSAI-NLP51646.2020.9376782
Published: 2020-11-18
Citations: 0
Abstract
State-of-the-art neural MT models, e.g. the Transformer, yield promising translation accuracy. However, these models are easily perturbed by noise, causing over- and under-translation issues. This paper presents a behavioral analysis of Transformer models in translating complex grammatical structures, i.e. multi-word expressions and long-distance dependencies. Results consistently show that the more complex the structure, the lower the translation accuracy the models achieve. We hypothesize that as phrase structures become more complex, the focus patterns learned by the attention mechanism may become erratically sporadic due to data sparseness. We suggest using a locality penalty and increasing the number of attention heads to mitigate the issue, but their trade-offs should also be kept in mind.
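One way to picture the suggested locality penalty is as a distance-based term subtracted from the attention logits before the softmax, so that attention weight decays for positions far from the query. The sketch below is illustrative only: the paper does not specify the exact form of its penalty, and the `alpha` coefficient, function names, and linear-distance form are assumptions for demonstration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def local_attention(q, k, v, alpha=0.1):
    """Scaled dot-product attention with a hypothetical locality penalty.

    A term `alpha * |i - j|` is subtracted from each attention logit,
    discouraging attention to distant positions; alpha=0 recovers
    standard attention. (Sketch only; not the paper's exact method.)
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                 # (n_q, n_k) logits
    n_q, n_k = scores.shape
    dist = np.abs(np.arange(n_q)[:, None] - np.arange(n_k)[None, :])
    scores = scores - alpha * dist                # penalize distance
    weights = softmax(scores, axis=-1)
    return weights @ v, weights
```

With a large `alpha`, the weights concentrate near the diagonal, which is one way to make the focus pattern less sporadic on long inputs; the trade-off the abstract alludes to is that genuinely long-distance dependencies are then harder for any single head to capture.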