{"title":"从计算特性的角度比较BERT和XLNet","authors":"Hailong Li, Jaewan Choi, Sunjung Lee, Jung Ho Ahn","doi":"10.1109/ICEIC49074.2020.9051081","DOIUrl":null,"url":null,"abstract":"Exploiting attention mechanism, Transformer provides superior performance compared to traditional CNN and RNN models on various NLP (Natural Language Processing) tasks. BERT and XLNet are two popular models utilizing Transformer. In this paper, we compare the computational characteristics of the inference of BERT and XLNet using MPRC (Microsoft Research Paraphrase Corpus), one of the popular language understanding benchmarks. Through evaluation, we observe that the both models exhibit similar computational characteristics except the target-position-aware representation and relative position encoding features of XLNet, leading to a better benchmark score at the cost of $\\mathit{1.2}\\times$ arithmetic operations and $\\mathit{1.5}\\times$ execution time on a modern CPU.","PeriodicalId":271345,"journal":{"name":"2020 International Conference on Electronics, Information, and Communication (ICEIC)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Comparing BERT and XLNet from the Perspective of Computational Characteristics\",\"authors\":\"Hailong Li, Jaewan Choi, Sunjung Lee, Jung Ho Ahn\",\"doi\":\"10.1109/ICEIC49074.2020.9051081\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Exploiting attention mechanism, Transformer provides superior performance compared to traditional CNN and RNN models on various NLP (Natural Language Processing) tasks. BERT and XLNet are two popular models utilizing Transformer. In this paper, we compare the computational characteristics of the inference of BERT and XLNet using MPRC (Microsoft Research Paraphrase Corpus), one of the popular language understanding benchmarks. 
Through evaluation, we observe that the both models exhibit similar computational characteristics except the target-position-aware representation and relative position encoding features of XLNet, leading to a better benchmark score at the cost of $\\\\mathit{1.2}\\\\times$ arithmetic operations and $\\\\mathit{1.5}\\\\times$ execution time on a modern CPU.\",\"PeriodicalId\":271345,\"journal\":{\"name\":\"2020 International Conference on Electronics, Information, and Communication (ICEIC)\",\"volume\":\"12 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 International Conference on Electronics, Information, and Communication (ICEIC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICEIC49074.2020.9051081\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 International Conference on Electronics, Information, and Communication (ICEIC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICEIC49074.2020.9051081","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Comparing BERT and XLNet from the Perspective of Computational Characteristics
Exploiting the attention mechanism, the Transformer provides superior performance to traditional CNN and RNN models on various NLP (Natural Language Processing) tasks. BERT and XLNet are two popular models built on the Transformer. In this paper, we compare the computational characteristics of BERT and XLNet inference using MRPC (Microsoft Research Paraphrase Corpus), one of the popular language-understanding benchmarks. Through evaluation, we observe that both models exhibit similar computational characteristics except for XLNet's target-position-aware representation and relative position encoding, which lead to a better benchmark score at the cost of $1.2\times$ the arithmetic operations and $1.5\times$ the execution time on a modern CPU.
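The abstract attributes XLNet's extra arithmetic cost to its relative position encoding in self-attention. As a rough illustration (not the paper's code), the sketch below contrasts BERT-style attention scores, where absolute position embeddings are folded into the token embeddings before attention, with a simplified Transformer-XL-style relative score; the tensor names and sizes, and the omission of the relative-shift indexing trick and two-stream attention, are all simplifying assumptions.

```python
# Illustrative sketch only: single head, random weights, simplified
# relative attention (real XLNet also uses a relative-shift trick and
# two-stream attention, omitted here).
import torch

seq, d = 8, 64                    # sequence length and head dim (assumed)
q = torch.randn(seq, d)           # queries; in BERT, absolute position
k = torch.randn(seq, d)           # info is already added to the inputs

# BERT: one content-content dot product per head.
scores_bert = (q @ k.T) / d**0.5

# XLNet: the score adds position-dependent terms using relative position
# embeddings r and learned biases u, v, i.e. extra matrix multiplications
# per layer -- one source of the ~1.2x arithmetic-operation overhead
# reported in the abstract.
r = torch.randn(seq, d)           # relative position embeddings (assumed)
u = torch.randn(d)                # content bias
v = torch.randn(d)                # position bias
scores_xlnet = ((q + u) @ k.T + (q + v) @ r.T) / d**0.5
```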
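Similarly, a minimal way to sanity-check the reported CPU execution-time gap is to time both pretrained base models on an MRPC-style sentence. The paper does not state its measurement setup, so the HuggingFace transformers library, the checkpoint names, and the timing loop below are assumptions for illustration.

```python
# Hedged timing sketch, assuming the HuggingFace `transformers` package;
# not the paper's measurement code.
import time
import torch
from transformers import AutoModel, AutoTokenizer

def mean_latency(name: str, text: str, runs: int = 10) -> float:
    """Average single-batch CPU inference time in seconds."""
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name).eval()
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        model(**inputs)                      # warm-up pass
        start = time.perf_counter()
        for _ in range(runs):
            model(**inputs)
    return (time.perf_counter() - start) / runs

sentence = "The company said its quarterly profit rose."  # MRPC-style input
for name in ("bert-base-cased", "xlnet-base-cased"):
    print(f"{name}: {mean_latency(name, sentence) * 1e3:.1f} ms")
```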