Yasuhito Ohsugi, Itsumi Saito, Kyosuke Nishida, Sen Yoshida
{"title":"Japanese ASR-Robust Pre-trained Language Model with Pseudo-Error Sentences Generated by Grapheme-Phoneme Conversion","authors":"Yasuhito Ohsugi, Itsumi Saito, Kyosuke Nishida, Sen Yoshida","doi":"10.21437/interspeech.2022-327","DOIUrl":null,"url":null,"abstract":"Spoken language understanding systems typically consist of a pipeline of automatic speech recognition (ASR) and natural language processing (NLP) modules. Although pre-trained language models (PLMs) have been successful in NLP by training on large corpora of written texts; spoken language with serious ASR errors that change its meaning is difficult to understand. We propose a method for pre-training Japanese LMs robust against ASR errors without using ASR. With the proposed method using written texts, sentences containing pseudo-ASR errors are generated using a pseudo-error dictionary constructed using grapheme-to-phoneme and phoneme-to-grapheme models based on neural networks. Experiments on spoken dialogue summarization showed that the ASR-robust LM pre-trained with the proposed method outperformed the LM pre-trained with standard masked language modeling by 3.17 points on ROUGE-L when fine-tuning with dialogues including ASR errors.","PeriodicalId":73500,"journal":{"name":"Interspeech","volume":"1 1","pages":"2688-2692"},"PeriodicalIF":0.0000,"publicationDate":"2022-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Interspeech","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.21437/interspeech.2022-327","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Spoken language understanding systems typically consist of a pipeline of automatic speech recognition (ASR) and natural language processing (NLP) modules. Although pre-trained language models (PLMs) have been successful in NLP by training on large corpora of written texts; spoken language with serious ASR errors that change its meaning is difficult to understand. We propose a method for pre-training Japanese LMs robust against ASR errors without using ASR. With the proposed method using written texts, sentences containing pseudo-ASR errors are generated using a pseudo-error dictionary constructed using grapheme-to-phoneme and phoneme-to-grapheme models based on neural networks. Experiments on spoken dialogue summarization showed that the ASR-robust LM pre-trained with the proposed method outperformed the LM pre-trained with standard masked language modeling by 3.17 points on ROUGE-L when fine-tuning with dialogues including ASR errors.