G-Augment: Searching for the Meta-Structure of Data Augmentation Policies for ASR
Gary Wang, Ekin D. Cubuk, A. Rosenberg, Shuyang Cheng, Ron J. Weiss, B. Ramabhadran, P. Moreno, Quoc V. Le, Daniel S. Park
2022 IEEE Spoken Language Technology Workshop (SLT). DOI: 10.1109/SLT54892.2023.10022748. Published 2022-10-19.
Data augmentation is a ubiquitous technique for making automatic speech recognition (ASR) training more robust. However, even as much of the ASR training process has become automated and more “end-to-end,” the data augmentation policy (which augmentation functions to use, and how to apply them) remains hand-crafted. We present G(raph)-Augment, a technique that defines the augmentation space as directed acyclic graphs (DAGs) and searches over this space to optimize the augmentation policy itself. We show that, given the same computational budget, policies produced by G-Augment outperform SpecAugment policies obtained by random search on fine-tuning tasks on CHiME-6 and AMI. G-Augment also establishes a new state-of-the-art ASR result on the CHiME-6 evaluation set (30.7% WER). We further demonstrate that G-Augment policies transfer better than random-searched SpecAugment policies, both from warm-start to cold-start training and across model sizes.
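The abstract describes the DAG-structured policy space only at a high level. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that a policy is a DAG whose nodes are augmentation ops (here a hypothetical vocabulary of just identity and SpecAugment-style time and frequency masks) and that applying a policy means walking from node 0 to a sink, choosing each next node by edge weight. The names `Node`, `apply_policy`, `random_dag`, and `random_search` are illustrative. `random_search` is plain random search used as a stand-in, because the abstract does not say which search algorithm G-Augment uses.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# A spectrogram as a list of time frames, each a list of frequency bins.
Spec = List[List[float]]


def time_mask(spec: Spec, width: int, rng: random.Random) -> Spec:
    """SpecAugment-style time mask: zero up to `width` consecutive frames."""
    out = [row[:] for row in spec]
    t = min(rng.randint(0, width), len(out))
    start = rng.randint(0, len(out) - t)
    for i in range(start, start + t):
        out[i] = [0.0] * len(out[i])
    return out


def freq_mask(spec: Spec, width: int, rng: random.Random) -> Spec:
    """SpecAugment-style frequency mask: zero up to `width` consecutive bins."""
    out = [row[:] for row in spec]
    n_freq = len(out[0]) if out else 0
    f = min(rng.randint(0, width), n_freq)
    start = rng.randint(0, n_freq - f)
    for row in out:
        for j in range(start, start + f):
            row[j] = 0.0
    return out


def identity(spec: Spec, width: int, rng: random.Random) -> Spec:
    """No-op node; returns a copy so callers never share mutable rows."""
    return [row[:] for row in spec]


# Hypothetical op vocabulary; a real search space would be much richer.
OPS = {"identity": identity, "time_mask": time_mask, "freq_mask": freq_mask}


@dataclass
class Node:
    op: str     # key into OPS
    width: int  # op strength parameter
    # Outgoing edges as (child index, sampling weight); empty means sink.
    edges: List[Tuple[int, float]] = field(default_factory=list)


def apply_policy(dag: List[Node], spec: Spec, rng: random.Random) -> Spec:
    """Walk from node 0 to a sink, applying each visited node's op."""
    i = 0
    while True:
        node = dag[i]
        spec = OPS[node.op](spec, node.width, rng)
        if not node.edges:
            return spec
        children, weights = zip(*node.edges)
        i = rng.choices(children, weights=weights)[0]


def random_dag(n_nodes: int, rng: random.Random) -> List[Node]:
    """Sample a policy DAG; edges only point to higher indices, so no cycles."""
    dag = []
    for i in range(n_nodes):
        later = list(range(i + 1, n_nodes))
        k = rng.randint(1, min(2, len(later))) if later else 0
        edges = [(c, rng.uniform(0.1, 1.0)) for c in rng.sample(later, k)]
        dag.append(Node(rng.choice(list(OPS)), rng.randint(0, 10), edges))
    return dag


def random_search(score_fn: Callable[[List[Node]], float], n_trials: int,
                  n_nodes: int, seed: int = 0) -> Tuple[Optional[List[Node]], float]:
    """Stand-in search: keep the highest-scoring of n_trials random DAGs."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(n_trials):
        dag = random_dag(n_nodes, rng)
        score = score_fn(dag)
        if score > best_score:
            best, best_score = dag, score
    return best, best_score
```

Because edges only point from lower to higher node indices, every walk terminates at a sink and the graph is acyclic by construction. In a real setting, `score_fn` would fine-tune an ASR model with the candidate policy and return, for example, negative dev-set WER. A toy call such as `random_search(lambda g: -sum(n.width for n in g), 10, 4)` only illustrates the interface.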