Domain-specific knowledge distillation yields smaller and better models for conversational commerce

Proceedings of The Fifth Workshop on e-Commerce and NLP (ECNLP 5) DOI: 10.18653/v1/2022.ecnlp-1.18

Kristen Howell, Jian Wang, Akshay Hazare, Joe Bradley, Chris Brew, Xi Chen, Matthew Dunn, Beth-Ann Hockey, Andrew Maurer, D. Widdows

Citations: 2

Abstract

We demonstrate that knowledge distillation can be used not only to reduce model size but also to adapt a contextual language model to a specific domain at the same time. We use Multilingual BERT (mBERT; Devlin et al., 2019) as a starting point and follow the knowledge distillation approach of Sanh et al. (2019) to train a smaller multilingual BERT model that is adapted to the domain at hand. For in-domain tasks, the domain-specific model achieves on average a 2.3% improvement in F1 score relative to a model distilled on domain-general data. Whereas much previous work with BERT has fine-tuned the encoder weights during task training, we show that the improvements from distillation on in-domain data persist even when the encoder weights are frozen during task training, allowing a single encoder to support classifiers for multiple tasks and languages.
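
As an illustration of the core idea, the sketch below computes a soft-target distillation loss: the KL divergence between temperature-softened teacher and student output distributions. This is a minimal, dependency-free sketch and not the authors' implementation. The function names, the temperature value, and the toy logits are all assumptions made for illustration, and published distillation recipes often combine this term with others, such as a masked language modeling loss.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The temperature of 2.0 is an assumed value, not one from the paper.
    """
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical logits over a 3-token vocabulary for one masked position.
teacher = [2.0, 0.5, -1.0]
close_student = [1.8, 0.6, -0.9]  # roughly agrees with the teacher
far_student = [-1.0, 0.5, 2.0]    # ranks the tokens in the opposite order

loss_close = distillation_loss(close_student, teacher)
loss_far = distillation_loss(far_student, teacher)
```

A student that matches the teacher's distribution gets a loss near zero, and the loss grows as the two distributions diverge. Under this reading, domain adaptation comes from the choice of text on which the teacher and student logits are computed (in-domain conversations rather than domain-general data), not from a change to the loss itself.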
