Understanding BERT Rankers Under Distillation

Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval

Publication date: 2020-07-21

DOI: 10.1145/3409256.3409838

Luyu Gao, Zhuyun Dai, Jamie Callan

Citations: 33

Abstract

Deep language models, such as BERT pre-trained on large corpora, have given a huge performance boost to state-of-the-art information retrieval ranking systems. Knowledge embedded in such models allows them to pick up complex matching signals between passages and queries. However, the high computation cost during inference limits their deployment in real-world search scenarios. In this paper, we study if and how the knowledge for search within BERT can be transferred to a smaller ranker through distillation. Our experiments demonstrate that it is crucial to use a proper distillation procedure, which produces up to nine times speedup while preserving the state-of-the-art performance.
