{"title":"基于bert的口语文档检索增强排序模型","authors":"Hsiao-Yun Lin, Tien-Hong Lo, Berlin Chen","doi":"10.1109/ASRU46091.2019.9003890","DOIUrl":null,"url":null,"abstract":"The Bidirectional Encoder Representations from Transformers (BERT) model has recently achieved record-breaking success on many natural language processing (NLP) tasks such as question answering and language understanding. However, relatively little work has been done on ad-hoc information retrieval (IR), especially for spoken document retrieval (SDR). This paper adopts and extends BERT for SDR, while its contributions are at least three-fold. First, we augment BERT with extra language features such as unigram and inverse document frequency (IDF) statistics to make it more applicable to SDR. Second, we also explore the incorporation of confidence scores into document representations to see if they could help alleviate the negative effects resulting from imperfect automatic speech recognition (ASR). Third, we conduct a comprehensive set of experiments to compare our BERT-based ranking methods with other state-of-the-art ones and investigate the synergy effect of them as well.","PeriodicalId":150913,"journal":{"name":"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":"{\"title\":\"Enhanced Bert-Based Ranking Models for Spoken Document Retrieval\",\"authors\":\"Hsiao-Yun Lin, Tien-Hong Lo, Berlin Chen\",\"doi\":\"10.1109/ASRU46091.2019.9003890\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The Bidirectional Encoder Representations from Transformers (BERT) model has recently achieved record-breaking success on many natural language processing (NLP) tasks such as question answering and language understanding. However, relatively little work has been done on ad-hoc information retrieval (IR), especially for spoken document retrieval (SDR). This paper adopts and extends BERT for SDR, while its contributions are at least three-fold. First, we augment BERT with extra language features such as unigram and inverse document frequency (IDF) statistics to make it more applicable to SDR. Second, we also explore the incorporation of confidence scores into document representations to see if they could help alleviate the negative effects resulting from imperfect automatic speech recognition (ASR). 
Third, we conduct a comprehensive set of experiments to compare our BERT-based ranking methods with other state-of-the-art ones and investigate the synergy effect of them as well.\",\"PeriodicalId\":150913,\"journal\":{\"name\":\"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)\",\"volume\":\"35 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"8\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ASRU46091.2019.9003890\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASRU46091.2019.9003890","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Enhanced Bert-Based Ranking Models for Spoken Document Retrieval
The Bidirectional Encoder Representations from Transformers (BERT) model has recently achieved record-breaking success on many natural language processing (NLP) tasks such as question answering and language understanding. However, relatively little work has been done on ad-hoc information retrieval (IR), especially for spoken document retrieval (SDR). This paper adopts and extends BERT for SDR; its contributions are at least three-fold. First, we augment BERT with extra language features such as unigram and inverse document frequency (IDF) statistics to make it more applicable to SDR. Second, we explore the incorporation of confidence scores into document representations to see if they could help alleviate the negative effects resulting from imperfect automatic speech recognition (ASR). Third, we conduct a comprehensive set of experiments to compare our BERT-based ranking methods with other state-of-the-art ones and to investigate their synergy effects as well.
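
The abstract describes combining a BERT relevance model with unigram/IDF statistics and ASR confidence scores for ranking spoken documents. The sketch below is a hypothetical illustration of such a combination, not the authors' implementation: it assumes the Hugging Face transformers library with a bert-base-uncased cross-encoder (which would need fine-tuning on relevance labels), and the confidence-weighted lexical score and interpolation weight alpha are illustrative choices rather than details taken from the paper.

```python
# Hypothetical sketch, NOT the authors' method: a BERT cross-encoder reranker
# whose score is interpolated with a unigram/IDF overlap score in which each
# matched document term is scaled by its ASR confidence, so likely recognition
# errors contribute less to the lexical match.
import math
from collections import Counter

import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# Classifier head is randomly initialized here; in practice it would be
# fine-tuned on query-document relevance labels before use.
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
model.eval()


def idf(term, doc_freq, num_docs):
    """Smoothed inverse document frequency (illustrative formulation)."""
    df = doc_freq.get(term, 0)
    return math.log((num_docs - df + 0.5) / (df + 0.5) + 1.0)


def lexical_score(query_terms, doc_terms, confidences, doc_freq, num_docs):
    """Unigram/IDF overlap, with each matched document term down-weighted
    by its ASR confidence (default 1.0 for clean text documents)."""
    counts = Counter(doc_terms)
    score = 0.0
    for t in set(query_terms):
        if t in counts:
            conf = confidences.get(t, 1.0)
            score += idf(t, doc_freq, num_docs) * counts[t] * conf
    return score


def bert_score(query, doc_text):
    """Relevance probability from the (fine-tuned) BERT cross-encoder."""
    inputs = tokenizer(query, doc_text, truncation=True,
                       max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()


def rank(query, docs, doc_freq, num_docs, alpha=0.7):
    """Interpolate BERT and confidence-weighted lexical scores; alpha is a
    tuning knob for this sketch, not a value reported in the paper."""
    q_terms = query.lower().split()
    scored = []
    for doc in docs:
        lex = lexical_score(q_terms, doc["terms"], doc.get("confidences", {}),
                            doc_freq, num_docs)
        combined = alpha * bert_score(query, doc["text"]) + (1 - alpha) * lex
        scored.append((combined, doc["id"]))
    return sorted(scored, reverse=True)
```

In this sketch each spoken document is assumed to carry its ASR 1-best transcript (`"text"`, `"terms"`) plus per-term confidence scores; how the paper actually folds confidences into the document representation may differ, so the snippet should be read only as one plausible realization of the idea.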