Knowledge-aware audio-grounded generative slot filling for limited annotated data

IF 3.1 3区 计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Guangzhi Sun , Chao Zhang , Ivan Vulić , Paweł Budzianowski , Philip C. Woodland
{"title":"Knowledge-aware audio-grounded generative slot filling for limited annotated data","authors":"Guangzhi Sun ,&nbsp;Chao Zhang ,&nbsp;Ivan Vulić ,&nbsp;Paweł Budzianowski ,&nbsp;Philip C. Woodland","doi":"10.1016/j.csl.2024.101707","DOIUrl":null,"url":null,"abstract":"<div><p>Manually annotating fine-grained slot-value labels for task-oriented dialogue (ToD) systems is an expensive and time-consuming endeavour. This motivates research into slot-filling methods that operate with limited amounts of labelled data. Moreover, the majority of current work on ToD is based solely on text as the input modality, neglecting the additional challenges of imperfect automatic speech recognition (ASR) when working with spoken language. In this work, we propose a Knowledge-Aware Audio-Grounded generative slot filling framework, termed KA2G, that focuses on few-shot and zero-shot slot filling for ToD with speech input. KA2G achieves robust and data-efficient slot filling for speech-based ToD by (1) framing it as a text generation task, (2) grounding text generation additionally in the audio modality, and (3) conditioning on available external knowledge (<em>e.g.</em> a predefined list of possible slot values). We show that combining both modalities within the KA2G framework improves the robustness against ASR errors. Further, the knowledge-aware slot-value generator in KA2G, implemented via a pointer generator mechanism, particularly benefits few-shot and zero-shot learning. Experiments, conducted on the standard speech-based single-turn SLURP dataset and a multi-turn dataset extracted from a commercial ToD system, display strong and consistent gains over prior work, especially in few-shot and zero-shot setups.</p></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":null,"pages":null},"PeriodicalIF":3.1000,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0885230824000901/pdfft?md5=f629f96f3e24fa1b58c6bf9d7f53386f&pid=1-s2.0-S0885230824000901-main.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Speech and Language","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0885230824000901","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

Manually annotating fine-grained slot-value labels for task-oriented dialogue (ToD) systems is an expensive and time-consuming endeavour. This motivates research into slot-filling methods that operate with limited amounts of labelled data. Moreover, the majority of current work on ToD is based solely on text as the input modality, neglecting the additional challenges of imperfect automatic speech recognition (ASR) when working with spoken language. In this work, we propose a Knowledge-Aware Audio-Grounded generative slot filling framework, termed KA2G, that focuses on few-shot and zero-shot slot filling for ToD with speech input. KA2G achieves robust and data-efficient slot filling for speech-based ToD by (1) framing it as a text generation task, (2) grounding text generation additionally in the audio modality, and (3) conditioning on available external knowledge (e.g. a predefined list of possible slot values). We show that combining both modalities within the KA2G framework improves the robustness against ASR errors. Further, the knowledge-aware slot-value generator in KA2G, implemented via a pointer generator mechanism, particularly benefits few-shot and zero-shot learning. Experiments, conducted on the standard speech-based single-turn SLURP dataset and a multi-turn dataset extracted from a commercial ToD system, display strong and consistent gains over prior work, especially in few-shot and zero-shot setups.

针对有限注释数据的知识感知音频生成槽填充
为面向任务的对话(ToD)系统手动标注细粒度的槽值标签是一项既费钱又费时的工作。这就促使人们研究利用有限的标注数据进行时隙填充的方法。此外,目前有关 ToD 的大部分研究工作都是以文本作为输入模式,而忽略了在处理口语时不完善的自动语音识别(ASR)所带来的额外挑战。在这项工作中,我们提出了一个知识感知音频-地基生成式插槽填充框架(称为 KA2G),该框架专注于使用语音输入进行 ToD 的少镜头和零镜头插槽填充。KA2G 通过(1)将语音 ToD 定义为文本生成任务,(2)将文本生成额外建立在音频模态上,以及(3)以可用的外部知识(预定义的可能槽值列表)为条件,实现了基于语音的 ToD 的稳健且数据高效的槽填充。我们的研究表明,在 KA2G 框架内将两种模态结合在一起可提高抗 ASR 错误的鲁棒性。此外,KA2G 中的知识感知时隙值生成器是通过指针生成器机制实现的,尤其有利于少次学习和零次学习。在基于标准语音的单匝 SLURP 数据集和从商业 ToD 系统中提取的多匝数据集上进行的实验显示,KA2G 比之前的研究成果具有更强、更稳定的优势,尤其是在少匝和零匝设置中。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Computer Speech and Language
Computer Speech and Language 工程技术-计算机:人工智能
CiteScore
11.30
自引率
4.70%
发文量
80
审稿时长
22.9 weeks
期刊介绍: Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language. The speech and language sciences have a long history, but it is only relatively recently that large-scale implementation of and experimentation with complex models of speech and language processing has become feasible. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信