Hongru Wang, Wai-Chung Kwan, Min Li, Zimo Zhou, Kam-Fai Wong
Computer Speech and Language, Volume 87, Article 101637. Published 2024-03-07. DOI: 10.1016/j.csl.2024.101637
https://www.sciencedirect.com/science/article/pii/S0885230824000202
KddRES: A Multi-level Knowledge-driven Dialogue Dataset for Restaurant Towards Customized Dialogue System
To alleviate the shortage of dialogue datasets for Cantonese, a low-resource language, and to facilitate the development of customized task-oriented dialogue systems, we propose KddRES, the first Cantonese Knowledge-driven dialogue dataset for REStaurants. It contains 834 multi-turn dialogues, 8000 utterances, and 26 distinct slots. The slots are hierarchical: beneath the 26 coarse-grained slots are an additional 16 fine-grained slots. Annotations of dialogue states and dialogue actions on both the user and system sides are provided to suit multiple downstream tasks such as natural language understanding and dialogue state tracking. To detect hierarchical slots effectively, we propose HierBERT, a framework that models label semantics and the relationships between different slots. Experimental results demonstrate that KddRES is more challenging than existing datasets because of its hierarchical slots, and that our framework is particularly effective at detecting secondary slots, achieving a new state of the art. Given the rich annotation and hierarchical slot structure of KddRES, we hope it will promote research on customized dialogue systems in Cantonese and on other conversational AI tasks, such as dialogue state tracking and policy learning.
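As a rough illustration of what a two-level slot hierarchy can look like in code, the sketch below models a coarse-grained slot with an optional fine-grained slot beneath it. It is not the KddRES schema: the class `SlotAnnotation`, the helper `group_by_coarse`, and the slot names (`name`, `dish`, `taste`, `price`) and values are all hypothetical, chosen only to show the idea.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SlotAnnotation:
    """One annotated slot value in an utterance (hypothetical schema)."""
    coarse: str                  # top-level slot; names here are illustrative only
    value: str
    fine: Optional[str] = None   # optional second-level slot under `coarse`

    def label(self) -> str:
        # Flatten the hierarchy into a single "coarse.fine" label string.
        return f"{self.coarse}.{self.fine}" if self.fine else self.coarse


def group_by_coarse(annotations):
    """Nest flat annotations into {coarse: {fine_or_None: [values]}}."""
    state = {}
    for ann in annotations:
        state.setdefault(ann.coarse, {}).setdefault(ann.fine, []).append(ann.value)
    return state


# Invented example values, not taken from the dataset.
example = [
    SlotAnnotation("name", "Example Restaurant"),
    SlotAnnotation("dish", "spicy", fine="taste"),
    SlotAnnotation("dish", "68", fine="price"),
]
labels = [a.label() for a in example]
nested = group_by_coarse(example)
```

A flat label such as `dish.taste` suits a standard sequence tagger, while the nested form keeps the parent-child relationship explicit, which is the kind of structure a hierarchy-aware model would need to exploit.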
About the journal:
Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language.
The speech and language sciences have a long history, but it is only relatively recently that large-scale implementation of and experimentation with complex models of speech and language processing has become feasible. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.