Automatic Extraction of Semantic Patterns in Dialogs using Convex Polytopic Model
Jingyan Zhou, Xiaoying Zhang, Xiaohan Feng, King Keung Wu, H. Meng
2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP), published 2021-01-24
DOI: https://doi.org/10.1109/ISCSLP49672.2021.9362051
Citation count: 2
Abstract
Natural Language Understanding (NLU) in task-oriented dialog systems usually requires annotated data to train the understanding module, and annotating large data sets is costly. This paper proposes an unsupervised framework based on the Convex Polytopic Model (CPM), which uses a geometric approach to automatically extract semantic patterns from a raw dialog corpus and thereby assist in generating semantic frames. We find that the extracted semantic patterns are easily interpretable, correlate strongly with the intent and slots of the semantic frames, and may potentially serve as basic units for NLU. This is an initial investigation of the properties of CPM that explores its semantic interpretability. Experiments on the ATIS (Air Travel Information System) corpora show that CPM can generate semantic frames with minimal supervision.