An Entropy Minimization Approach to Dialogue Segmentation
M. Gnjatović, N. Maček
2020 11th IEEE International Conference on Cognitive Infocommunications (CogInfoCom)
Published: 2020-09-23
DOI: https://doi.org/10.1109/CogInfoCom50765.2020.9237832
Citation count: 1
Abstract
This paper introduces an approach to the segmentation of short dialogue fragments based on the entropy of linguistic cues. Assuming that a dialogue fragment consists of two non-overlapping segments, the approach places the segment boundary so as to minimize the maximum interactional entropy across the two segments. It is evaluated on a corpus of 4500 artificially generated two-segment dialogues, each containing 8 to 12 dialogue acts. In 29.20 percent of the dialogues, the detected segment boundary coincides with the actual boundary, and in 69.27 percent of the dialogues, the detected boundary either coincides with the actual boundary or lies immediately before or after it.
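
The min-max criterion and the two reported accuracy measures can be illustrated with a minimal sketch. The abstract does not define interactional entropy or the set of linguistic cues, so the code below makes two assumptions that are not taken from the paper: each dialogue act is reduced to a single hypothetical cue label, and the entropy of a segment is the plain Shannon entropy of its label distribution. The names `shannon_entropy`, `find_boundary`, and `boundary_accuracy` are illustrative only.

```python
from collections import Counter
from math import log2


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the empirical label distribution.

    Assumption: this stands in for the paper's interactional entropy,
    whose exact definition is not given in the abstract.
    """
    counts = Counter(labels)
    total = len(labels)
    return sum(-(c / total) * log2(c / total) for c in counts.values())


def find_boundary(cues):
    """Return (k, score) where k minimizes max(H(cues[:k]), H(cues[k:])).

    k is the index of the first dialogue act of the second segment.
    Ties are resolved in favour of the earliest split, an arbitrary choice.
    """
    if len(cues) < 2:
        raise ValueError("need at least two dialogue acts")
    best_k, best_score = None, float("inf")
    for k in range(1, len(cues)):  # both segments stay non-empty
        score = max(shannon_entropy(cues[:k]), shannon_entropy(cues[k:]))
        if score < best_score:
            best_k, best_score = k, score
    return best_k, best_score


def boundary_accuracy(detected, actual):
    """Fraction of exact matches and of matches within one position."""
    pairs = list(zip(detected, actual))
    exact = sum(d == a for d, a in pairs) / len(pairs)
    near = sum(abs(d - a) <= 1 for d, a in pairs) / len(pairs)
    return exact, near
```

For a toy fragment of four acts labelled "a" followed by four labelled "b", `find_boundary` returns the split index 4 with a score of 0.0, since both segments are then homogeneous. `boundary_accuracy` mirrors the two measures in the abstract: exact coincidence, and coincidence up to one position on either side.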