A Comparative Study of the Performance of Unsupervised Text Segmentation Techniques on Dialogue Transcripts

Vidhi Gupta, Guangda Zhu, Andi Yu, Donald E. Brown
2020 Systems and Information Engineering Design Symposium (SIEDS), published 2020-04-01
DOI: 10.1109/SIEDS49339.2020.9106639 (https://doi.org/10.1109/SIEDS49339.2020.9106639)
Citations: 2

Abstract

Contact centers provide customer interaction support to numerous organizations. In 2017, the contact center industry generated $200 billion in revenue worldwide, accounting for a significant share of the market, and yet businesses lost $75 billion due to poor customer satisfaction. Around 48% of consumers prefer the phone as their mode of communication with contact centers. Analysis of these calls can give insights into customer views and help businesses improve their customer engagement. To understand the structure and flow of a conversation, its transcript can be segmented into meaningful sections such as “greeting exchange”, “problem description”, and “problem resolution”, to name a few. In this paper, we present a comparative study of various unsupervised methods of dialogue segmentation. We choose three classic unsupervised text segmentation techniques, TextTiling, TopicTiling, and Content Vector Segmentation, and evaluate their performance on 50 manually labeled dialogue transcripts. The transcripts span contact center calls, live chat, interactions with chat-bots, and talk show conversations. Additionally, we build on the TextTiling algorithm by incorporating semantic word embeddings for text representation. We show that this modification outperforms the three benchmarked approaches with a mean Pk value of 0.31, indicating that, on average, 69% of the boundaries are identified accurately.
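For readers unfamiliar with the Pk score reported above, the following is a minimal sketch of how the metric is commonly computed. It is not the authors' evaluation code. It assumes that a transcript is a sequence of n utterances, that a segmentation is given as the list of utterance indices at which a new segment starts, and that the window size k defaults to half the mean reference segment length, which is the usual convention. The names `segment_ids` and `pk` are hypothetical. Pk is an error rate: it is the fraction of utterance pairs k positions apart on which the reference and the hypothesis disagree about whether the two utterances fall in the same segment, so lower is better.

```python
def segment_ids(boundaries, n):
    """Map each of n utterances to the index of the segment it falls in.

    `boundaries` lists utterance indices at which a new segment starts
    (an assumed input convention, chosen for this sketch).
    """
    starts = set(boundaries)
    ids, current = [], 0
    for i in range(n):
        if i in starts and i > 0:
            current += 1
        ids.append(current)
    return ids


def pk(ref_boundaries, hyp_boundaries, n, k=None):
    """Pk error rate between a reference and a hypothesis segmentation."""
    ref = segment_ids(ref_boundaries, n)
    hyp = segment_ids(hyp_boundaries, n)
    if k is None:
        # Half the mean reference segment length, at least 1.
        num_segments = ref[-1] + 1
        k = max(1, n // (2 * num_segments))
    errors = 0
    for i in range(n - k):
        same_in_ref = ref[i] == ref[i + k]
        same_in_hyp = hyp[i] == hyp[i + k]
        if same_in_ref != same_in_hyp:
            errors += 1
    return errors / (n - k)


# Toy example: 12 utterances, reference segments start at utterances 4 and 8.
print(pk([4, 8], [4, 8], 12))  # 0.0 (perfect agreement)
print(pk([4, 8], [4], 12))     # 0.2 (one boundary missed)
print(pk([4, 8], [], 12))      # 0.4 (no boundaries predicted)
```

Under this definition, a mean Pk of 0.31 means that about 31% of the sliding-window comparisons disagree with the manual labels, which is the basis for the 69% figure quoted in the abstract.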