Proceedings of the Workshop on Uphill Battles in Language Processing: Scaling Early Achievements to Robust Methods

C2D2E2: Using Call Centers to Motivate the Use of Dialog and Diarization in Entity Extraction
Kenneth Ward Church, Weizhong Zhu, Jason W. Pelecanos
DOI: 10.18653/v1/W16-6008 (https://doi.org/10.18653/v1/W16-6008)

Abstract: This paper introduces a deceptively simple entity extraction task intended to encourage more interdisciplinary collaboration between fields that don’t normally work together: diarization, dialog and entity extraction. Given a corpus of 1.4M call center calls, extract mentions of trouble ticket numbers. The task is challenging because first mentions need to be distinguished from confirmations to avoid undesirable repetitions. It is common for agents to say part of the ticket number and for customers to confirm with a repetition. There are opportunities for dialog (given/new) and diarization (who said what) to help remove repetitions. New information is spoken slowly by one side of a conversation; confirmations are spoken more quickly by the other side of the conversation.

1 Extracting Ticket Numbers

Much has been written on extracting entities from text (Etzioni et al., 2005), and even speech (Kubala et al., 1998), but less has been written in the context of dialog (Clark and Haviland, 1977) and diarization (Tranter and Reynolds, 2006; Anguera et al., 2012; Shum, 2011). This paper describes a ticket extraction task illustrated in Table 1. The challenge is to extract a 7-byte ticket number, “902MDYK,” from the dialog. Confirmations ought to improve communication, but steps need to be taken to avoid undesirable repetition in the extracted entities. Dialog theory suggests it should be possible to distinguish first mentions (bold) from confirmations (italics) based on prosodic cues such as pitch, energy and duration.

t0 (s)   t1 (s)   Utterance
278.16   281.07   I do have the new hardware case number for you when you’re ready
282.60   282.85   okay
284.19   284.80   nine
285.03   285.86   zero
286.22   286.74   two
290.82   291.30   nine
292.87   293.95   zero two
297.87   298.24   okay
299.30   300.49   M. as in Mike
301.97   303.56   D. as in delta
304.89   306.31   Y. as in Yankee
307.50   308.81   K. as in kilo
310.14   310.57   okay
310.77   311.70   nine zero two
311.73   312.49   M. D.
312.53   313.18   Y. T.
313.75   314.21   correct
314.21   317.28   and thank you for calling IBM is there anything else I can assist you with

Table 1: A ticket dialog: 7 bytes (902MDYK) at 1.4 bps. First mentions (bold) are slower than confirmations (italics).

Phone matches   Calls   Ticket match (edit distance)
66%             238     0
59%             82      1
55%             40      2
4.1%            4033    3+

Table 2: Phone numbers are used to confirm ticket matches. Good ticket matches (top row) are confirmed more often than poor matches (bottom row). Poor matches are more common because ticket numbers are relatively rare, and most calls don’t
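The Table 1 caption gives the ticket's information rate as 1.4 bps. The Python sketch below shows one way to arrive at that figure, under two assumptions the text does not state: each of the 7 characters counts as one 8-bit byte, and the rate is measured over the whole excerpt, from 278.16 s to 317.28 s. The function name `bits_per_second` is hypothetical.

```python
# Rough check of the "7 bytes at 1.4 bps" figure in the Table 1 caption.
# Assumptions (not stated in the paper): 1 byte = 8 bits, and the rate is
# measured from the first to the last timestamp of the Table 1 excerpt.
TICKET = "902MDYK"
START_S = 278.16  # t0 of the first row of Table 1
END_S = 317.28    # t1 of the last row of Table 1


def bits_per_second(ticket: str, start: float, end: float) -> float:
    """Information rate of the ticket number over the given time window."""
    bits = len(ticket) * 8  # 7 characters, one byte each -> 56 bits
    return bits / (end - start)


rate = bits_per_second(TICKET, START_S, END_S)
print(f"{rate:.1f} bps")  # prints "1.4 bps"
```

Under these assumptions the exact value is 56 / 39.12 ≈ 1.43 bps, which rounds to the caption's 1.4 bps.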
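The duration cue can be sketched in Python with the timings from Table 1. The sketch converts each row to the ticket characters it contains, divides the row's duration by that count, and labels fast rows as confirmations. This is a minimal illustration, not the paper's method. The 0.45 s-per-character threshold is a hypothetical value picked by eye from Table 1, the helpers `spoken_chars` and `label` are illustrative names, and pitch and energy, the other cues mentioned above, are not modeled.

```python
# Duration cue from Table 1: seconds per spoken ticket character.
# Hypothetical: the 0.45 s/char threshold was chosen by eye from Table 1;
# the paper does not specify a classifier or a threshold.
DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
          "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9"}

# (t0, t1, utterance) rows of Table 1 that contain ticket characters
SEGMENTS = [
    (284.19, 284.80, "nine"),
    (285.03, 285.86, "zero"),
    (286.22, 286.74, "two"),
    (290.82, 291.30, "nine"),
    (292.87, 293.95, "zero two"),
    (299.30, 300.49, "M. as in Mike"),
    (301.97, 303.56, "D. as in delta"),
    (304.89, 306.31, "Y. as in Yankee"),
    (307.50, 308.81, "K. as in kilo"),
    (310.77, 311.70, "nine zero two"),
    (311.73, 312.49, "M. D."),
    (312.53, 313.18, "Y. T."),
]

THRESHOLD_S_PER_CHAR = 0.45


def spoken_chars(utterance: str) -> str:
    """Ticket characters in an utterance: digit words and spelled letters like 'M.'."""
    chars = []
    for tok in utterance.split():
        if tok.lower() in DIGITS:
            chars.append(DIGITS[tok.lower()])
        elif len(tok) == 2 and tok[0].isalpha() and tok[1] == ".":
            chars.append(tok[0].upper())
        # other words ("as", "in", "Mike", ...) carry no ticket characters
    return "".join(chars)


def label(t0: float, t1: float, utterance: str) -> tuple[str, str]:
    """Return (characters, 'first' or 'confirm') based on seconds per character."""
    chars = spoken_chars(utterance)
    seconds_per_char = (t1 - t0) / len(chars)
    kind = "confirm" if seconds_per_char < THRESHOLD_S_PER_CHAR else "first"
    return chars, kind


for t0, t1, utt in SEGMENTS:
    chars, kind = label(t0, t1, utt)
    print(f"{t0:7.2f}  {chars:<4} {kind}")
```

With this threshold, the three quick read-backs between 310.77 s and 313.18 s (about 0.31 to 0.38 s per character) are labeled confirmations. The “nine” at 290.82 s and “zero two” at 292.87 s (0.48 and 0.54 s per character) repeat digits already heard at 284 to 287 s but are not flagged, so duration alone does not remove every repetition; knowing who said what (diarization) could help with such cases.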
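Table 2 groups ticket matches by edit distance (0, 1, 2, 3+). The paper excerpt does not say which edit distance it uses or what each extracted string is compared against. The Python sketch below assumes standard Levenshtein distance with unit costs for insertion, deletion and substitution, and compares a candidate string with a reference ticket number. The names `levenshtein` and `bucket` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def bucket(distance: int) -> str:
    """Row label in the style of Table 2: '0', '1', '2' or '3+'."""
    return str(distance) if distance < 3 else "3+"


# The read-back "nine zero two M. D. Y. T." in Table 1 vs. the ticket 902MDYK
d = levenshtein("902MDYT", "902MDYK")
print(d, bucket(d))  # prints "1 1"
```

The confirmation in Table 1 ends in “Y. T.” rather than “Y. K.”, so the read-back differs from the ticket by one substitution and would fall in the edit-distance-1 row.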
Citations: 1