Annotation Process for the Dialog Act Classification of a Taglish E-commerce Q&A Corpus

Jared Rivera, Jan Caleb Oliver Pensica, Jolene Valenzuela, Alfonso Secuya, C. Cheng
{"title":"Annotation Process for the Dialog Act Classification of a Taglish E-commerce Q&A Corpus","authors":"Jared Rivera, Jan Caleb Oliver Pensica, Jolene Valenzuela, Alfonso Secuya, C. Cheng","doi":"10.18653/v1/D19-5108","DOIUrl":null,"url":null,"abstract":"With conversational agents or chatbots making up in quantity of replies rather than quality, the need to identify user intent has become a main concern to improve these agents. Dialog act (DA) classification tackles this concern, and while existing studies have already addressed DA classification in general contexts, no training corpora in the context of e-commerce is available to the public. This research addressed the said insufficiency by building a text-based corpus of 7,265 posts from the question and answer section of products on Lazada Philippines. The SWBD-DAMSL tagset for DA classification was modified to 28 tags fitting the categories applicable to e-commerce conversations. The posts were annotated manually by three (3) human annotators and preprocessing techniques decreased the vocabulary size from 6,340 to 1,134. After analysis, the corpus was composed dominantly of single-label posts, with 34% of the corpus having multiple intent tags. The annotated corpus allowed insights toward the structure of posts created with single to multiple intents.","PeriodicalId":119881,"journal":{"name":"Proceedings of the Second Workshop on Economics and Natural Language Processing","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Second Workshop on Economics and Natural Language Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18653/v1/D19-5108","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

With conversational agents or chatbots making up in quantity of replies rather than quality, the need to identify user intent has become a main concern to improve these agents. Dialog act (DA) classification tackles this concern, and while existing studies have already addressed DA classification in general contexts, no training corpora in the context of e-commerce is available to the public. This research addressed the said insufficiency by building a text-based corpus of 7,265 posts from the question and answer section of products on Lazada Philippines. The SWBD-DAMSL tagset for DA classification was modified to 28 tags fitting the categories applicable to e-commerce conversations. The posts were annotated manually by three (3) human annotators and preprocessing techniques decreased the vocabulary size from 6,340 to 1,134. After analysis, the corpus was composed dominantly of single-label posts, with 34% of the corpus having multiple intent tags. The annotated corpus allowed insights toward the structure of posts created with single to multiple intents.
英语电子商务问答语料库对话行为分类的标注过程
由于会话代理或聊天机器人的回复数量大于质量,识别用户意图的需求已成为改进这些代理的主要关注点。对话行为(DA)分类解决了这一问题,虽然现有的研究已经解决了一般情况下的DA分类问题,但没有电子商务背景下的培训语料库可供公众使用。本研究通过建立一个基于文本的语料库,解决了上述不足,该语料库来自Lazada菲律宾产品问答部分的7,265个帖子。用于数据数据分类的SWBD-DAMSL标记集被修改为28个标记,适合适用于电子商务对话的类别。这些帖子由3名人工注释者手工注释,预处理技术将词汇量从6340个减少到1134个。经过分析,语料库主要由单标签帖子组成,其中34%的语料库具有多个意图标签。带注释的语料库允许深入了解由单个或多个意图创建的帖子的结构。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信