Retrieval-based End-to-End Tamil language Conversational Agent for Closed Domain using Machine Learning

Kumaran Kugathasan, Uthayasanker Thayasivam
Proceedings of the 2021 5th International Conference on Natural Language Processing and Information Retrieval, 2021-12-17
DOI: 10.1145/3508230.3508251 (https://doi.org/10.1145/3508230.3508251)
Citations: 0

Abstract

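Businesses around the world have started to adopt text-based conversational agents to provide a good customer experience while reducing their reliance on expensive human customer service agents. Building a conversational agent is comparatively easy for businesses whose customers speak high-resource languages such as English, since many paid and open-source chatbot frameworks are available. For a low-resource language like Tamil, there is no such framework support. The approaches proposed in research for building high-resource-language chatbots are not suitable for Tamil because many language-related resources are lacking. This paper proposes a new approach for building a Tamil conversational agent using a dataset scraped from an FAQ corpus and expanded to capture the morphological richness and highly inflectional nature of the Tamil language. Each question is mapped to an intent, and a multiclass intent classifier is built to identify the intent of the user. A CNN-based classifier performed best, with 98.72% accuracy.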
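
The retrieval flow the abstract describes (map each FAQ question to an intent, classify an incoming question, return the stored answer for that intent) can be illustrated with a small sketch. This is not the paper's CNN classifier: it substitutes a character n-gram nearest-centroid classifier, chosen here only because character n-grams tolerate inflected word forms to some degree. The training questions, intent names, and answers are hypothetical English stand-ins for Tamil FAQ data.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy FAQ data (English stand-ins for Tamil questions).
TRAINING = [
    ("how do i open an account", "open_account"),
    ("opening a new savings account", "open_account"),
    ("i lost my card", "card_lost"),
    ("my card was stolen", "card_lost"),
]

# Hypothetical canned answers, one per intent (the "retrieval" step).
ANSWERS = {
    "open_account": "Visit any branch with your ID to open an account.",
    "card_lost": "Call the hotline to block your card immediately.",
}


def char_ngrams(text, n=3):
    """Count character n-grams of the padded, lower-cased text."""
    padded = f" {text.lower().strip()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_centroids(examples):
    """Sum the n-gram counts of all training questions for each intent."""
    centroids = {}
    for question, intent in examples:
        centroids.setdefault(intent, Counter()).update(char_ngrams(question))
    return centroids


def classify(question, centroids):
    """Return the intent whose centroid is most similar to the question."""
    vec = char_ngrams(question)
    return max(centroids, key=lambda intent: cosine(vec, centroids[intent]))


def respond(question, centroids, answers):
    """Classify the question, then retrieve the stored answer for that intent."""
    return answers[classify(question, centroids)]


CENTROIDS = build_centroids(TRAINING)
print(respond("lost cards", CENTROIDS, ANSWERS))
```

In a system like the one described, the similarity-based `classify` step would be replaced by a trained multiclass model, while the intent-to-answer lookup would stay the same.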