使用机器学习的基于检索的端到端泰米尔语封闭域会话代理

Proceedings of the 2021 5th International Conference on Natural Language Processing and Information Retrieval Pub Date : 2021-12-17 DOI:10.1145/3508230.3508251

Kumaran Kugathasan, Uthayasanker Thayasivam

{"title":"使用机器学习的基于检索的端到端泰米尔语封闭域会话代理","authors":"Kumaran Kugathasan, Uthayasanker Thayasivam","doi":"10.1145/3508230.3508251","DOIUrl":null,"url":null,"abstract":"Businesses around the world have started to adopt text-based conversational agents to provide a great customer experience as an alternative to minimize expensive customer service agents. Coming up with a conversational agent is comparatively easier for businesses that serve customers who speak high resourced languages like English since there are enough and more paid as well as open-source chatbot frameworks available. But for a low resource language like Tamil, there is no such framework support. The approaches proposed in researches for building high resource language chatbots are not suitable for Tamil due to the lack of many language-related resources. This paper proposes a new approach for building a Tamil language conversational agent using the dataset scraped from the FAQ corpus and expanding it more to capture the morphological richness and high inflexional nature of the Tamil language. Each question is mapped to intent and a multiclass intent classifier was built to identify the intent of the user. CNN based classifier performed best with 98.72% accuracy.","PeriodicalId":252146,"journal":{"name":"Proceedings of the 2021 5th International Conference on Natural Language Processing and Information Retrieval","volume":"11 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Retrieval-based End-to-End Tamil language Conversational Agent for Closed Domain using Machine Learning\",\"authors\":\"Kumaran Kugathasan, Uthayasanker Thayasivam\",\"doi\":\"10.1145/3508230.3508251\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Businesses around the world have started to adopt text-based conversational agents to provide a great customer experience as an alternative to minimize expensive customer service agents. Coming up with a conversational agent is comparatively easier for businesses that serve customers who speak high resourced languages like English since there are enough and more paid as well as open-source chatbot frameworks available. But for a low resource language like Tamil, there is no such framework support. The approaches proposed in researches for building high resource language chatbots are not suitable for Tamil due to the lack of many language-related resources. This paper proposes a new approach for building a Tamil language conversational agent using the dataset scraped from the FAQ corpus and expanding it more to capture the morphological richness and high inflexional nature of the Tamil language. Each question is mapped to intent and a multiclass intent classifier was built to identify the intent of the user. CNN based classifier performed best with 98.72% accuracy.\",\"PeriodicalId\":252146,\"journal\":{\"name\":\"Proceedings of the 2021 5th International Conference on Natural Language Processing and Information Retrieval\",\"volume\":\"11 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-12-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2021 5th International Conference on Natural Language Processing and Information Retrieval\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3508230.3508251\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2021 5th International Conference on Natural Language Processing and Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3508230.3508251","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

世界各地的企业已经开始采用基于文本的会话代理来提供出色的客户体验，作为最小化昂贵的客户服务代理的替代方案。对于那些为讲英语等资源丰富的语言的客户提供服务的企业来说，想出一个对话代理相对容易，因为有足够多的、更多的付费和开源聊天机器人框架可用。但是对于像泰米尔语这样的低资源语言，没有这样的框架支持。由于缺乏大量的语言相关资源，研究中提出的构建高资源语言聊天机器人的方法并不适合泰米尔语。本文提出了一种利用从FAQ语料库中抓取的数据集构建泰米尔语会话代理的新方法，并对其进行扩展，以捕获泰米尔语的形态丰富性和高度非弹性特性。每个问题都映射到意图，并建立了一个多类意图分类器来识别用户的意图。基于CNN的分类器表现最好，准确率为98.72%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Retrieval-based End-to-End Tamil language Conversational Agent for Closed Domain using Machine Learning

Businesses around the world have started to adopt text-based conversational agents to provide a great customer experience as an alternative to minimize expensive customer service agents. Coming up with a conversational agent is comparatively easier for businesses that serve customers who speak high resourced languages like English since there are enough and more paid as well as open-source chatbot frameworks available. But for a low resource language like Tamil, there is no such framework support. The approaches proposed in researches for building high resource language chatbots are not suitable for Tamil due to the lack of many language-related resources. This paper proposes a new approach for building a Tamil language conversational agent using the dataset scraped from the FAQ corpus and expanding it more to capture the morphological richness and high inflexional nature of the Tamil language. Each question is mapped to intent and a multiclass intent classifier was built to identify the intent of the user. CNN based classifier performed best with 98.72% accuracy.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 2021 5th International Conference on Natural Language Processing and Information Retrieval

自引率

0.00%

发文量