Using TF-IDF on Kisan Call Centre Dataset for Obtaining Query Answers

2018 International Conference on Communication, Computing and Internet of Things (IC3IoT). Pub Date: 2018-02-01. DOI: 10.1109/IC3IOT.2018.8668134

S. K. Mohapatra, Anamika Upadhyay


Citations: 6

Abstract

Measuring semantic similarity between short texts plays an important role in many information retrieval tasks, including ranking search results, fetching answers to queries, and building document summaries. We present an approach for manually and automatically obtaining answers to the problems and queries farmers raise in their day-to-day agricultural work. With this approach, a query can be given to the model to find relevant questions previously asked for that query, along with their possible answers. We first preprocess the data and convert it into a similarity matrix, which we save in a MongoDB database. Using the saved data, we train the model to retrieve information for a query based on the similarity between query sentences; the application then finds the best possible answer according to that similarity. We use term frequency-inverse document frequency (TF-IDF) to find similar queries. TF-IDF assigns every word a weight measured by frequency; relevance is not taken into consideration in this model. This information can be used in training to boost call agent effectiveness and improve the customer experience. As an added bonus, more effective communication reduces handle time and operating costs.
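The retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the similarity measure, so cosine similarity over TF-IDF vectors is assumed here, along with a smoothed IDF variant and whitespace tokenization. The questions and answers are hypothetical placeholders rather than records from the Kisan Call Centre dataset, and everything is kept in memory instead of being stored in MongoDB.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real preprocessing)."""
    return text.lower().split()


def vectorize(tokens, idf):
    """Map each known term to (relative term frequency) * (inverse document frequency)."""
    counts = Counter(tokens)
    total = len(tokens)
    return {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}


def build_tfidf(docs):
    """Return (vectors, idf) for a list of documents."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumption): terms found in every document keep a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    return [vectorize(tokens, idf) for tokens in tokenized], idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_answer(query, questions, answers):
    """Return the stored question most similar to the query, its answer, and the score."""
    vectors, idf = build_tfidf(questions)
    q_vec = vectorize(tokenize(query), idf)
    scores = [cosine(q_vec, v) for v in vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return questions[best], answers[best], scores[best]


# Hypothetical placeholder data, not taken from the actual dataset.
QUESTIONS = [
    "how to control aphids in mustard crop",
    "what is the fertilizer dose for wheat",
    "when to sow paddy nursery",
]
ANSWERS = [
    "placeholder answer about aphid control",
    "placeholder answer about wheat fertilizer",
    "placeholder answer about paddy sowing",
]

matched_question, matched_answer, score = best_answer(
    "fertilizer dose for wheat crop", QUESTIONS, ANSWERS
)
print(matched_question, "->", matched_answer)
```

Because the weights depend only on term counts, a query that paraphrases a stored question with different words would score low. This matches the abstract's remark that relevance is not taken into consideration by the model.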

