Hindi Chatbot for Supporting Maternal and Child Health Related Queries in Rural India

Clinical Natural Language Processing Workshop Pub Date : 1900-01-01 DOI:10.18653/v1/2023.clinicalnlp-1.9

Ritwik Mishra, Simranjeet Singh, Jasmeet Kaur, Pushpendra Singh, R. Shah


Abstract

In developing countries like India, doctors and healthcare professionals working in public health spend significant time answering health queries that are fact-based and repetitive. We therefore propose an automated way to answer maternal and child health-related queries. We curate a database of Frequently Asked Questions (FAQs), collected from rural health workers and young mothers, together with expert-written answers. We develop a Hindi chatbot that, given a healthcare query (q) written in Devanagari script or in Hindi-English (Hinglish) code-mixed script, identifies the k most relevant Question and Answer (QnA) pairs in the database. The curated database covers 80% of the queries that users in our study are likely to ask. To calculate the similarity between the query q and each database question Q (the q-Q similarity), we experimented with (i) rule-based methods, (ii) sentence embeddings, and (iii) a paraphrasing classifier. We observed that the paraphrasing classifier gives the best results when trained first on open-domain text and then on the healthcare domain. Our chatbot uses an ensemble of all three approaches. We observed that, when a given q can be answered using the database, our chatbot provides at least one relevant QnA pair among its top three suggestions for up to 70% of the queries.
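The retrieval step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the two scorers (token-level Jaccard and character-trigram overlap) are simple stand-ins for the rule-based methods, sentence embeddings, and paraphrasing classifier; the unweighted average is an assumed ensemble rule; and the Hinglish FAQ entries and every function name (`jaccard`, `trigram_overlap`, `ensemble_score`, `top_k`, `success_at_k`) are hypothetical.

```python
from typing import Callable, List, Sequence, Set, Tuple

# A scorer maps (user query q, database question Q) to a similarity in [0, 1].
Scorer = Callable[[str, str], float]


def _overlap(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def jaccard(q: str, Q: str) -> float:
    """Token-level overlap (stand-in for a rule-based method)."""
    return _overlap(set(q.lower().split()), set(Q.lower().split()))


def trigram_overlap(q: str, Q: str) -> float:
    """Character-trigram overlap (tolerates spelling variation in Hinglish)."""
    def grams(s: str) -> Set[str]:
        s = s.lower()
        return {s[i:i + 3] for i in range(len(s) - 2)}
    return _overlap(grams(q), grams(Q))


def ensemble_score(q: str, Q: str, scorers: Sequence[Scorer]) -> float:
    """Assumed ensemble rule: unweighted mean of the individual scores."""
    return sum(score(q, Q) for score in scorers) / len(scorers)


def top_k(q: str, faqs: Sequence[Tuple[str, str]],
          scorers: Sequence[Scorer], k: int = 3) -> List[Tuple[str, str]]:
    """Return the k QnA pairs whose question Q is most similar to q."""
    ranked = sorted(faqs,
                    key=lambda pair: ensemble_score(q, pair[0], scorers),
                    reverse=True)
    return list(ranked[:k])


def success_at_k(queries: Sequence[str], relevant: Sequence[Set[str]],
                 faqs: Sequence[Tuple[str, str]],
                 scorers: Sequence[Scorer], k: int = 3) -> float:
    """Fraction of queries with at least one relevant Q among the top k."""
    hits = 0
    for q, rel in zip(queries, relevant):
        retrieved = {Q for Q, _ in top_k(q, faqs, scorers, k)}
        if retrieved & rel:
            hits += 1
    return hits / len(queries)


# Hypothetical FAQ database: (question, placeholder answer) pairs.
FAQS = [
    ("bacche ko bukhar ho to kya kare", "placeholder answer 1"),
    ("garbhavastha me kya khana chahiye", "placeholder answer 2"),
    ("tika kab lagwana chahiye", "placeholder answer 3"),
]
SCORERS = [jaccard, trigram_overlap]

if __name__ == "__main__":
    for Q, A in top_k("bacche ko bukhar hai kya kare", FAQS, SCORERS, k=3):
        print(Q, "->", A)
```

Here `success_at_k` with `k=3` corresponds to the reported measure of at least one relevant QnA pair among the top three suggestions. Swapping in an embedding model or a trained classifier only requires adding another function of the same `Scorer` shape to `SCORERS`.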

