Indic Language Question Answering: A Survey
Dhruv Kolhatkar, Devika Verma
2023 Third International Conference on Artificial Intelligence and Smart Energy (ICAIS)
Published: 2023-02-02
DOI: https://doi.org/10.1109/ICAIS56108.2023.10073689
Citations: 0
Abstract
Research interest in question answering (QA) has increased tremendously over the past few years. Yet most work on QA and, more generally, on natural language processing has been limited predominantly to English. Meanwhile, the number of people with internet access is increasing exponentially each year, especially in South Asian countries where English is not the primary language. With this in mind, this survey aims to identify, review, and analyze the question-answering datasets that exist for resource-scarce Indic languages such as Hindi, Urdu, Tamil, and Marathi. It also sheds light on the state of the art in Indic question answering itself, in terms of methods used, best-performing models, and evaluation metrics. The review also covers recently published multilingual benchmarks.