{"title":"Linguistically informed automatic speech recognition in Sanskrit","authors":"Rishabh Kumar , Devaraja Adiga , Rishav Ranjan , Amrith Krishna , Ganesh Ramakrishnan , Pawan Goyal , Preethi Jyothi","doi":"10.1016/j.csl.2025.101861","DOIUrl":null,"url":null,"abstract":"<div><div>The field of Automatic Speech Recognition (ASR) for Sanskrit is marked by distinctive challenges, primarily due to the language’s intricate linguistic and morphological characteristics. Recognizing the burgeoning interest in this domain, we present the ‘Vāksañcayah’ speech corpus, a comprehensive collection that captures the linguistic depth and complexities of Sanskrit. Building upon our prior work, which focused on various acoustic model (AM) and language model (LM) units, we present an enhanced ASR system. This system integrates innovative subword tokenization methods and enriches the search space with linguistic insights. Addressing the issue of high out-of-vocabulary (OOV) rates and the prevalence of infrequently used words in Sanskrit, we employed a subword-based language model. Our approach mitigates these challenges and facilitates the generation of a subword-based search space. While effective in numerous scenarios, this model encounters limitations regarding long-range dependencies and semantic context comprehension. To counter these limitations, we leveraged Sanskrit’s rich morphological framework, thus achieving a more holistic understanding. The subword-based search space is subsequently transformed into a word-based format and augmented with morphological and lexical data, derived from a lexically driven shallow parser. Enhancing this further, we rescore transitions within this enriched space using a supervised morphological parser specifically designed for Sanskrit. Our proposed methodology is currently acclaimed as the most advanced in the realm of Sanskrit ASR, achieving a Word Error Rate (WER) of 12.54 and an improvement of 3.77 absolute points over the previous best. Additionally, we annotated 500 utterances with detailed morphological data and their corresponding lemmas, providing a basis for extensive linguistic analysis.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"95 ","pages":"Article 101861"},"PeriodicalIF":3.1000,"publicationDate":"2025-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Speech and Language","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0885230825000865","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
The field of Automatic Speech Recognition (ASR) for Sanskrit is marked by distinctive challenges, primarily due to the language’s intricate linguistic and morphological characteristics. Recognizing the burgeoning interest in this domain, we present the ‘Vāksañcayah’ speech corpus, a comprehensive collection that captures the linguistic depth and complexities of Sanskrit. Building upon our prior work, which focused on various acoustic model (AM) and language model (LM) units, we present an enhanced ASR system. This system integrates innovative subword tokenization methods and enriches the search space with linguistic insights. Addressing the issue of high out-of-vocabulary (OOV) rates and the prevalence of infrequently used words in Sanskrit, we employed a subword-based language model. Our approach mitigates these challenges and facilitates the generation of a subword-based search space. While effective in numerous scenarios, this model encounters limitations regarding long-range dependencies and semantic context comprehension. To counter these limitations, we leveraged Sanskrit’s rich morphological framework, thus achieving a more holistic understanding. The subword-based search space is subsequently transformed into a word-based format and augmented with morphological and lexical data, derived from a lexically driven shallow parser. Enhancing this further, we rescore transitions within this enriched space using a supervised morphological parser specifically designed for Sanskrit. Our proposed methodology is currently acclaimed as the most advanced in the realm of Sanskrit ASR, achieving a Word Error Rate (WER) of 12.54 and an improvement of 3.77 absolute points over the previous best. Additionally, we annotated 500 utterances with detailed morphological data and their corresponding lemmas, providing a basis for extensive linguistic analysis.
期刊介绍:
Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language.
The speech and language sciences have a long history, but it is only relatively recently that large-scale implementation of and experimentation with complex models of speech and language processing has become feasible. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.