A Comprehensive Stop-Word Compilation for Kannada Language Processing
Sowmya M.S, Panduranga Rao M.V
International Journal of Electronics and Communication Engineering, published 2024-02-02
DOI: https://doi.org/10.14445/23488549/ijece-v11i2p108
Citations: 0
Abstract
This work constructs a standardized stop-word list for Kannada Natural Language Processing (NLP), which the authors describe as the first list designed exclusively for the language. The list is intended as a foundation for improving language comprehension and processing tasks. The method comprises data gathering, tokenization, and TF-IDF score computation using the IndicCorp Kannada dataset. A human refinement procedure is applied to ensure precision in the compilation, while acknowledging inherent subjectivity and dataset-specific limitations. The findings highlight the significance of these stop words and their prospective applications across NLP tasks, and they provide the framework for a planned Kannada-specific text summarization system. The study also gives insights into the linguistic characteristics of Kannada and offers methodologies for the automated compilation and validation of stop words, laying the groundwork for further research on Kannada NLP methods.
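
The pipeline named in the abstract (tokenization followed by TF-IDF scoring) can be illustrated with a minimal sketch. The abstract does not give the exact formula or selection rule, so everything here is an assumption: the function names `tokenize` and `stop_word_candidates` are hypothetical, tokens are taken to be runs of characters from the Kannada Unicode block, TF-IDF is the plain `tf * log(N / df)` variant, and terms with the lowest mean score are treated as stop-word candidates. The three toy sentences are illustrative only and are not drawn from IndicCorp.

```python
import math
import re
from collections import Counter, defaultdict

# Assumption: a token is a run of characters from the Kannada Unicode block
# (U+0C80 to U+0CFF), which also covers Kannada digits and vowel signs.
KANNADA_TOKEN = re.compile(r"[\u0C80-\u0CFF]+")


def tokenize(text):
    """Split text into Kannada-script tokens, dropping punctuation and spaces."""
    return KANNADA_TOKEN.findall(text)


def stop_word_candidates(documents, top_k=10):
    """Rank terms by mean TF-IDF, lowest first (hypothetical selection rule)."""
    tokenized = [tokens for tokens in map(tokenize, documents) if tokens]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    # Sum tf * idf for each term over the documents that contain it.
    score_sum = defaultdict(float)
    for tokens in tokenized:
        for term, count in Counter(tokens).items():
            tf = count / len(tokens)
            idf = math.log(n_docs / doc_freq[term])
            score_sum[term] += tf * idf

    mean_score = {term: score_sum[term] / doc_freq[term] for term in doc_freq}
    # Ties are broken alphabetically so the ranking is deterministic.
    ranked = sorted(mean_score.items(), key=lambda item: (item[1], item[0]))
    return ranked[:top_k]


# Toy corpus (illustrative sentences, not IndicCorp data).
toy_corpus = [
    "ರಾಮ ಮತ್ತು ಸೀತಾ ಶಾಲೆಗೆ ಹೋದರು",
    "ಹಾಲು ಮತ್ತು ಸಕ್ಕರೆ ಬೇಕು",
    "ಮಳೆ ಮತ್ತು ಗಾಳಿ ಬಂತು",
]

candidates = stop_word_candidates(toy_corpus, top_k=3)
for term, score in candidates:
    print(term, round(score, 4))
```

In this toy corpus the conjunction "ಮತ್ತು" occurs in every sentence, so its inverse document frequency is zero and it ranks first. On a real corpus, a ranking like this would only yield candidates; the human refinement step described in the abstract would still decide which terms enter the final list.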