Part-Of-Speech Tagger in Malayalam Using Bi-directional LSTM
R. Rajan, Anna J. Joseph, Elizabeth K. Robin, Nishma T. K. Fathima
2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA), published 2020-11-05
DOI: https://doi.org/10.1109/o-cocosda50338.2020.9295018
Citations: 1
Abstract
Most human activity is carried out through language, whether communicated directly or reported in natural language. As technology makes the methods and platforms on which we communicate ever more accessible, there is a great need to understand the languages we use. By combining artificial intelligence, computational linguistics and computer science, natural language processing (NLP) helps machines “read” text by simulating the human ability to understand language. Part-of-speech (POS) tagging is a prerequisite that simplifies many NLP applications, such as question answering, speech recognition and machine translation. Here, we compare part-of-speech taggers for Malayalam built with a decision tree algorithm and with a bi-directional long short-term memory (BLSTM) network. The experiments presented in this paper use two corpora for performance evaluation, one of 29,076 sentences and the other of 500 sentences. The experiments demonstrate the potential of the BLSTM-based architecture over conventional decision tree-based tagging in Malayalam.
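As an illustration of how two taggers might be compared on a held-out corpus, the sketch below computes token-level tagging accuracy. The abstract does not state which metric the paper uses, so the choice of token-level accuracy is an assumption. The function name `token_accuracy`, the tag labels, and the toy sequences are hypothetical and are not data or results from the paper.

```python
from typing import List, Sequence

Tags = List[str]  # one POS tag per token in a sentence


def token_accuracy(gold: Sequence[Tags], predicted: Sequence[Tags]) -> float:
    """Return the fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must contain the same number of sentences")
    correct = 0
    total = 0
    for gold_tags, pred_tags in zip(gold, predicted):
        if len(gold_tags) != len(pred_tags):
            raise ValueError("each sentence needs exactly one predicted tag per token")
        correct += sum(g == p for g, p in zip(gold_tags, pred_tags))
        total += len(gold_tags)
    if total == 0:
        raise ValueError("no tokens to score")
    return correct / total


# Toy, made-up tag sequences for two sentences (assumption: not from the paper).
gold = [["N", "V"], ["PR", "N", "V"]]
tagger_a = [["N", "V"], ["N", "N", "V"]]   # hypothetical tagger with one wrong tag
tagger_b = [["N", "V"], ["PR", "N", "V"]]  # hypothetical tagger with no wrong tags

score_a = token_accuracy(gold, tagger_a)
score_b = token_accuracy(gold, tagger_b)
```

Scoring both taggers against the same gold tags keeps the comparison like for like, and the length checks guard against silently misaligned sentences.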