HMM based Named Entity Recognition for inflectional language
Nita Patil, A. Patil, B. Pawar
2017 International Conference on Computer, Communications and Electronics (Comptelix), pp. 565-572, published 2017-07-01
DOI: 10.1109/COMPTELIX.2017.8004034 (https://doi.org/10.1109/COMPTELIX.2017.8004034)
Citations: 7
Abstract
Named Entity Recognition (NER) is the problem of identifying named entities in natural language text, classifying them into predefined classes, and assigning the proper class tag to each word in its context. This paper describes a Named Entity Recognition system for Marathi based on a Hidden Markov Model (HMM). It addresses the problem of assigning the correct named entity class tag to each word using a probabilistic model trained on a manually tagged Marathi corpus. The most probable named entity tag is assigned to each word using the Viterbi algorithm. The proposed system reports an overall F1-score of 62.70% when no preprocessing is applied and an overall F1-score of 77.79% when preprocessing is applied to the same data. Thus, using linguistic knowledge to preprocess the training and test datasets improves the overall F1-score by about 15 percentage points.
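The Viterbi decoding step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's tag set, smoothing scheme, or preprocessing rules, so everything below is an assumption made for illustration: the three tags (`PER`, `LOC`, `O`), the toy probabilities, the English placeholder tokens, and the fixed fallback probability `unk` for unseen words are all hypothetical and are not taken from the paper.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, unk=1e-6):
    """Return the most probable tag sequence for `words` under an HMM.

    `unk` is a hypothetical fallback emission probability for word/tag
    pairs missing from `emit_p`; it is a stand-in for real smoothing.
    """
    def lp(p):
        # Work in log space so long sentences do not underflow.
        return math.log(p) if p > 0 else float("-inf")

    if not words:
        return []

    # best[i][t]: log-probability of the best path ending in tag t at word i.
    best = [{t: lp(start_p.get(t, 0.0)) + lp(emit_p[t].get(words[0], unk))
             for t in tags}]
    back = [{}]  # back[i][t]: previous tag on that best path

    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] + lp(trans_p[p].get(t, 0.0))) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + lp(emit_p[t].get(words[i], unk))
            back[i][t] = prev

    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

# Toy model: all numbers are invented for illustration only.
TAGS = ["PER", "LOC", "O"]
START_P = {"PER": 0.4, "LOC": 0.1, "O": 0.5}
TRANS_P = {
    "PER": {"PER": 0.2, "LOC": 0.1, "O": 0.7},
    "LOC": {"PER": 0.1, "LOC": 0.2, "O": 0.7},
    "O":   {"PER": 0.2, "LOC": 0.3, "O": 0.5},
}
EMIT_P = {
    "PER": {"sachin": 0.8},
    "LOC": {"pune": 0.8},
    "O":   {"lives": 0.5, "in": 0.5},
}

if __name__ == "__main__":
    sentence = ["sachin", "lives", "in", "pune"]
    print(viterbi(sentence, TAGS, START_P, TRANS_P, EMIT_P))
    # prints ['PER', 'O', 'O', 'LOC']
```

In a real system the start, transition, and emission probabilities would be estimated from counts over the manually tagged corpus, and the fixed `unk` value would be replaced by a proper smoothing method.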