Named Entity Recognition of Thai Documents using CRF with a Simple Data Masking Technique
2021 16th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP), published 2021-12-21
DOI: 10.1109/iSAI-NLP54397.2021.9678156 (https://doi.org/10.1109/iSAI-NLP54397.2021.9678156)
Citation count: 0
Abstract
We examined named entity recognition (NER) of organization names in Thai government project documents, using a conditional random field (CRF) model with a simple data masking technique and the help of an external dictionary. Our framework proved promising in the setting where the external dictionary was incomplete and therefore could not label the training data exhaustively. To discover organization entities outside the dictionary, we masked the administrative-area part of the organization names. The experimental results showed that our model achieved higher recall at the cost of a relatively small loss in precision. The proposed approach could also recognize entities that never appeared in the dictionary.
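The masking idea can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that Thai text has already been segmented into word tokens, that "administrative area" means a province name drawn from a small hypothetical gazetteer (`ADMIN_AREAS`), that the dictionary produces BIO training labels by greedy longest match, and that a hypothetical placeholder `<AREA>` replaces each area name in both the dictionary entries and the sentences. All function names (`mask_areas`, `bio_label`, `token_features`) and example tokens are hypothetical. The point it shows: once masked, a single dictionary entry for one provincial office also matches the same office pattern for a province the dictionary never lists.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical placeholder that replaces any administrative-area name.
MASK = "<AREA>"

# Hypothetical gazetteer of administrative areas (here: province names).
ADMIN_AREAS: Set[str] = {"เชียงใหม่", "ลำปาง", "น่าน"}  # Chiang Mai, Lampang, Nan


def mask_areas(tokens: Sequence[str], areas: Set[str] = ADMIN_AREAS) -> List[str]:
    """Replace every administrative-area token with the MASK placeholder."""
    return [MASK if tok in areas else tok for tok in tokens]


def bio_label(tokens: Sequence[str], dictionary: Set[Tuple[str, ...]]) -> List[str]:
    """Assign BIO tags for ORG spans by greedy longest match against the dictionary."""
    labels = ["O"] * len(tokens)
    max_len = max((len(entry) for entry in dictionary), default=0)
    i = 0
    while i < len(tokens):
        span = 0
        # Try the longest candidate span starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + n]) in dictionary:
                span = n
                break
        if span:
            labels[i] = "B-ORG"
            for j in range(i + 1, i + span):
                labels[j] = "I-ORG"
            i += span  # skip past the matched entity
        else:
            i += 1
    return labels


def token_features(tokens: Sequence[str], i: int) -> Dict[str, object]:
    """Simple per-token features of the kind typically fed to a linear-chain CRF."""
    return {
        "bias": 1.0,
        "word": tokens[i],
        "is_mask": tokens[i] == MASK,
        "prev": tokens[i - 1] if i > 0 else "<BOS>",
        "next": tokens[i + 1] if i < len(tokens) - 1 else "<EOS>",
    }


# The dictionary lists only the Chiang Mai Provincial Agricultural Office.
raw_dict = {("สำนักงาน", "เกษตร", "จังหวัด", "เชียงใหม่")}
masked_dict = {tuple(mask_areas(entry)) for entry in raw_dict}

# "project of the Nan Provincial Agricultural Office" (pre-segmented)
sentence = ["โครงการ", "ของ", "สำนักงาน", "เกษตร", "จังหวัด", "น่าน"]
masked = mask_areas(sentence)

labels_unmasked = bio_label(sentence, raw_dict)  # no match: Nan is not in the dictionary
labels_masked = bio_label(masked, masked_dict)   # the masked pattern matches
features = [token_features(masked, i) for i in range(len(masked))]

print(labels_unmasked)  # ['O', 'O', 'O', 'O', 'O', 'O']
print(labels_masked)    # ['O', 'O', 'B-ORG', 'I-ORG', 'I-ORG', 'I-ORG']
```

The CRF training step is omitted. The per-token feature dictionaries follow the list-of-dicts-per-sentence input format accepted by CRF toolkits such as sklearn-crfsuite, so `features` and `labels_masked` could serve as one training example there.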