{"title":"Building an Indonesian named entity recognizer using Wikipedia and DBPedia","authors":"A. Luthfi, Bayu Distiawan Trisedya, R. Manurung","doi":"10.1109/IALP.2014.6973520","DOIUrl":null,"url":null,"abstract":"This paper describes the development of an Indonesian NER system using online data such as Wikipedia 1 and DBPedia 2. The system is based on the Stanford NER system [8] and utilizes training documents constructed automatically from Wikipedia. Each entity, i.e. word or phrase that has a hyperlink, in the Wikipedia documents are tagged according to information that is obtained from DBPedia. In this very first version, we are only interested in three entities, namely: Person, Place, and Organization. The system is evaluated using cross fold validation and also evaluated using a gold standard that was manually annotated. Using cross validation evaluation, our Indonesian NER managed to obtain precision and recall values above 90%, whereas the evaluation using gold standard shows that the Indonesian NER achieves high precision but very low recall.","PeriodicalId":117334,"journal":{"name":"2014 International Conference on Asian Language Processing (IALP)","volume":"205 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 International Conference on Asian Language Processing (IALP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IALP.2014.6973520","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 10
Abstract
This paper describes the development of an Indonesian NER system using online data such as Wikipedia 1 and DBPedia 2. The system is based on the Stanford NER system [8] and utilizes training documents constructed automatically from Wikipedia. Each entity, i.e. word or phrase that has a hyperlink, in the Wikipedia documents are tagged according to information that is obtained from DBPedia. In this very first version, we are only interested in three entities, namely: Person, Place, and Organization. The system is evaluated using cross fold validation and also evaluated using a gold standard that was manually annotated. Using cross validation evaluation, our Indonesian NER managed to obtain precision and recall values above 90%, whereas the evaluation using gold standard shows that the Indonesian NER achieves high precision but very low recall.