L. Manik, Arida Ferti Syafiandini, Hani Febri Mustika, Achmad Fatchuttamam Abka, Y. Rianto
Published in: 2018 International Conference on Computer, Control, Informatics and its Applications (IC3INA), 2018-11-01. DOI: 10.1109/IC3INA.2018.8629519
Evaluating the Morphological and Capitalization Features for Word Embedding-Based POS Tagger in Bahasa Indonesia
In this paper, morphological and capitalization features are employed to improve an existing word embedding-based POS tagger for Bahasa Indonesia. The experiments use a simple feedforward neural network with two input layers, one merge layer, and two hidden layers. The first input layer takes word embedding features (CBOW and Skip-gram) as input, while the second takes morphological and capitalization features. The results show that the selected additional features improve the performance and accuracy of the existing word embedding-based POS tagger, although the improvement is modest. For all word embedding types, the average F1 score increases from 93% to 94%, and accuracy increases from 92-93% to 94-95%, on a manually tagged corpus of about 250,000 tokens (12,775 unique tokens).
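The abstract describes the tagger only at the level of its layers. The block below is a minimal pure-Python sketch of a forward pass through such a two-input network, written under assumptions the abstract does not state: the four capitalization flags, the Indonesian affix lists, concatenation as the merge operation, ReLU hidden units, a softmax output, the layer sizes and tag count, and every name (`capitalization_features`, `morphological_features`, `TwoInputTagger`) are hypothetical. The weights are random and untrained, so the output probabilities only become meaningful after training on a tagged corpus.

```python
import math
import random

# HYPOTHETICAL capitalization flags; the abstract does not list the exact set used.
def capitalization_features(token: str) -> list[float]:
    return [
        float(token.isupper()),                              # all caps, e.g. "DPR"
        float(token[:1].isupper() and not token.isupper()),  # initial capital, e.g. "Jakarta"
        float(token.islower()),                              # all lowercase, e.g. "makan"
        float(any(ch.isdigit() for ch in token)),            # contains a digit, e.g. "2018"
    ]

# HYPOTHETICAL Indonesian affix lists, for illustration only. Overlapping
# prefixes (e.g. "meng" and "me") simply fire together as separate binary flags.
PREFIXES = ("meng", "mem", "men", "me", "ber", "di", "ter", "ke", "pe", "se")
SUFFIXES = ("kan", "an", "i", "nya", "lah")

def morphological_features(token: str) -> list[float]:
    word = token.lower()
    return ([float(word.startswith(p)) for p in PREFIXES]
            + [float(word.endswith(s)) for s in SUFFIXES])

def relu(z: float) -> float:
    return max(0.0, z)

def identity(z: float) -> float:
    return z

def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dense(x, weights, bias, activation):
    # Fully connected layer: one weight row and one bias per output unit.
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]

def init_layer(n_in: int, n_out: int, rng: random.Random):
    limit = math.sqrt(6.0 / (n_in + n_out))  # Glorot/Xavier uniform range
    weights = [[rng.uniform(-limit, limit) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

class TwoInputTagger:
    """Untrained forward pass: two inputs -> merge -> two hidden layers -> softmax."""

    def __init__(self, emb_dim: int, feat_dim: int, hidden: int, n_tags: int, seed: int = 0):
        rng = random.Random(seed)
        self.hidden1 = init_layer(emb_dim + feat_dim, hidden, rng)
        self.hidden2 = init_layer(hidden, hidden, rng)
        self.output = init_layer(hidden, n_tags, rng)

    def predict_proba(self, embedding: list[float], features: list[float]) -> list[float]:
        merged = list(embedding) + list(features)  # merge layer, ASSUMED to be concatenation
        h = dense(merged, *self.hidden1, relu)
        h = dense(h, *self.hidden2, relu)
        return softmax(dense(h, *self.output, identity))

# Usage with a placeholder vector standing in for a CBOW or Skip-gram embedding.
token = "Jakarta"
features = capitalization_features(token) + morphological_features(token)
embedding = [0.1] * 50
model = TwoInputTagger(emb_dim=50, feat_dim=len(features), hidden=64, n_tags=10)
probs = model.predict_proba(embedding, features)
```

Keeping the hand-crafted features on a separate input and merging them just before the hidden layers lets the network weigh sparse binary cues (affixes, casing) alongside the dense embedding, which is the intuition behind adding them to an embedding-only tagger.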