Implementation of ontology-based on Word2Vec and DBSCAN for part-of-speech
Parmonangan R. Togatorop, Rosa Siagian, Yolanda Nainggolan, Kaleb Simanungkalit
Proceedings of the 5th International Conference on Sustainable Information Engineering and Technology, 16 November 2020. DOI: 10.1145/3427423.3427431
Part-of-speech (POS) tagging is the process of assigning each word in a text to its appropriate word class based on the word's definition and its relationships with other words. Several POS tagging approaches have been applied to Bahasa Indonesia, namely rule-based, stochastic, and neural approaches. Another approach, based on ontologies, has been applied to English but not yet to Bahasa Indonesia, so in this study we implement an ontology to perform POS tagging in Bahasa Indonesia. The ontology was constructed using Word2Vec and the DBSCAN clustering method: the Word2Vec model represents each word as a vector derived from its context, and DBSCAN groups these word vectors into clusters corresponding to word classes. POS tagging with the ontology is carried out in several stages: data collection by web scraping online news articles from Kompas.com and Detik.com, text preprocessing, Word2Vec feature building, clustering with DBSCAN, ontology construction, and evaluation. The experiments in this study aimed to select the optimal DBSCAN parameter values for forming the word clusters used to construct the ontology. Overall, the ontology built with Word2Vec and DBSCAN performed POS tagging with a highest accuracy of 0.62, a highest precision of 0.79, a highest recall of 0.62, and a highest F1-score of 0.67.
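The DBSCAN clustering stage can be illustrated with a short Python sketch. This is a minimal, self-contained illustration under stated assumptions, not the authors' implementation: it assumes word vectors have already been produced by a trained Word2Vec model (for example, via the `wv` attribute of a gensim `Word2Vec` model), and the two-dimensional vectors for the Indonesian words below are hypothetical toy values chosen only to make the behaviour visible. DBSCAN is written out directly with cosine distance so that the two parameters tuned in the experiments are explicit: `eps` (the neighbourhood radius) and `min_samples` (the minimum neighbourhood size, counting the point itself). In practice a library implementation such as scikit-learn's `sklearn.cluster.DBSCAN` would normally be used instead; the helper `cluster_words` is a hypothetical name introduced here.

```python
import math
from typing import Dict, List, Optional, Sequence

NOISE = -1  # label DBSCAN assigns to points that belong to no cluster

# Hypothetical toy stand-ins for Word2Vec vectors (2-D for readability);
# real Word2Vec vectors come from a trained model and have many more dimensions.
WORD_VECTORS: Dict[str, List[float]] = {
    "rumah": [1.0, 0.1],    # noun: house
    "mobil": [0.9, 0.15],   # noun: car
    "buku": [0.95, 0.05],   # noun: book
    "makan": [0.1, 1.0],    # verb: eat
    "minum": [0.15, 0.9],   # verb: drink
    "tidur": [0.05, 0.95],  # verb: sleep
    "dan": [0.7, 0.7],      # conjunction: and (placed between the two groups)
}


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Return 1 - cosine similarity (0 for vectors pointing the same way)."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 if norms == 0 else 1.0 - dot / norms


def region_query(points: Sequence[Sequence[float]], i: int, eps: float) -> List[int]:
    """Indices of all points within eps of point i, including i itself."""
    return [j for j, p in enumerate(points) if cosine_distance(points[i], p) <= eps]


def dbscan(points: Sequence[Sequence[float]], eps: float, min_samples: int) -> List[int]:
    """Plain DBSCAN: returns a cluster id per point, or NOISE (-1)."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_samples:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster_id += 1  # i is a core point: start a new cluster
        labels[i] = cluster_id
        seeds = list(neighbours)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_samples:
                seeds.extend(j_neighbours)  # j is also core: keep expanding
    return [NOISE if label is None else label for label in labels]


def cluster_words(word_vectors: Dict[str, List[float]], eps: float,
                  min_samples: int) -> Dict[str, int]:
    """Hypothetical helper: map each word to its DBSCAN cluster id."""
    words = list(word_vectors)
    labels = dbscan([word_vectors[w] for w in words], eps, min_samples)
    return dict(zip(words, labels))


if __name__ == "__main__":
    # Small radius: nouns and verbs form separate clusters, "dan" is noise.
    print(cluster_words(WORD_VECTORS, eps=0.05, min_samples=3))
    # → {'rumah': 0, 'mobil': 0, 'buku': 0, 'makan': 1, 'minum': 1, 'tidur': 1, 'dan': -1}
    # Large radius: "dan" becomes a core point that bridges both groups.
    print(cluster_words(WORD_VECTORS, eps=0.3, min_samples=3))
```

With the small radius, the toy nouns and verbs land in separate clusters and the in-between word is left as noise; with the larger radius, that word links the two groups and all seven words merge into cluster 0. On these toy vectors, this shows why the choice of `eps` and `min_samples` matters when clusters are meant to correspond to word classes.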