Authors: Q. Nguyen, Ly Vu, Quang-Uy Nguyen
Journal: Journal of Computer Science and Cybernetics, pp. 305-323
Published: 2020-12-14
DOI: https://doi.org/10.15625/1813-9663/36/4/14829
A TWO-CHANNEL MODEL FOR REPRESENTATION LEARNING IN VIETNAMESE SENTIMENT CLASSIFICATION PROBLEM
Sentiment classification (SC) aims to determine whether a document conveys a positive or negative opinion. With the rapid development of the digital world, SC has become an important research topic that affects many aspects of our lives. In machine-learning-based SC, the representation of the document strongly influences classification accuracy. Word embedding (WE)-based techniques such as Word2vec have proven beneficial for the SC problem. However, Word2vec alone is often not enough to represent the semantics of Vietnamese documents because of the complexity of the language's semantic and syntactic structure. In this paper, we propose a new representation learning model, called a two-channel vector, that learns a higher-level feature of a document for SC. Our model uses two neural networks to learn both a semantic feature and a syntactic feature. The semantic feature is learnt using Word2vec, and the syntactic feature is learnt from part-of-speech (POS) tags. The two features are then combined and fed to a Softmax function to make the final classification. We carry out extensive experiments on four recent Vietnamese sentiment datasets to evaluate the performance of the proposed architecture. The experimental results demonstrate that the proposed model improves SC accuracy compared to two single models and three state-of-the-art ensemble methods.
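The two-channel idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the two neural networks or how the features are combined, so this sketch assumes mean pooling as a stand-in for each channel and concatenation as the combination step. All names (`mean_pool`, `two_channel_predict`) and the toy vectors and weights are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_pool(vectors: Sequence[Sequence[float]]) -> Vector:
    """Average per-token vectors into one fixed-size document vector."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def two_channel_predict(
    word_vecs: Sequence[Sequence[float]],
    pos_vecs: Sequence[Sequence[float]],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> Vector:
    """Return class probabilities from a semantic and a syntactic channel."""
    # Assumption: mean pooling stands in for the semantic-channel network.
    semantic = mean_pool(word_vecs)
    # Assumption: mean pooling stands in for the syntactic-channel network.
    syntactic = mean_pool(pos_vecs)
    # Assumption: the two features are combined by concatenation.
    combined = semantic + syntactic
    # One linear layer followed by Softmax yields the class probabilities.
    logits = [
        sum(w * x for w, x in zip(row, combined)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


# Toy document with two tokens (all values are made up for illustration).
word_vecs = [[0.2, 0.4], [0.6, 0.0]]            # 2-d word embeddings
pos_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # one-hot POS tags
weights = [
    [1.0, -1.0, 0.5, 0.5, 0.0],     # row for the "positive" class
    [-1.0, 1.0, -0.5, -0.5, 0.0],   # row for the "negative" class
]
bias = [0.0, 0.0]

probs = two_channel_predict(word_vecs, pos_vecs, weights, bias)
print(probs)
```

In a real system the word vectors would come from a trained Word2vec model and the POS vectors from a Vietnamese tagger, and the weights would be learnt jointly rather than fixed by hand.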