A Two-Channel Model for Representation Learning in Vietnamese Sentiment Classification Problem

Journal of Computer Science and Cybernetics. Pub Date: 2020-12-14. DOI: 10.15625/1813-9663/36/4/14829

Q. Nguyen, Ly Vu, Quang-Uy Nguyen

Pages: 305-323

Citations: 1

Abstract


A TWO-CHANNEL MODEL FOR REPRESENTATION LEARNING IN VIETNAMESE SENTIMENT CLASSIFICATION PROBLEM

Sentiment classification (SC) aims to determine whether a document conveys a positive or negative opinion. With the rapid development of the digital world, SC has become an important research topic that affects many aspects of our lives. In machine-learning-based SC, the representation of the document strongly influences classification accuracy. Word embedding (WE)-based techniques, such as Word2vec, have proved beneficial for the SC problem. However, Word2vec alone is often not enough to represent the semantics of Vietnamese documents, owing to the complexity of their semantic and syntactic structure. In this paper, we propose a new representation learning model, called a two-channel vector, to learn a higher-level feature of a document for SC. Our model uses two neural networks to learn both the semantic feature and the syntactic feature. The semantic feature is learnt using Word2vec, and the syntactic feature is learnt from part-of-speech (POS) tags. The two features are then combined and input to a Softmax function to make the final classification. We carry out intensive experiments on four recent Vietnamese sentiment datasets to evaluate the performance of the proposed architecture. The experimental results demonstrate that the proposed model improves SC accuracy compared with two single models and three state-of-the-art ensemble methods.
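The two-channel idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not describe the two neural networks, so this sketch replaces each channel with simple mean pooling over lookup vectors, and every name and value in it (`WORD_VECS`, `POS_VECS`, `WEIGHTS`, `BIAS`, `two_channel_predict`, the toy tokens and tags) is a hypothetical placeholder chosen for illustration. It shows only the overall flow: a semantic vector and a syntactic vector are built separately, concatenated, and passed through a linear layer and Softmax.

```python
import math

# Hypothetical toy lookup tables; a real system would use trained Word2vec
# vectors and learned POS-tag embeddings instead of these hand-set values.
WORD_VECS = {"phim": [0.0, 0.0], "hay": [1.0, 0.0], "dở": [-1.0, 0.0]}
POS_VECS = {"N": [1.0, 0.0], "A": [0.0, 1.0]}

# Hypothetical classifier parameters: one row per class (negative, positive),
# one column per entry of the combined 4-dimensional feature vector.
WEIGHTS = [[-2.0, 0.0, 0.0, 0.0], [2.0, 0.0, 0.0, 0.0]]
BIAS = [0.0, 0.0]


def mean_pool(vectors):
    """Average a non-empty list of equal-length vectors into one vector."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def two_channel_predict(tokens, pos_tags, word_vecs, pos_vecs, weights, bias):
    """Return class probabilities from a semantic and a syntactic channel."""
    # Channel 1 (semantic): pool the word vectors of the document.
    semantic = mean_pool([word_vecs[t] for t in tokens])
    # Channel 2 (syntactic): pool the vectors of the POS tags.
    syntactic = mean_pool([pos_vecs[p] for p in pos_tags])
    # Combine the two features by concatenation.
    combined = semantic + syntactic
    # Linear layer followed by Softmax for the final classification.
    scores = [
        sum(w * x for w, x in zip(row, combined)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(scores)


probs = two_channel_predict(
    ["phim", "hay"], ["N", "A"], WORD_VECS, POS_VECS, WEIGHTS, BIAS
)
print(probs)  # [P(negative), P(positive)]
```

Concatenation is only one way to combine the two features; the abstract says the features are "combined" without naming the operation, so that choice is an assumption of this sketch.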


Source journal

Journal of Computer Science and Cybernetics

Self-citation rate

0.00%

Number of publications