Sentiment Classification: Feature Selection Based Approaches Versus Deep Learning

2017 IEEE International Conference on Computer and Information Technology (CIT). Pub Date: 2017-08-01. DOI: 10.1109/CIT.2017.53

A. Uysal, Y. Murphey


Citations: 36

Abstract



Text documents are commonly classified using bag-of-words models generated with feature selection methods. In these models, the selected features are used as input to well-known classifiers such as Support Vector Machines (SVM) and neural networks. In recent years, a technique called word embeddings has been developed for text mining, and deep learning models using word embeddings have become popular for sentiment classification. However, no extensive study has been conducted to compare these approaches for sentiment classification. In this paper, we present an in-depth comparative study of these two types of approaches, feature selection based approaches and deep learning models, for document-level sentiment classification. Experiments were conducted using four datasets with varying characteristics. To investigate the effectiveness of word embedding features, feature sets combining selected bag-of-words features with averaged word embedding features were used in sentiment classification. To analyze deep learning models, we implemented three different deep learning architectures: a convolutional neural network, a long short-term memory network, and a long-term recurrent convolutional network. Our experimental results show that deep learning models performed better on three of the four datasets, while a combination of selected bag-of-words features and averaged word embedding features gave the best performance on one dataset. In addition, we show that a deep learning model initialized with either one-hot vectors or fine-tuned word embeddings performed better than the model initialized with word embeddings without tuning.
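The combined feature set mentioned in the abstract (selected bag-of-words features plus averaged word embedding features) can be illustrated with a minimal sketch. The abstract does not say which feature selection method, embedding table, or classifier the authors used, so everything named here is an assumption for illustration only: `EMBEDDINGS` is a toy two-dimensional table, `SELECTED_TERMS` stands in for whatever vocabulary survives feature selection, and the helper functions are hypothetical.

```python
from collections import Counter

# Hypothetical toy embedding table (assumption); a real setup would load
# pretrained word vectors instead.
EMBEDDINGS = {
    "good": [1.0, 0.0],
    "bad": [-1.0, 0.0],
    "movie": [0.0, 1.0],
}

# Hypothetical vocabulary kept after feature selection (assumption).
SELECTED_TERMS = ["good", "bad"]


def bag_of_words(tokens, selected_terms):
    """Count how often each selected term occurs in the document."""
    counts = Counter(tokens)
    return [float(counts[term]) for term in selected_terms]


def averaged_embedding(tokens, embeddings, dim):
    """Average the vectors of all tokens present in the embedding table."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        # No known token: fall back to a zero vector of the right size.
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def combined_features(tokens, selected_terms=SELECTED_TERMS,
                      embeddings=EMBEDDINGS, dim=2):
    """Concatenate bag-of-words counts with the averaged embedding."""
    bow = bag_of_words(tokens, selected_terms)
    avg = averaged_embedding(tokens, embeddings, dim)
    return bow + avg


if __name__ == "__main__":
    print(combined_features(["good", "good", "movie"]))
```

The resulting vector would then be passed to a conventional classifier such as an SVM. Concatenation is only one plausible way to combine the two feature types; the abstract does not state how the authors combined them.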
