Document similarity estimation for sentiment analysis using neural network

2013 IEEE/ACIS 12th International Conference on Computer and Information Science (ICIS). Pub. date: 2013-06-16. DOI: 10.1109/ICIS.2013.6607825

H. Yanagimoto, Mika Shimada, Akane Yoshimura


Citations: 16

Abstract

Classifying documents by content is important for finding the documents one needs efficiently. Document similarity estimation is a key technique for good classification, because classification is performed on the basis of document similarity. In natural language processing, the bag-of-words model is commonly used to extract features from documents, and a value based on term occurrence frequency is used as the weight of each feature. However, these term-weighting methods usually rely on predefined models and therefore have limitations. New approaches that construct feature vectors from the data distribution are needed to achieve high performance in natural language processing. Deep learning has recently attracted the attention of many researchers. It is a new approach that transforms raw data into feature vectors using large amounts of unlabeled data, a property that addresses the need described above. In natural language processing, a main goal is to construct a language model on a deep architecture neural network. In this paper we use a deep architecture neural network to estimate document similarity. Good article similarity estimates require good article vectors that represent all of an article's characteristics. We therefore train the deep architecture neural network on a large collection of stock market news articles and use the trained network to generate article vectors. We then calculate the cosine similarity between labeled articles and discuss the performance of the network. The evaluation focuses not on the articles' content but on their sentiment polarity; that is, we examine whether the proposed method classifies articles according to their sentiment polarity. We confirmed that, although the proposed method is an unsupervised learning approach, it achieves good performance in stock market news similarity estimation. The results show that a deep architecture neural network can be applied to a wider range of natural language processing tasks.
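The abstract contrasts term-frequency bag-of-words features with learned article vectors and evaluates cosine similarity by sentiment polarity. The following is a minimal Python sketch of that evaluation pipeline under stated assumptions: whitespace tokenization, raw term counts as feature weights, and a simple separation score (mean same-polarity similarity minus mean opposite-polarity similarity) chosen here for illustration, not taken from the paper. The functions `bag_of_words` and `polarity_separation` and the example headlines are hypothetical. The paper generates article vectors with a trained deep architecture neural network, which this sketch does not reproduce.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Mapping, Sequence


def bag_of_words(text: str) -> Counter:
    """Term-frequency vector: each whitespace token maps to its occurrence count."""
    return Counter(text.lower().split())


def cosine_similarity(a: Mapping[str, float], b: Mapping[str, float]) -> float:
    """Cosine of the angle between two sparse vectors (feature -> weight)."""
    dot = sum(weight * b.get(feature, 0.0) for feature, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an empty document as dissimilar to everything
    return dot / (norm_a * norm_b)


def polarity_separation(vectors: Sequence[Mapping[str, float]], labels: Sequence[str]) -> float:
    """Hypothetical score: mean cosine similarity of same-polarity pairs minus
    that of opposite-polarity pairs. Positive values mean the vectors tend to
    place articles with the same sentiment closer together. Requires at least
    one pair of each kind."""
    same, opposite = [], []
    for i, j in combinations(range(len(vectors)), 2):
        sim = cosine_similarity(vectors[i], vectors[j])
        (same if labels[i] == labels[j] else opposite).append(sim)
    return sum(same) / len(same) - sum(opposite) / len(opposite)


# Hypothetical stock-news headlines with hand-assigned polarity labels.
articles = [
    "shares surge on strong earnings",
    "stocks surge as earnings beat forecasts",
    "shares plunge on weak earnings",
    "stocks plunge as losses widen",
]
labels = ["positive", "positive", "negative", "negative"]

vectors = [bag_of_words(t) for t in articles]
print(f"polarity separation: {polarity_separation(vectors, labels):+.4f}")
```

On these toy headlines the separation score comes out slightly below zero: shared surface words such as "shares ... on ... earnings" make opposite-polarity articles look similar, which illustrates the kind of limitation of predefined term weights that the abstract describes. Because `cosine_similarity` accepts any mapping from feature to weight, dense vectors produced by a trained network (for example stored as `{dimension_index: value}`) could be passed in place of the bag-of-words counts.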

