Semantically readable distributed representation learning for social media mining

Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics. Pub Date: 2017-08-23. DOI: 10.1145/3106426.3106521

Ikuo Keshi, Yumiko Suzuki, Koichiro Yoshino, Satoshi Nakamura


Citations: 6

Abstract

The problem with distributed representations generated by neural networks is that the meaning of their features is difficult to understand. We propose a new method that gives a specific meaning to each node of a hidden layer by introducing a manually created word semantic vector dictionary into the initial weights and by using paragraph vector models. Our experimental results show that, compared with a control test, the learned weights and the dictionary-based weights are more strongly correlated in a closed test and more weakly correlated in an open test. Additionally, the learned vectors outperform the existing paragraph vector on a sentiment analysis task. Finally, we measured the readability of the document embeddings in a user test. In this paper, readability means that people can understand the meaning of the heavily weighted features of a distributed representation. For document embeddings learned by one of the paragraph vector models, 52.4% of the top five weighted hidden nodes were related to the corresponding tweets. Because each hidden node maintains a specific meaning, the proposed method succeeds in improving readability.
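The ideas in the abstract can be illustrated with a minimal sketch; this is not the authors' implementation. It assumes a toy dictionary in which each word maps to weights over named feature concepts, seeds one hidden node per concept, compares initial and learned weights with a Pearson correlation, and reads a document vector through its five largest weights. Every name here (`FEATURES`, `SEMANTIC_DICT`, `init_weights`, `weight_correlation`, `top_features`) is hypothetical, as are the random initialization for out-of-dictionary words and the choice of Pearson correlation. Training the paragraph vector model itself is omitted.

```python
import numpy as np

# Hypothetical toy dictionary: each word maps to weights over named concepts.
# The dictionary used in the paper is manually created; these entries are invented.
FEATURES = ["joy", "anger", "food", "sports", "politics", "weather"]
SEMANTIC_DICT = {
    "happy": {"joy": 1.0},
    "delicious": {"food": 1.0, "joy": 0.5},
    "storm": {"weather": 1.0},
    "match": {"sports": 1.0},
}


def init_weights(vocab, features, dictionary, rng, scale=0.01):
    """Build a (vocab x features) matrix with one hidden node per concept.

    Words found in the dictionary take their dictionary weights; all other
    words get small random weights (an assumption made for this sketch).
    """
    index = {name: i for i, name in enumerate(features)}
    weights = rng.normal(0.0, scale, size=(len(vocab), len(features)))
    for row, word in enumerate(vocab):
        if word in dictionary:
            weights[row] = 0.0
            for name, value in dictionary[word].items():
                weights[row, index[name]] = value
    return weights


def weight_correlation(initial, learned):
    """Pearson correlation between flattened initial and learned weights."""
    return float(np.corrcoef(initial.ravel(), learned.ravel())[0, 1])


def top_features(doc_vector, features, k=5):
    """Return the names of the k hidden nodes with the largest weights."""
    order = np.argsort(-np.asarray(doc_vector))[:k]
    return [features[i] for i in order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vocab = ["happy", "delicious", "storm", "match", "the"]
    initial = init_weights(vocab, FEATURES, SEMANTIC_DICT, rng)

    # Stand-in for training: perturb the initial weights slightly.
    learned = initial + rng.normal(0.0, 0.05, size=initial.shape)
    print("correlation:", weight_correlation(initial, learned))

    # Stand-in document embedding over the same named hidden nodes.
    doc = np.array([0.9, 0.1, 0.8, 0.0, -0.2, 0.3])
    print("top features:", top_features(doc, FEATURES))
```

Because each column keeps the name of the concept it was seeded with, the largest entries of a document vector can be shown to a reader as concept labels, which is the sense of readability the abstract defines.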


