Personalized word representations carrying personalized semantics learned from social network posts

Zih-Wei Lin, Tzu-Wei Sung, Hung-yi Lee, Lin-Shan Lee
{"title":"Personalized word representations carrying personalized semantics learned from social network posts","authors":"Zih-Wei Lin, Tzu-Wei Sung, Hung-yi Lee, Lin-Shan Lee","doi":"10.1109/ASRU.2017.8268982","DOIUrl":null,"url":null,"abstract":"Distributed word representations have been shown to be very useful in various natural language processing (NLP) application tasks. These word vectors learned from huge corpora very often carry both semantic and syntactic information of words. However, it is well known that each individual user has his own language patterns because of different factors such as interested topics, friend groups, social activities, wording habits, etc., which may imply some kind of personalized semantics. With such personalized semantics, the same word may imply slightly differently for different users. For example, the word “Cappuccino” may imply “Leisure”, “Joy”, “Excellent” for a user enjoying coffee, by only a kind of drink for someone else. Such personalized semantics of course cannot be carried by the standard universal word vectors trained with huge corpora produced by many people. In this paper, we propose a framework to train different personalized word vectors for different users based on the very successful continuous skip-gram model using the social network data posted by many individual users. In this framework, universal background word vectors are first learned from the background corpora, and then adapted by the personalized corpus for each individual user to learn the personalized word vectors. We use two application tasks to evaluate the quality of the personalized word vectors obtained in this way, the user prediction task and the sentence completion task. These personalized word vectors were shown to carry some personalized semantics and offer improved performance on these two evaluation tasks.","PeriodicalId":290868,"journal":{"name":"2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASRU.2017.8268982","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

Distributed word representations have been shown to be very useful in various natural language processing (NLP) application tasks. These word vectors, learned from huge corpora, very often carry both semantic and syntactic information about words. However, it is well known that each individual user has his or her own language patterns because of factors such as topics of interest, friend groups, social activities, wording habits, etc., which may imply some kind of personalized semantics. With such personalized semantics, the same word may imply slightly different things for different users. For example, the word “Cappuccino” may imply “Leisure”, “Joy”, or “Excellent” for a user who enjoys coffee, but only a kind of drink for someone else. Such personalized semantics of course cannot be carried by standard universal word vectors trained on huge corpora produced by many people. In this paper, we propose a framework to train different personalized word vectors for different users, based on the very successful continuous skip-gram model, using social network data posted by many individual users. In this framework, universal background word vectors are first learned from the background corpora and then adapted with the personalized corpus of each individual user to learn the personalized word vectors. We use two application tasks to evaluate the quality of the personalized word vectors obtained in this way: the user prediction task and the sentence completion task. These personalized word vectors were shown to carry some personalized semantics and offer improved performance on these two evaluation tasks.
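The abstract describes a two-stage procedure: learn universal background word vectors with the skip-gram model from posts by many users, then adapt them on each individual user's own posts. The sketch below illustrates that background-then-adapt idea using gensim's skip-gram Word2Vec (gensim 4.x API); the toy corpora, hyperparameters, and the continued-training adaptation step are illustrative assumptions, not the paper's actual implementation.

```python
import copy
from gensim.models import Word2Vec

# Background corpus: tokenized posts from many users (toy data for illustration).
background_corpus = [
    ["cappuccino", "is", "a", "kind", "of", "coffee"],
    ["posted", "a", "photo", "on", "the", "social", "network"],
]

# One user's personal corpus: that user's own posts (toy data for illustration).
user_corpus = [
    ["cappuccino", "time", "pure", "joy", "and", "leisure"],
]

# 1) Learn universal background word vectors with the skip-gram model (sg=1).
background_model = Word2Vec(
    sentences=background_corpus,
    vector_size=100,   # dimensionality of the word vectors
    window=5,
    min_count=1,       # keep all words in this tiny toy corpus
    sg=1,              # 1 = skip-gram, 0 = CBOW
    epochs=5,
)

# 2) Adapt a copy of the background model on the user's personal corpus:
#    extend the vocabulary with the user's words and continue training,
#    so the user's own contexts shift the vectors toward personal usage.
personal_model = copy.deepcopy(background_model)
personal_model.build_vocab(user_corpus, update=True)
personal_model.train(
    user_corpus,
    total_examples=len(user_corpus),
    epochs=20,         # extra passes over the small personal corpus
)

# The adapted vector for "cappuccino" now reflects this user's contexts.
print(personal_model.wv["cappuccino"][:5])
```

In this sketch, adaptation is simply continued skip-gram training on the personal corpus; how the paper balances the background and personalized corpora is not specified in the abstract.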