Are Word Embedding Methods Stable and Should We Care About It?

Angana Borah, M. Barman, Amit Awekar
{"title":"词嵌入方法稳定吗?我们应该关注它吗?","authors":"Angana Borah, M. Barman, Amit Awekar","doi":"10.1145/3465336.3475098","DOIUrl":null,"url":null,"abstract":"A representation learning method is considered stable if it consistently generates similar representation of the given data across multiple runs. Word Embedding Methods (WEMs) are a class of representation learning methods that generate dense vector representation for each word in the given text data. The central idea of this paper is to explore the stability measurement of WEMs using intrinsic evaluation based on word similarity. We experiment with three popular WEMs: Word2Vec, GloVe, and fastText. For stability measurement, we investigate the effect of five parameters involved in training these models. We perform experiments using four real-world datasets from different domains: Wikipedia, News, Song lyrics, and European parliament proceedings. We also observe the effect of WEM stability on two downstream tasks: Clustering and Fairness evaluation. Our experiments indicate that amongst the three WEMs, fastText is the most stable, followed by GloVe and Word2Vec.","PeriodicalId":325072,"journal":{"name":"Proceedings of the 32nd ACM Conference on Hypertext and Social Media","volume":"59 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":"{\"title\":\"Are Word Embedding Methods Stable and Should We Care About It?\",\"authors\":\"Angana Borah, M. Barman, Amit Awekar\",\"doi\":\"10.1145/3465336.3475098\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A representation learning method is considered stable if it consistently generates similar representation of the given data across multiple runs. Word Embedding Methods (WEMs) are a class of representation learning methods that generate dense vector representation for each word in the given text data. The central idea of this paper is to explore the stability measurement of WEMs using intrinsic evaluation based on word similarity. We experiment with three popular WEMs: Word2Vec, GloVe, and fastText. For stability measurement, we investigate the effect of five parameters involved in training these models. We perform experiments using four real-world datasets from different domains: Wikipedia, News, Song lyrics, and European parliament proceedings. We also observe the effect of WEM stability on two downstream tasks: Clustering and Fairness evaluation. 
Our experiments indicate that amongst the three WEMs, fastText is the most stable, followed by GloVe and Word2Vec.\",\"PeriodicalId\":325072,\"journal\":{\"name\":\"Proceedings of the 32nd ACM Conference on Hypertext and Social Media\",\"volume\":\"59 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-04-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"8\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 32nd ACM Conference on Hypertext and Social Media\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3465336.3475098\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 32nd ACM Conference on Hypertext and Social Media","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3465336.3475098","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 8

Abstract

A representation learning method is considered stable if it consistently generates similar representations of the given data across multiple runs. Word Embedding Methods (WEMs) are a class of representation learning methods that generate a dense vector representation for each word in the given text data. The central idea of this paper is to explore the stability measurement of WEMs using intrinsic evaluation based on word similarity. We experiment with three popular WEMs: Word2Vec, GloVe, and fastText. For stability measurement, we investigate the effect of five parameters involved in training these models. We perform experiments using four real-world datasets from different domains: Wikipedia, News, Song lyrics, and European parliament proceedings. We also observe the effect of WEM stability on two downstream tasks: Clustering and Fairness evaluation. Our experiments indicate that amongst the three WEMs, fastText is the most stable, followed by GloVe and Word2Vec.
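
The intrinsic, word-similarity-based stability measure the abstract describes can be made concrete with a small experiment: train the same WEM twice with different random seeds and compare each word's top-k nearest neighbours across the two runs. The sketch below is a minimal illustration, not the authors' code or exact protocol; it uses gensim's Word2Vec as one representative WEM, and the toy corpus, hyperparameters, and neighbour-overlap metric are assumptions made for the example.

```python
# Minimal sketch of run-to-run stability measurement for a WEM.
# Assumptions: gensim Word2Vec stands in for the three WEMs studied;
# the corpus, hyperparameters, and top-k overlap metric are illustrative.
from gensim.models import Word2Vec

corpus = [
    ["word", "embeddings", "map", "words", "to", "dense", "vectors"],
    ["stable", "methods", "give", "similar", "vectors", "across", "runs"],
    ["we", "compare", "nearest", "neighbours", "across", "runs"],
]  # toy corpus; the paper uses Wikipedia, News, lyrics, and Europarl

def train(seed):
    # workers=1 makes each run reproducible for its own seed;
    # varying the seed exposes the run-to-run randomness being measured
    return Word2Vec(corpus, vector_size=50, window=2, min_count=1,
                    workers=1, seed=seed, epochs=20)

def neighbour_overlap(m1, m2, word, k=5):
    # stability of one word: fraction of shared top-k nearest neighbours
    n1 = {w for w, _ in m1.wv.most_similar(word, topn=k)}
    n2 = {w for w, _ in m2.wv.most_similar(word, topn=k)}
    return len(n1 & n2) / k

m1, m2 = train(seed=1), train(seed=2)
vocab = [w for w in m1.wv.index_to_key if w in m2.wv]
score = sum(neighbour_overlap(m1, m2, w) for w in vocab) / len(vocab)
print(f"mean top-5 neighbour overlap across runs: {score:.2f}")
```

A score near 1.0 means the two runs agree on each word's neighbourhood (a stable method); a score near 0.0 means the learned similarity structure changes substantially between runs. The same harness could be pointed at GloVe or fastText vectors to reproduce the paper's cross-method comparison.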