Kv2vec: A Distributed Representation Method for Key-value Pairs from Metadata Attributes

Chenxu Niu, Wei Zhang, S. Byna, Yong Chen
Published in: 2022 IEEE High Performance Extreme Computing Conference (HPEC)
Publication date: 2022-09-19
DOI: 10.1109/HPEC55821.2022.9926389 (https://doi.org/10.1109/HPEC55821.2022.9926389)
Citation count: 0

Abstract

Distributed representation methods for words have been developed for years, and numerous methods exist, such as word2vec, GloVe, and fastText. However, they are not designed for key-value pairs, an important data pattern that is widely used in many scenarios. For example, the metadata attributes of scientific files consist of a collection of key-value pairs. In this research, we propose kv2vec, a method that captures relationships between keys and values and represents key-value pairs as dense vectors. The fundamental idea of kv2vec is to use recurrent neural networks (RNNs) with long short-term memory (LSTM) hidden units to convert each key-value pair into a distributed vector representation. This new method overcomes the weaknesses of existing embedding models in representing key-value pairs as vectors. Moreover, it can be integrated into dataset search solutions by querying the metadata attributes of self-describing file formats that are widely used in HPC systems. We evaluate kv2vec on multiple real-world datasets, and the results show that it outperforms existing models.
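The core idea stated in the abstract, feeding a key-value pair through an LSTM and taking the resulting state as its vector, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the tokenization, the network size, or the training objective. The class name `TinyLSTMEncoder`, the character-level one-hot input, the `key=value` serialization, the hidden size, and the random untrained weights are all assumptions made purely for illustration.

```python
import math
import random


def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTMEncoder:
    """Hypothetical, untrained character-level LSTM encoder (illustration only)."""

    def __init__(self, hidden_size=8, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        self.input_size = 128  # assumption: one-hot input over ASCII code points
        n_in = self.input_size + hidden_size
        # One weight matrix and bias per gate:
        # i = input gate, f = forget gate, o = output gate, g = candidate cell.
        self.W = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                   for _ in range(hidden_size)]
            for gate in "ifog"
        }
        self.b = {gate: [0.0] * hidden_size for gate in "ifog"}

    def _gate(self, name, activation, idx, h):
        # The input is one-hot at position idx, so W_x @ x reduces to one column.
        out = []
        for j in range(self.hidden_size):
            row = self.W[name][j]
            s = row[idx] + self.b[name][j]
            s += sum(row[self.input_size + k] * h[k]
                     for k in range(self.hidden_size))
            out.append(activation(s))
        return out

    def _step(self, ch, h, c):
        # Characters outside ASCII are clamped to the last index (assumption).
        idx = min(ord(ch), self.input_size - 1)
        i = self._gate("i", _sigmoid, idx, h)
        f = self._gate("f", _sigmoid, idx, h)
        o = self._gate("o", _sigmoid, idx, h)
        g = self._gate("g", math.tanh, idx, h)
        c_new = [f[j] * c[j] + i[j] * g[j] for j in range(self.hidden_size)]
        h_new = [o[j] * math.tanh(c_new[j]) for j in range(self.hidden_size)]
        return h_new, c_new

    def encode(self, key, value):
        """Return the final hidden state after reading 'key=value' char by char."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for ch in f"{key}={value}":
            h, c = self._step(ch, h, c)
        return h


# Example with a made-up metadata attribute.
encoder = TinyLSTMEncoder(hidden_size=8, seed=0)
vec = encoder.encode("instrument", "spectrometer")
print(len(vec))
```

Because the weights here are random, the resulting vectors carry no learned semantics. The sketch only shows the data flow: an order-sensitive sequence model reads the key and then the value, so the same tokens in a different role produce a different vector. In practice, a trained model built with a deep learning framework would take the place of this toy encoder.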