FacetE

Proceedings of the Workshop on Testing Database Systems. Pub Date: 2020-05-22. DOI: 10.1145/3395032.3395325

Michael Günther, Paul Sikorski, M. Thiele, W. Lehner

Citations: 1

Abstract

Today's natural language processing and information retrieval systems depend heavily on word embedding techniques to represent text values. However, choosing a word embedding dataset for a specific task is not trivial. Current word embedding evaluation methods mostly provide only a one-dimensional quality measure, which does not express how knowledge from different domains is represented in the word embedding models. To overcome this limitation, we provide a new evaluation dataset called FacetE, derived from 125M Web tables, which enables domain-sensitive evaluation. We show that FacetE can be used effectively to evaluate word embedding models. The evaluation of common general-purpose word embedding models suggests that there is currently no single word embedding that is best for every domain.
