Crowd-sourcing Web knowledge for metadata extraction

Proceedings of the ... ACM/IEEE Joint Conference on Digital Libraries. ACM/IEEE Joint Conference on Digital Libraries

Pub Date: 2014-09-08

DOI: 10.1109/JCDL.2014.6970160

Zhaohui Wu, W. Huang, Chen Liang, C. Lee Giles

Citations: 3

Abstract

We explore a new metadata extraction framework that requires no human annotators; instead, the ground truth is harvested from the Web. Each new training sample is selected based not only on its uncertainty and representativeness in the unlabeled pool, but also on its availability and credibility in Web knowledge bases. We construct a dataset of 4329 books with valid metadata and evaluate our approach using 5 Web book databases as oracles. Empirical results demonstrate its effectiveness and efficiency.
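The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so everything below is an assumption made for illustration: the `Candidate` fields, the `oracle_stats` helper, and the choice to combine the four criteria by simple multiplication are all hypothetical and are not the authors' method.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str
    uncertainty: float         # assumed: e.g. 1 - max predicted label probability
    representativeness: float  # assumed: e.g. mean similarity to the unlabeled pool
    availability: float        # assumed: fraction of Web oracles returning a record
    credibility: float         # assumed: agreement among the oracles that answered


def oracle_stats(answers: List[Optional[str]]) -> tuple:
    """Hypothetical helper: derive (availability, credibility) from oracle answers.

    `answers` holds one entry per Web oracle; None means no record was found.
    """
    found = [a for a in answers if a is not None]
    if not found:
        return 0.0, 0.0
    availability = len(found) / len(answers)
    # Credibility here is the share of answering oracles that agree with the majority.
    majority_count = Counter(found).most_common(1)[0][1]
    return availability, majority_count / len(found)


def score(c: Candidate) -> float:
    # Assumed combination: a plain product, so a sample that no oracle can
    # label (availability 0) is never selected, however uncertain it is.
    return c.uncertainty * c.representativeness * c.availability * c.credibility


def select_sample(pool: List[Candidate]) -> Candidate:
    """Pick the highest-scoring candidate from the unlabeled pool."""
    return max(pool, key=score)


# Illustrative values only; these are not data from the paper.
avail_b, cred_b = oracle_stats(["Title X", "Title X", None, "Title X", "Title Y"])
pool = [
    Candidate("book_a", 0.9, 0.8, 0.0, 0.0),        # uncertain, but no oracle has it
    Candidate("book_b", 0.6, 0.7, avail_b, cred_b),
    Candidate("book_c", 0.2, 0.9, 1.0, 1.0),        # well covered, but model is already sure
]
chosen = select_sample(pool)
```

A multiplicative score is only one possible design; a weighted sum or a two-stage filter (drop samples unavailable on the Web, then rank by uncertainty) would fit the abstract's description equally well.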
