Crowdsourcing the acquisition of natural language corpora: Methods and observations

2012 IEEE Spoken Language Technology Workshop (SLT), Pub Date: 2012-12-01, DOI: 10.1109/SLT.2012.6424200

William Yang Wang, D. Bohus, Ece Kamar, E. Horvitz

Citations: 58

Abstract

We study the opportunity for using crowdsourcing methods to acquire language corpora for use in natural language processing systems. Specifically, we empirically investigate three methods for eliciting natural language sentences that correspond to a given semantic form. The methods convey frame semantics to crowd workers by means of sentences, scenarios, and list-based descriptions. We discuss various performance measures of the crowdsourcing process, and analyze the semantic correctness, naturalness, and biases of the collected language. We highlight research challenges and directions in applying these methods to acquire corpora for natural language processing applications.

