Automatic parsing of parental verbal input.

Behavior Research Methods, Instruments, & Computers: a journal of the Psychonomic Society, Inc. Pub Date: 2004-02-01. DOI: 10.3758/bf03195557

Kenji Sagae, Brian MacWhinney, Alon Lavie

Abstract

To evaluate theoretical proposals regarding the course of child language acquisition, researchers often need to rely on the processing of large numbers of syntactically parsed utterances, both from children and from their parents. Because it is so difficult to do this by hand, there are currently no parsed corpora of child language input data. To automate this process, we developed a system that combined the MOR tagger, a rule-based parser, and statistical disambiguation techniques. The resultant system obtained nearly 80% correct parses for the sentences spoken to children. To achieve this level, we had to construct a particular processing sequence that minimizes problems caused by the coverage/ambiguity tradeoff in parser design. These procedures are particularly appropriate for use with the CHILDES database, an international corpus of transcripts. The data and programs are now freely available over the Internet.
