Automatic analysis of caregiver input and child production


Korean Linguistics. Pub Date: 2022-09-30. DOI: 10.1075/kl.20002.shi

Gyu-Ho Shin

Citations: 2

Abstract

The present study explores the applicability of Natural Language Processing (NLP) techniques to the investigation of child corpora in Korean. We use caregiver input and child production data from the CHILDES database, currently the largest open-access Korean child corpus, and apply NLP techniques to the data in two ways: automatic Part-of-Speech tagging by adapting a machine learning algorithm, and (semi-)automatic extraction of constructional patterns expressing a transitive event (active transitive and suffixal passive). As the first empirical report on NLP-assisted analysis of Korean child corpora, this study is expected to reveal the advantages and drawbacks of this approach, thereby opening the way for further corpus-mediated research on child language development in Korean. The findings also bear on research practice in developmental studies of Korean through child corpora, in particular on ensuring the reproducibility of procedures and results, which has often been lacking in previous corpus-based research on child language development in Korean.
