Automatic analysis of caregiver input and child production
Gyu-Ho Shin
Korean Linguistics, published 2022-09-30
DOI: 10.1075/kl.20002.shi (https://doi.org/10.1075/kl.20002.shi)
Citation count: 2
Abstract
The present study explores the applicability of Natural Language Processing (NLP) techniques to the investigation of child
corpora in Korean. We use caregiver input and child production data from the CHILDES database, currently the largest
open-access collection of Korean child corpus data, and apply NLP techniques to the data in two ways: automatic Part-of-Speech tagging by
adapting a machine learning algorithm, and (semi-)automatic extraction of constructional patterns expressing a transitive event
(active transitive and suffixal passive). As the first empirical report on NLP-assisted analysis of Korean child corpora, this
study is expected to reveal the advantages and drawbacks of the approach, thereby opening the way for further corpus-mediated research on
child language development in Korean. The findings also have implications for research practice in
developmental studies of Korean through child corpora, in particular for ensuring the reproducibility of procedures and results, which has often
been lacking in previous corpus-based research on child language development in Korean.
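
To make the second step concrete, the following is a minimal sketch of how candidate transitive-event patterns might be pulled from POS-tagged utterances. It is not the study's actual procedure. The tag labels (`JKS`, `JKO`, `VV`, loosely following Sejong-style tagsets), the `classify` function, the token format, and the hand-supplied list of passive verb stems are all assumptions made for illustration.

```python
from typing import List, Optional, Set, Tuple

# One token = (morpheme, POS tag). The tag names below are assumed,
# Sejong-style labels; the study's real tagset may differ.
Token = Tuple[str, str]
ACC_MARKER = "JKO"   # accusative case marker (assumed label)
VERB = "VV"          # verb stem (assumed label)


def classify(utterance: List[Token], passive_stems: Set[str]) -> Optional[str]:
    """Return a candidate construction label for one tagged utterance.

    - "suffixal_passive": the last verb stem is in the supplied passive list.
    - "active_transitive": an accusative marker co-occurs with a verb.
    - None: neither pattern is found.
    """
    verbs = [morph for morph, tag in utterance if tag == VERB]
    if not verbs:
        return None
    if verbs[-1] in passive_stems:
        return "suffixal_passive"
    if any(tag == ACC_MARKER for _, tag in utterance):
        return "active_transitive"
    return None


# Hypothetical hand-tagged examples (not taken from CHILDES).
active = [("경찰", "NNG"), ("이", "JKS"), ("도둑", "NNG"), ("을", "JKO"),
          ("잡", "VV"), ("았", "EP"), ("다", "EF")]
passive = [("도둑", "NNG"), ("이", "JKS"), ("경찰", "NNG"), ("한테", "JKB"),
           ("잡히", "VV"), ("었", "EP"), ("다", "EF")]
intransitive = [("아기", "NNG"), ("가", "JKS"), ("자", "VV"), ("ㄴ다", "EF")]

PASSIVE_STEMS = {"잡히"}  # illustrative list; a real one would be much longer

for utt in (active, passive, intransitive):
    print(classify(utt, PASSIVE_STEMS))
```

A rule set of this kind only flags candidates. Case markers are frequently omitted in caregiver and child speech, so matches would still need manual checking, which is one way to read the "(semi-)automatic" qualifier above.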