Alex Lưu, Pasha Koval, Sophia A. Malamud, Irina Y. Dubinina
DOI: 10.1590/2176-4573e55831 (https://doi.org/10.1590/2176-4573e55831)
Journal: Bakhtiniana
Publication date: 2022-10-01
Creating a Large-Scale Audio-Aligned Parsed Corpus of Bilingual Russian Child and Child-Directed Speech (BiRCh): Challenges, Solutions, and Implications for Research
ABSTRACT The BiRCh Project (The Corpus of Bilingual Russian Child Speech) involves collecting a longitudinal audio corpus of Russian spoken by children and their families in Russia, Ukraine, Germany, the U.S., and Canada. We are building a large-scale corpus based on a subset of this data, the “Parsed and Audio-aligned Corpus of Bilingual Russian Child and Child-directed Speech (BiRCh)”, with two basic components: (1) 1-million-word transcripts that are time-aligned with the audio speech signal and fully text-searchable, and (2) a 500K-word morphologically annotated and parsed portion of the transcripts, also audio-aligned. We are using this corpus to investigate various phenomena in the linguistic input and the developmental trajectory of heritage bilinguals, e.g., case, gender, passives, impersonals, politeness markers, disfluencies, and discourse markers. This article focuses on the challenges encountered in developing BiRCh, the solutions adopted, and the implications for research on the richly annotated data provided by the corpus.
Bakhtiniana (Arts and Humanities: Literature and Literary Theory)
CiteScore: 0.20
Self-citation rate: 0.00%
Articles published: 69
Review time: 12 weeks
About the journal:
Bakhtiniana. Revista de Estudos do Discurso [Bakhtiniana. Journal of Discourse Studies], in electronic format, was created in 2008 by Programa de Estudos Pós-Graduados em Linguística Aplicada e Estudos da Linguagem [the Applied Linguistics and Language Studies Graduate Program] of Pontifícia Universidade Católica de São Paulo/LAEL-PUCSP and by the members of the Linguagem, identidade e memória [Language, Identity and Memory] Research Group/CNPq (National Council for Scientific and Technological Development). The journal's mission is to promote and publicize research on discourse, mainly on dialogic studies. From 2019 on, it publishes an issue every three months. Each issue is composed of papers and book reviews written by professors and PhD researchers from international and national universities. This is the only journal that covers Bakhtinian studies per se and that dialogues with other areas of knowledge in Brazil and abroad.