Markup Fragment in the RuTuBiC Linguistic Corpus. Code-Switching or Lexical Borrowing?

Z. Rezanova
Voprosy Leksikografii-Russian Journal of Lexicography, publication date 2021-01-01
DOI: 10.17223/22274200/20/5 (https://doi.org/10.17223/22274200/20/5)
Impact factor: 0.6 | Category: Language & Linguistics
Citations: 1

Abstract
The article presents a solution to one of the problems of special linguistic markup in the RuTuBiC corpus (the Russian Speech Corpus of Russian-Turkic Bilinguals): error annotation at the lexical level. The corpus comprises three subcorpora representing the Russian speech of Shor-Russian, Tatar-Russian and Khakass-Russian bilinguals. The solutions were developed on the basis of all three subcorpora; the illustrative contexts are drawn from the Shor-Russian subcorpus, which consists of recordings of interviews with 14 respondents, about 20 hours of audio. The recordings were made during expeditions to Shoria in 2017–2019. The respondents' bilingualism is defined as early natural bilingualism with dominance of Russian, the second language; the mother tongues are heritage languages of the family. The theoretical basis of the research is work on language contact at the lexical level. The proposed solutions rest on differentiating lexemes fully assimilated into the system of standard Russian from units that have the status of borrowings from other subsystems of the national language and from other languages. For the latter, the linguistic and contextual features that distinguish lexical borrowing from code-switching are identified.

The typical errors singled out at the lexical level are: [LexId] – idiomatic expressions that are not fixed in the standard language (dialectal, vernacular, slang, etc.) and that may also be Turkic calques; [LexSem] – general Russian words used in meanings different from those fixed in normative sources; [LexSemAgr] – violations of the norms of lexical and semantic agreement. Units borrowed from the respondents' mother tongue are placed on a scale running from nuclear to borderline cases. The nuclear units, marked with the [Lex] tag, are dialectal units, common words and other non-standard usages, as well as borrowings from the Turkic languages that are not included in dictionaries of standard Russian. On the border "to the left" are borrowings assimilated to different degrees. On the border "to the right" are non-assimilated borrowings and code-switches. The [CodeSw] tag marks code-switching, that is, the insertion of mother-tongue elements into Russian speech. The author treats the insertion of whole statements as nuclear cases of code-switching and single lexical insertions as transitional cases. Code-switching is evidenced by metatextual indicators and by linguistic indicators proper, primarily phonetic ones. Both lexical borrowings and code-switches are few in the speech of the RuTuBiC respondents, which depends on the type of bilingualism. How typical the metatextual marking of borrowings and code-switches is depends on the discursive, genre and thematic limitations of the corpus.
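The tag inventory described in the abstract can be illustrated with a minimal sketch. The five tag names come from the abstract; everything else is an assumption made for illustration only: the abstract does not say how tags are attached to tokens in the corpus files, so the inline `token[Tag]` format, the function names, and the placeholder tokens below are hypothetical and are not the RuTuBiC project's actual format or tooling.

```python
import re
from collections import Counter

# Tag names are taken from the abstract; descriptions paraphrase it.
LEXICAL_TAGS = {
    "LexId": "idiomatic expression not fixed in the standard language",
    "LexSem": "general Russian word used in a non-normative meaning",
    "LexSemAgr": "violation of lexical and semantic agreement norms",
    "Lex": "non-standard unit or Turkic borrowing absent from standard dictionaries",
    "CodeSw": "code-switching: mother-tongue element inserted into Russian speech",
}

# ASSUMPTION (hypothetical format): a tag directly follows the token it
# annotates, e.g. "token[Lex]". The real corpus markup may differ.
TAG_PATTERN = re.compile(r"(\S+?)\[(" + "|".join(LEXICAL_TAGS) + r")\]")


def extract_tags(utterance):
    """Return (token, tag) pairs found in one annotated utterance."""
    return [(m.group(1), m.group(2)) for m in TAG_PATTERN.finditer(utterance)]


def count_tags(utterances):
    """Count how often each lexical-level tag occurs across utterances."""
    counts = Counter()
    for utterance in utterances:
        counts.update(tag for _, tag in extract_tags(utterance))
    return counts


# Placeholder tokens, not real corpus data.
sample = [
    "tokenA[Lex] tokenB tokenC[CodeSw]",
    "tokenD[LexSemAgr] tokenE[Lex]",
]

for tag, n in sorted(count_tags(sample).items()):
    print(f"{tag}: {n} ({LEXICAL_TAGS[tag]})")
```

A tally like this is one way to check the abstract's observation that borrowings and code-switches are few: comparing the counts for [Lex] and [CodeSw] with the total number of tokens would give their relative frequency in a subcorpus.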
Source journal
CiteScore: 0.80 | Self-citation rate: 0.00% | Articles published: 0
About the journal: The mission of the Russian Journal of Lexicography is to accumulate the intellectual potential of scholars and practitioners for the purpose of discussing and solving the topical issues of theoretical and applied lexicography, and new concepts of dictionary compilation.