Linguistic and statistical analysis of the lexical ‘Langue-Parole’ dichotomy in a restricted domain

Impact factor: 0.9 · Subject category: LANGUAGE & LINGUISTICS

Russian Journal of Linguistics · Published: 2023-06-30 · DOI: 10.22363/2687-0088-32933

S. Sheremetyeva, O. Babina



Abstract

Developing new digital methods for analyzing the ‘Langue-Parole’ dichotomy is one of the most sought-after but least researched problems of modern theoretical and applied linguistics. This motivates the present study, whose purpose is to develop a methodology for the automated linguistic-statistical analysis of a domain-related lexical layer in the context of the ‘Langue-Parole’ dichotomy, and to apply this methodology to the Russian-language domain “Research on athlete integrative physiology” (RAIP). The study is based on a Russian-language corpus of 56 RAIP domain texts (300,000 word forms in total) published in 2013-2020 in the scientific journals “People. Sport. Medicine” (formerly “SUSU Bulletin. Series ‘Education, Healthcare, Physical Culture’”), “Theory and Practice of Physical Culture”, and others. The key methodological approach is the ontological analysis of corpus data using statistical and linguistic modeling methods. Domain-specific language (langue) and speech (parole) are modeled by the corresponding lexicon and corpus, respectively, while the lexical ‘Langue-Parole’ dichotomy is represented by the values of linguistic-statistical parameters describing how domain concepts are verbalized in the lexicon and in the corpus. The computed parameters include indices of lexical diversity, structural complexity, conceptual syncretism, the correlation between lexical structural complexity and conceptual syncretism, and the junction of syncretic concepts when they are verbalized in the corpus. The main results of the study are: 1) a methodology for analyzing the domain-specific lexical ‘Langue-Parole’ dichotomy that can be ported to other domains and national languages; 2) RAIP domain resources, including a language-independent ontology, a conceptually annotated Russian corpus, an onto-lexicon, and the linguistic-statistical parameter values of the lexical ‘Langue-Parole’ dichotomy; and 3) tools that automate certain stages of the study.
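The abstract names the verbalization parameters but does not define them, so the following is only a hypothetical sketch of how lexicon-level (Langue) and corpus-level (Parole) values might be compared; it is not the authors' method. It assumes that the onto-lexicon maps each term to the set of concepts it verbalizes and that the annotated corpus is a list of term occurrences, and it uses assumed proxies: lexical diversity as the number of distinct terms per concept, structural complexity as words per term, conceptual syncretism as concepts per term, and their Pearson correlation. The toy data and every function name (`structural_complexity`, `syncretism`, `lexical_diversity`, `pearson`, `profile`) are invented for illustration.

```python
from math import sqrt

# HYPOTHETICAL toy data; the RAIP onto-lexicon and corpus are not reproduced.
# Onto-lexicon (Langue): each term maps to the set of concepts it verbalizes.
LEXICON: dict[str, set[str]] = {
    "heart rate": {"HEART", "RATE"},
    "heart rate variability": {"HEART", "RATE", "VARIABILITY"},
    "pulse": {"HEART", "RATE"},
    "oxygen uptake": {"OXYGEN", "UPTAKE"},
    "VO2max": {"OXYGEN", "UPTAKE", "MAXIMUM"},
    "athlete": {"ATHLETE"},
}

# Conceptually annotated corpus (Parole): one entry per term occurrence.
CORPUS: list[str] = [
    "heart rate", "heart rate", "pulse", "athlete",
    "athlete", "oxygen uptake", "heart rate variability",
]


def structural_complexity(term: str) -> int:
    """Assumed proxy: number of whitespace-separated words in the term."""
    return len(term.split())


def syncretism(lexicon: dict[str, set[str]], term: str) -> int:
    """Assumed proxy: number of concepts a single term verbalizes."""
    return len(lexicon[term])


def lexical_diversity(lexicon: dict[str, set[str]], terms: list[str]) -> dict[str, int]:
    """Assumed index: number of distinct terms verbalizing each concept."""
    per_concept: dict[str, set[str]] = {}
    for term in set(terms):
        for concept in lexicon[term]:
            per_concept.setdefault(concept, set()).add(term)
    return {concept: len(ts) for concept, ts in per_concept.items()}


def pearson(xs: list[int], ys: list[int]) -> float:
    """Pearson correlation coefficient; NaN if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def profile(lexicon: dict[str, set[str]], terms: list[str]) -> dict[str, float]:
    """Averages over a list of term occurrences (token-weighted for a corpus)."""
    comp = [structural_complexity(t) for t in terms]
    sync = [syncretism(lexicon, t) for t in terms]
    return {
        "mean_complexity": sum(comp) / len(terms),
        "mean_syncretism": sum(sync) / len(terms),
        "complexity_syncretism_r": pearson(comp, sync),
    }


if __name__ == "__main__":
    langue = list(LEXICON)  # each lexicon entry counted once
    print("Langue:", profile(LEXICON, langue), lexical_diversity(LEXICON, langue))
    print("Parole:", profile(LEXICON, CORPUS), lexical_diversity(LEXICON, CORPUS))
```

In this toy data the comparison surfaces the kind of gap the indices are meant to capture: OXYGEN has two verbalizations in the lexicon but only one realized in the corpus, and MAXIMUM is not verbalized in the corpus at all. Counting lexicon entries once but corpus occurrences per token is itself a design assumption; the study may weight these differently.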
