Title: Linguistic and statistical analysis of the lexical ‘Langue-Parole’ dichotomy in a restricted domain
Authors: S. Sheremetyeva, O. Babina
Journal: Russian Journal of Linguistics
Publication date: 2023-06-30
DOI: 10.22363/2687-0088-32933 (https://doi.org/10.22363/2687-0088-32933)
Citations: 0
Abstract
New digital methods for analyzing the ‘Langue-Parole’ dichotomy are among the most sought-after yet least researched problems in modern theoretical and applied linguistics, which motivates this study. Its purpose is to develop a methodology for the automated linguistic-statistical analysis of a domain-related lexical layer in the context of the ‘Langue-Parole’ dichotomy and to apply that methodology to the Russian-language domain “Research on athlete integrative physiology” (RAIP). The study draws on a Russian-language corpus of 56 RAIP domain texts, 300,000 wordforms in total, published between 2013 and 2020 in scientific journals including “People. Sport. Medicine” (formerly “SUSU Bulletin. Series ‘Education, Healthcare, Physical Culture’”) and “Theory and Practice of Physical Culture”. The key methodological approach is the ontological analysis of corpus data using statistical and linguistic modeling methods. The domain-specific language and speech are modeled by the corresponding lexicon and corpus, respectively, while the lexical ‘Langue-Parole’ dichotomy is represented by the values of linguistic-statistical parameters that describe how the domain concepts are verbalized in the lexicon and in the corpus. The computed parameters include indices of lexical diversity, structural complexity, and conceptual syncretism, the correlation between lexical structural complexity and conceptual syncretism, and the junction of syncretic concepts when verbalized in the corpus. The main results of the study are: 1) a methodology for analyzing the domain-specific lexical ‘Langue-Parole’ dichotomy that can be ported to other domains and national languages; 2) RAIP domain-related resources, including a language-independent ontology, a conceptually annotated Russian corpus, an onto-lexicon, and the linguistic-statistical parameter values of the lexical ‘Langue-Parole’ dichotomy; and 3) tools that automate certain stages of the study.
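To make the idea of comparing concept verbalization in the lexicon and in the corpus more concrete, the following is a minimal sketch. It is not the authors' implementation: the abstract does not define the indices, so the sketch assumes that lexical diversity is the number of distinct lexical units verbalizing a concept, and that structural complexity is the mean number of whitespace-separated components per distinct lexical unit. The function names, the concept labels, and the English toy data are all hypothetical and do not come from the RAIP resources.

```python
from collections import defaultdict


def lexical_diversity(pairs):
    """Map each concept to the number of distinct lexical units verbalizing it.

    ASSUMPTION: this is only one plausible reading of a "lexical diversity
    index"; the source abstract does not give a formula.
    """
    units = defaultdict(set)
    for unit, concept in pairs:
        units[concept].add(unit.lower())
    return {concept: len(forms) for concept, forms in units.items()}


def structural_complexity(pairs):
    """Mean number of whitespace-separated components per distinct unit.

    ASSUMPTION: a simplified stand-in for a "structural complexity index".
    """
    distinct = {unit.lower() for unit, _ in pairs}
    return sum(len(unit.split()) for unit in distinct) / len(distinct)


# Hypothetical toy data: (lexical unit, concept label) pairs.
# The lexicon stands in for 'Langue', the annotated corpus for 'Parole'.
lexicon = [
    ("heart rate", "HEART_RATE"),
    ("pulse", "HEART_RATE"),
    ("cardiac rhythm", "HEART_RATE"),
    ("athlete", "ATHLETE"),
]
corpus = [
    ("heart rate", "HEART_RATE"),
    ("pulse", "HEART_RATE"),
    ("heart rate", "HEART_RATE"),
    ("athlete", "ATHLETE"),
    ("sportsman", "ATHLETE"),
]

langue = lexical_diversity(lexicon)
parole = lexical_diversity(corpus)

for concept in sorted(langue):
    print(concept, "lexicon:", langue[concept], "corpus:", parole.get(concept, 0))
print("lexicon structural complexity:", structural_complexity(lexicon))
```

Under these assumptions, a concept whose diversity differs between the two mappings is one whose attested usage diverges from its inventory in the lexicon, which is the kind of contrast the ‘Langue-Parole’ parameters are meant to capture.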