Formulaic expressions in Korean academic discourse: A corpus-based combinatoric morphemic analysis

IF 1.3; Zone 3 (Literature); JCR category: LANGUAGE & LINGUISTICS

Lingua, Volume 318, Article 103912; Pub Date: 2025-04-01; Epub Date: 2025-02-25; DOI: 10.1016/j.lingua.2025.103912

Beomil Kang , Sun-Hee Lee


Abstract

This study presents a corpus-based investigation of formulaic expressions in Korean academic discourse, employing a refined morphemic analysis designed for the agglutinative properties of Korean. The corpus analysis explores the dynamic discourse functions of formulaic sequences in Korean academic prose and formal conversation and identifies distinct register-based properties. Over the past thirty years, corpus-based studies have rigorously examined recurrent formulaic expressions (so-called lexical bundles or multi-word expressions) in languages such as English and Spanish. In contrast, few studies have addressed agglutinative languages like Korean, which have intricate morphosyntactic dependencies. By pre-processing allomorphs and unnecessary morphological units and lemmatizing predicate endings, the new combinatoric morphemic analysis yields substantial lists of lexico-grammatical patterns with accurate frequency information. This methodological template paves the way for further research into formulaic units in other agglutinative languages such as Japanese and Turkish. Three corpora were used: a written corpus (2,000 academic journal papers), a spoken corpus of formal conversation, and a balanced reference corpus (3 million words). The results affirm the high productivity and dynamic linguistic functions of Korean formulaic expressions in academic discourse, indicating their value as a resource for exploring second language learning and pedagogy.
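The general idea of counting recurrent morpheme sequences after normalizing allomorphs can be illustrated with a small sketch. This is not the authors' procedure, which the abstract does not specify in detail. It assumes the input is already segmented into (morpheme, tag) pairs with Sejong-style tags. The allomorph table, the set of skipped tags, and the function names are all hypothetical and chosen only for illustration. Lemmatization of predicate endings is left out.

```python
from collections import Counter

# Hypothetical allomorph table: map phonologically conditioned variants
# of a particle to one canonical form so they are counted together.
ALLOMORPHS = {
    ("가", "JKS"): "이",    # subject particle
    ("를", "JKO"): "을",    # object particle
    ("는", "JX"): "은",     # topic particle
    ("로", "JKB"): "으로",  # adverbial particle
}

# Hypothetical choice of "unnecessary" units: punctuation tags only.
SKIP_TAGS = {"SF", "SP", "SS"}


def normalize(sentence):
    """Drop skipped units and replace allomorphs with canonical forms."""
    units = []
    for form, tag in sentence:
        if tag in SKIP_TAGS:
            continue
        units.append((ALLOMORPHS.get((form, tag), form), tag))
    return units


def morpheme_ngrams(sentences, n):
    """Count contiguous n-grams of normalized (morpheme, tag) units."""
    counts = Counter()
    for sentence in sentences:
        units = normalize(sentence)
        for i in range(len(units) - n + 1):
            counts[tuple(units[i:i + n])] += 1
    return counts


if __name__ == "__main__":
    # Hand-tagged toy data, not taken from the study's corpora.
    sentences = [
        [("것", "NNB"), ("으로", "JKB"), ("보이", "VV"), ("ㄴ다", "EF"), (".", "SF")],
        [("결과", "NNG"), ("로", "JKB"), ("보이", "VV"), ("ㄴ다", "EF"), (".", "SF")],
    ]
    for ngram, freq in morpheme_ngrams(sentences, 3).most_common():
        print(freq, " ".join(f"{form}/{tag}" for form, tag in ngram))
```

In the toy data, the variants 으로 and 로 are merged, so the sequence 으로/JKB 보이/VV ㄴ다/EF is counted once per sentence instead of being split across two surface forms. Without that step, each variant would be counted as a separate pattern, which is the kind of frequency distortion the abstract's pre-processing aims to avoid.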

Source journal

Lingua

CiteScore: 2.50

Self-citation rate: 9.10%

Review time: 24 weeks

Journal description: Lingua publishes papers of any length, if justified, as well as review articles surveying developments in the various fields of linguistics, and occasional discussions. A considerable number of pages in each issue are devoted to critical book reviews. Lingua also publishes Lingua Franca articles, which consist of provocative exchanges expressing strong opinions on central topics in linguistics; The Decade In articles, which are educational articles offering the nonspecialist linguist an overview of a given area of study; and Taking up the Gauntlet special issues, which are composed of a set number of papers examining one set of data and exploring whose theory offers the most insight with a minimal set of assumptions and a maximum of arguments.