Journal of Linguistics/Jazykovedný časopis: Latest Articles

Slovak Question Answering Dataset Based on the Machine Translation of the SQuAD v2.0

Journal of Linguistics/Jazykovedný časopis Pub Date: 2023-06-01 DOI: 10.2478/jazcas-2023-0054

J. Staš, D. Hládek, Tomáš Koctúr

Citations: 0

Corroborating Corpus Data with Elicited Introspection Data: A Case Study

Journal of Linguistics/Jazykovedný časopis Pub Date: 2023-06-01 DOI: 10.2478/jazcas-2023-0024

Jakob Horsch

Citations: 0

Morphosyntactic Annotation in Universal Dependencies for Old Czech

Journal of Linguistics/Jazykovedný časopis Pub Date: 2023-06-01 DOI: 10.2478/jazcas-2023-0039

Daniel Zeman, Pavel Kosek, Martin Březina, Jiří Pergler

Citations: 0

God Knows How It Turns Out: On Three Constructions Including Bog ‘God’, Čert ‘Devil’ and Some Taboo Words in the Russian Language Over the Last Three Centuries

Journal of Linguistics/Jazykovedný časopis Pub Date: 2023-06-01 DOI: 10.2478/jazcas-2023-0020

Evgeniya Budennaya, Kristina Litvintseva, Anastasia Yakovleva

Citations: 0

Distractor Generation for Lexical Questions Using Learner Corpus Data

Journal of Linguistics/Jazykovedný časopis Pub Date: 2023-06-01 DOI: 10.2478/jazcas-2023-0051

Nikita Login

Abstract: Learner corpora with error annotation can serve as a source of data for automated question generation (QG) for language testing. In the case of multiple-choice gap-fill lexical questions, this process involves two steps. The first step is to extract sentences with lexical corrections from the learner corpus. The second step, which is the focus of this paper, is to generate distractors for the retrieved questions. The presented approach (called DisSelector) is based on supervised learning on specially annotated learner corpus data. For each sentence, a list of distractor candidates was retrieved. Then, each candidate was manually labelled as a plausible or implausible distractor. The derived set of examples was additionally filtered by a set of lexical and grammatical rules and then split into training and testing subsets in a 4:1 ratio. Several classification models, including classical machine learning algorithms and gradient boosting implementations, were trained on the data. Word and sentence vectors from language models, together with corpus word frequencies, were used as input features for the classifiers. The highest F1-score (0.72) was attained by an XGBoost model. Various configurations of DisSelector showed improvements over the unsupervised baseline in both automatic and expert evaluation.
DisSelector was integrated into the open-source language testing platform LangExBank as a microservice with a REST API.

Pages: 345-356

Citations: 0
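The DisSelector abstract summarises its evaluation protocol as a 4:1 train/test split followed by an F1-score. A minimal standard-library sketch of those two steps is shown below, assuming each example is a candidate distractor with a boolean gold label (True = plausible) and that F1 is computed for the plausible class only. The abstract does not state whether the reported 0.72 is binary, macro- or weighted F1, and the helper names `split_4_to_1` and `f1_score` are hypothetical; the sketch does not reproduce DisSelector's features or its XGBoost model.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_4_to_1(examples: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle examples and split them into training and testing subsets (4:1)."""
    items = list(examples)
    random.Random(seed).shuffle(items)  # deterministic shuffle for a given seed
    cut = len(items) * 4 // 5           # 80 % of the examples go to training
    return items[:cut], items[cut:]


def f1_score(gold: Sequence[bool], predicted: Sequence[bool]) -> float:
    """Binary F1 for the positive class (assumed here: 'plausible distractor')."""
    tp = sum(g and p for g, p in zip(gold, predicted))
    fp = sum((not g) and p for g, p in zip(gold, predicted))
    fn = sum(g and (not p) for g, p in zip(gold, predicted))
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    train, test = split_4_to_1(list(range(10)))
    print(len(train), len(test))  # 8 2
    # One hit, one false alarm, one miss: precision = recall = 0.5.
    print(f1_score([True, True, False, False], [True, False, True, False]))  # 0.5
```

F1 is used rather than plain accuracy because it ignores true negatives, which matters when implausible candidates far outnumber plausible ones.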

Lemmatization of the DIA1900 Diachronic Corpus

Journal of Linguistics/Jazykovedný časopis Pub Date: 2023-06-01 DOI: 10.2478/jazcas-2023-0045

Lucie Benešová, Klára Pivoňková, Martin Stluka

Citations: 0

Annotation of Analytic Verb Forms in Czech – Complex Cases

Journal of Linguistics/Jazykovedný časopis Pub Date: 2023-06-01 DOI: 10.2478/jazcas-2023-0041

Vladimír Petkevič, Hana Skoumalová

Citations: 0

The Epistemic Marker Určitě in the Light of Corpus Data

Journal of Linguistics/Jazykovedný časopis Pub Date: 2023-06-01 DOI: 10.2478/jazcas-2023-0031

B. Štěpánková, J. Šindlerová, Lucie Poláková

Citations: 0

Adverbs Derived from Adjectival Present Participles in Polish, Slovak and Czech: A Comparative Corpus-Based Study

Journal of Linguistics/Jazykovedný časopis Pub Date: 2023-06-01 DOI: 10.2478/jazcas-2023-0030

Aksana Schillová

Citations: 0

Linear Dependency Segments in Foreign Language Acquisition: Syntactic Complexity Analysis in Czech Learners’ Texts

Journal of Linguistics/Jazykovedný časopis Pub Date: 2023-06-01 DOI: 10.2478/jazcas-2023-0037

Michaela Nogolová, Michaela Hanušková, Miroslav Kubát, Radek Čech

Abstract: The paper discusses a new way to measure syntactic complexity in foreign language acquisition. It is based on a recently proposed syntactic unit called the linear dependency segment (LDS): the longest possible sequence of words belonging to the same clause in which all linear neighbours are also syntactic neighbours. The dataset comprises 5,721 Czech texts from the CzeSL-SGT learner corpus covering five CEFR proficiency levels (A1–C1). The study comprises two analyses. First, we examine how the average clause length in LDS and the average LDS length in words develop across the language proficiency levels. Second, we consider the differences between Slavic and non-Slavic speakers. The results show an increasing tendency of the average clause length measured in LDS, while the average clause length measured in words is decreasing. Results also show statistically significant differences between Slavic and non-Slavic speakers in most cases.
Our results indicate that LDS may be a useful unit for measuring syntactic complexity in foreign language acquisition research.

Pages: 193-203

Citations: 0
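The LDS definition in the abstract above can be made concrete with a short sketch. It assumes the sentence is already parsed into a dependency tree given as a list of 1-based head indices (the HEAD convention of the CoNLL-U format, 0 = root), that punctuation has been removed, and that every token carries a clause id from a separate clause-segmentation step. The function names and inputs are illustrative assumptions, not taken from the paper. Two adjacent tokens stay in the same segment only if one is the syntactic head of the other and both belong to the same clause. Because this test is applied to each adjacent pair independently, the split into maximal segments is unique.

```python
from typing import List, Sequence


def linear_dependency_segments(
    heads: Sequence[int], clauses: Sequence[int]
) -> List[List[int]]:
    """Split a sentence into linear dependency segments (LDS).

    heads[i]   -- 1-based index of the head of token i+1 (0 = root), as in CoNLL-U.
    clauses[i] -- hypothetical clause id of token i+1 (assumed to be given).
    Returns the segments as lists of 1-based token indices.
    """
    if len(heads) != len(clauses):
        raise ValueError("heads and clauses must have the same length")
    segments: List[List[int]] = []
    current: List[int] = []
    for i in range(1, len(heads) + 1):
        if current:
            prev = current[-1]
            # Linear neighbours are syntactic neighbours if one heads the other.
            linked = heads[i - 1] == prev or heads[prev - 1] == i
            same_clause = clauses[i - 1] == clauses[prev - 1]
            if not (linked and same_clause):
                segments.append(current)  # close the current maximal segment
                current = []
        current.append(i)
    if current:
        segments.append(current)
    return segments


def mean_lds_length(segments: List[List[int]]) -> float:
    """Average LDS length measured in words."""
    return sum(len(s) for s in segments) / len(segments)


if __name__ == "__main__":
    # "The old man sleeps": The -> man, old -> man, man -> sleeps, sleeps = root.
    heads = [3, 3, 4, 0]
    clauses = [1, 1, 1, 1]
    segs = linear_dependency_segments(heads, clauses)
    print(segs)                   # [[1], [2, 3, 4]]
    print(mean_lds_length(segs))  # 2.0
```

In this example "The" and "old" are adjacent but neither depends on the other, so the sentence has two segments. Under the same assumptions, the average clause length in LDS would be the number of segments divided by the number of clauses.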