To drop or not to drop? Predicting the omission of the infinitival marker in a Swedish future construction

IF 1.7, Zone 2 (Literature), 0 LANGUAGE & LINGUISTICS

Corpus Linguistics and Linguistic Theory Pub Date : 2023-05-05 DOI:10.1515/cllt-2022-0101

Aleksandrs Berdicevskis, E. Coussé, Alexander Koplenig, Yvonne Adesam


Citations: 0

Abstract

We investigate the optional omission of the infinitival marker in a Swedish future tense construction. During the last two decades the frequency of omission has been rapidly increasing, and this process has received considerable attention in the literature. We test whether the knowledge which has been accumulated can yield accurate predictions of language variation and change. We extracted all occurrences of the construction from a very large collection of corpora. The dataset was automatically annotated with language-internal predictors which have previously been shown or hypothesized to affect the variation. We trained several models in order to make two kinds of predictions: whether the marker will be omitted in a specific utterance and how large the proportion of omissions will be for a given time period. For most of the approaches we tried, we were not able to achieve a better-than-baseline performance. The only exception was predicting the proportion of omissions using autoregressive integrated moving average models for one-step-ahead forecast, and in this case time was the only predictor that mattered. Our data suggest that most of the language-internal predictors do have some effect on the variation, but the effect is not strong enough to yield reliable predictions.
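As an illustration of what a one-step-ahead forecast compared against a baseline involves, here is a minimal Python sketch. It is not the authors' pipeline. It assumes the simplest member of the ARIMA family, a random walk with drift (ARIMA(0,1,0) with a constant), and a hypothetical baseline that predicts the mean of all earlier periods. The abstract does not specify either choice, and the series `toy` is made-up illustrative data, not results from the paper.

```python
from typing import Callable, List


def drift_forecast(history: List[float]) -> float:
    """One-step-ahead forecast from a random walk with drift,
    i.e. ARIMA(0,1,0) with a constant (an assumed, simplest-case model).

    The forecast is the last observed value plus the mean first difference.
    """
    if len(history) < 2:
        raise ValueError("need at least two observations")
    # Mean of the first differences telescopes to (last - first) / (n - 1).
    drift = (history[-1] - history[0]) / (len(history) - 1)
    return history[-1] + drift


def mean_forecast(history: List[float]) -> float:
    """Hypothetical baseline: predict the mean of everything seen so far."""
    return sum(history) / len(history)


def rolling_mae(series: List[float],
                forecaster: Callable[[List[float]], float],
                min_train: int = 3) -> float:
    """Mean absolute error of rolling one-step-ahead forecasts.

    At each step t the forecaster sees only series[:t] and predicts series[t].
    """
    errors = []
    for t in range(min_train, len(series)):
        prediction = forecaster(series[:t])
        errors.append(abs(prediction - series[t]))
    return sum(errors) / len(errors)


# Made-up proportions of omission per time period (illustrative only).
toy = [0.10, 0.12, 0.15, 0.19, 0.22, 0.27, 0.31, 0.36]

if __name__ == "__main__":
    print("drift model MAE:", rolling_mae(toy, drift_forecast))
    print("mean baseline MAE:", rolling_mae(toy, mean_forecast))
```

On a steadily rising series like this toy one, the time-only drift model tracks the trend while the mean baseline lags behind it, which is the kind of comparison a better-than-baseline claim rests on.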


Source journal

Corpus Linguistics and Linguistic Theory Multiple-

CiteScore: 4.20

Self-citation rate: 12.50%

About the journal: Corpus Linguistics and Linguistic Theory (CLLT) is a peer-reviewed journal publishing high-quality original corpus-based research focusing on theoretically relevant issues in all core areas of linguistic research, or other recognized topic areas. It provides a forum for researchers from different theoretical backgrounds and different areas of interest who share a commitment to the systematic and exhaustive analysis of naturally occurring language. Contributions from all theoretical frameworks are welcome, but they should be addressed to a general audience and thus be explicit about their assumptions and discovery procedures and provide sufficient theoretical background to be accessible to researchers from different frameworks. Topics: Corpus Linguistics, Quantitative Linguistics, Phonology, Morphology, Semantics, Syntax, Pragmatics.