Studying negative evidence in Finnish language corpora
Alexandre Nikolaev, Neil Bermel
Word Structure (Language & Linguistics), published 2023-11-01
DOI: 10.3366/word.2023.0229 (https://doi.org/10.3366/word.2023.0229)
Citations: 1
Abstract
This study explores the relationship between lower-than-expected frequencies of word forms and inherent gaps in Finnish inflectional paradigms. The research aims to determine whether it is possible to predict paradigmatic gaps from lower-than-expected frequencies of word forms. We examined Finnish nouns inflected in a marginal case (the instructive) and hypothesized that some of these nouns may have gaps in their inflectional paradigms. However, we found that such gaps are contingent and do not cause uncertainty when filled. We find that the correlation between inherent gaps and lower frequencies is one-directional: predicting inherent gaps from lower-than-expected frequencies is problematic. The results suggest that any paradigmatic gap suggested by corpus frequency is more likely to be contingent than inherent, and that the less semantic need there is for a particular word form, the more likely it will be unattested even in a large corpus. The research highlights the importance of considering semantic profiles when analyzing the grammaticality of word forms and suggests that statistical tests like Fisher’s exact test are not necessarily the right approach to tackle the problem of negative evidence in corpus studies.
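To make the reference to Fisher's exact test concrete, the following is a minimal sketch of how a lower-than-expected frequency could be tested on a 2x2 table of corpus counts. The abstract does not describe the authors' actual procedure, so the table layout, the one-sided (lower-tail) variant, the function name `fisher_exact_less`, and all counts below are assumptions made purely for illustration.

```python
from math import comb


def fisher_exact_less(a, b, c, d):
    """One-sided (lower-tail) Fisher's exact test for the table [[a, b], [c, d]].

    Assumed layout (hypothetical, not taken from the paper):
      a = tokens of the target noun in the instructive case
      b = tokens of the target noun in all other cases
      c = tokens of all other nouns in the instructive case
      d = tokens of all other nouns in all other cases
    Returns P(X <= a) under the hypergeometric null hypothesis, i.e. the
    probability of seeing this few instructive tokens or fewer by chance.
    """
    row1 = a + b            # all tokens of the target noun
    col1 = a + c            # all instructive tokens in the corpus
    total = a + b + c + d   # corpus size
    # Smallest count of instructive tokens the margins allow for the noun.
    k_min = max(0, row1 - (b + d))
    favourable = sum(
        comb(col1, k) * comb(total - col1, row1 - k)
        for k in range(k_min, a + 1)
    )
    return favourable / comb(total, row1)


# Invented counts: a noun with 200 tokens, none of them in the instructive,
# in a corpus where roughly one token in a thousand is instructive.
p_value = fisher_exact_less(0, 200, 1000, 999000)
print(f"one-sided p = {p_value:.3f}")
```

With counts like these, the expected number of instructive tokens for the noun is only about 0.2, so a zero count is unsurprising and the test has little power to flag it. A small p-value would in any case only indicate that the form is rarer than expected; it says nothing about whether the gap is inherent or contingent, which is the distinction the abstract draws.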