A generating model for Finnish nominal inflection using distributional semantics

Impact factor: 0.6; JCR quartile: Q3 (Linguistics)
A. Nikolaev, Yu-Ying Chuang, R. Baayen
Journal: Mental Lexicon. Publication date: 2023-03-17. Publication type: Journal Article. DOI: 10.1075/ml.22008.nik (https://doi.org/10.1075/ml.22008.nik). Open access: no.
Citations: 2

Abstract

Finnish nouns are characterized by rich inflectional variation, with obligatory marking of case and number, with optional possessive suffixes and with the possibility of further cliticization. We present a model for the conceptualization of Finnish inflected nouns, using pre-compiled fasttext embeddings (300-dimensional semantic vectors that approximate words’ meanings). Instead of deriving the semantic vector of an inflected word from another word in its paradigm, we propose that an inflected word is conceptualized by means of summation of latent vectors representing the meanings of its lexeme and its inflectional features. We tested this model on the 2,000 most frequent Finnish nouns and their inflected word forms from a corpus of Finnish (84 million tokens). Visualization of the semantic space of Finnish using t-SNE clarified that a ‘main effects’ additive model does not do justice to the semantics of inflection. In Finnish, how number is realized turns out to vary substantially with case. Further interactions emerged with the possessive suffixes and the clitics. By taking these interactions into account, the accuracy of our model, evaluated with the fasttext embeddings as gold standard, improved from 76% to 89%. Analyses of the errors made by the model clarified that 7.5% of errors are due to overabundance (and hence not true errors), and that 16.5% of the errors involved exchanges of semantically highly similar stems (lexemes). Our results indicate, first, that the semantics of Finnish noun inflection are more intricate than assumed thus far, and second, that these intricacies can be captured with surprisingly high accuracy by a simple generating model based on imputed semantic vectors for lexemes, inflectional features, and interactions of inflectional features.
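
The additive idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the latent vectors are estimated, so least squares over a one-hot design matrix is an assumption here, as are the toy lexemes, the case and number labels, and the random vectors that stand in for the fasttext embeddings. The sketch compares a 'main effects' model (lexeme + case + number) with one that adds case-by-number interaction vectors, and scores each prediction by whether its nearest gold vector (by cosine similarity) is the intended form.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 300  # dimensionality of the fasttext vectors mentioned in the abstract

# Hypothetical toy inventory; the study itself used 2,000 nouns.
LEXEMES = ["talo", "kissa", "auto"]
CASES = ["NOM", "INE", "ELA"]
NUMBERS = ["SG", "PL"]
FORMS = [(lex, c, n) for lex in LEXEMES for c in CASES for n in NUMBERS]


def design_matrix(forms, with_interaction):
    """One-hot code lexeme, case, number and, optionally, case-by-number cells."""
    cols = [("lex", lex) for lex in LEXEMES]
    cols += [("case", c) for c in CASES]
    cols += [("num", n) for n in NUMBERS]
    if with_interaction:
        cols += [("case:num", (c, n)) for c in CASES for n in NUMBERS]
    index = {col: j for j, col in enumerate(cols)}
    X = np.zeros((len(forms), len(cols)))
    for i, (lex, c, n) in enumerate(forms):
        X[i, index[("lex", lex)]] = 1.0
        X[i, index[("case", c)]] = 1.0
        X[i, index[("num", n)]] = 1.0
        if with_interaction:
            X[i, index[("case:num", (c, n))]] = 1.0
    return X


def fit_and_predict(X, gold):
    """Estimate latent vectors by least squares (an assumption) and sum them per form."""
    latent, *_ = np.linalg.lstsq(X, gold, rcond=None)
    return X @ latent


def accuracy(pred, gold):
    """Share of predictions whose cosine-nearest gold vector is the target form."""
    p = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    g = gold / np.linalg.norm(gold, axis=1, keepdims=True)
    nearest = (p @ g.T).argmax(axis=1)
    return float((nearest == np.arange(len(gold))).mean())


# Synthetic "gold" vectors that contain a genuine case-by-number interaction.
X_inter = design_matrix(FORMS, with_interaction=True)
true_latent = rng.normal(size=(X_inter.shape[1], DIM))
gold = X_inter @ true_latent + 0.01 * rng.normal(size=(len(FORMS), DIM))

pred_main = fit_and_predict(design_matrix(FORMS, with_interaction=False), gold)
pred_inter = fit_and_predict(X_inter, gold)

err_main = float(np.linalg.norm(gold - pred_main))
err_inter = float(np.linalg.norm(gold - pred_inter))
acc_main = accuracy(pred_main, gold)
acc_inter = accuracy(pred_inter, gold)

print(f"main effects: residual={err_main:.2f}, accuracy={acc_main:.2f}")
print(f"interactions: residual={err_inter:.2f}, accuracy={acc_inter:.2f}")
```

Because the interaction design contains every column of the main-effects design, its least-squares residual can never be larger. On this toy data the gap shows up mainly in the residual; the improvement from 76% to 89% reported in the abstract concerns accuracy on real embeddings and is not reproduced by this sketch.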
Source journal: Mental Lexicon (Linguistics). CiteScore: 1.50. Self-citation rate: 0.00%. Publication volume: 11.
Journal description: The Mental Lexicon is an interdisciplinary journal that provides an international forum for research that bears on the representation and processing of words in the mind and brain. We encourage the submission of both original research and reviews of significant new developments in the understanding of the mental lexicon. The journal publishes work that includes, but is not limited to, the following: models of the representation of words in the mind; computational models of lexical access and production; experimental investigations of lexical processing; neurolinguistic studies of lexical impairment; functional neuroimaging and lexical representation in the brain; lexical development across the lifespan; lexical processing in second language acquisition; the bilingual mental lexicon; lexical and morphological structure across languages; formal models of lexical structure; corpus research on the lexicon; new experimental paradigms and statistical techniques for mental lexicon research.