A generating model for Finnish nominal inflection using distributional semantics
A. Nikolaev, Yu-Ying Chuang, R. Baayen. The Mental Lexicon (journal article), published 2023-03-17. DOI: 10.1075/ml.22008.nik, https://doi.org/10.1075/ml.22008.nik
Finnish nouns are characterized by rich inflectional variation: case and number are marked obligatorily, possessive suffixes are optional, and clitics can additionally be attached. We present a model for the conceptualization of Finnish inflected nouns that uses pre-compiled fasttext embeddings (300-dimensional semantic vectors that approximate words’ meanings). Instead of deriving the semantic vector of an inflected word from another word in its paradigm, we propose that an inflected word is conceptualized by summing latent vectors that represent the meanings of its lexeme and of its inflectional features. We tested this model on the 2,000 most frequent Finnish nouns and their inflected word forms, drawn from a corpus of Finnish (84 million tokens). Visualizing the semantic space of Finnish with t-SNE showed that a ‘main effects’ additive model does not do justice to the semantics of inflection. In Finnish, how number is realized semantically turns out to vary substantially with case, and further interactions emerged with the possessive suffixes and the clitics. Taking these interactions into account improved the accuracy of our model, evaluated with the fasttext embeddings as gold standard, from 76% to 89%.
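
A minimal sketch of the generating idea, with toy numbers: the vector of an inflected form is the sum of a lexeme vector, one vector per inflectional feature and, optionally, a case-by-number interaction vector. Everything here is an assumption for illustration. The 2-dimensional vectors are made up (the paper works with 300-dimensional fasttext embeddings and imputes the latent vectors from them, whereas here they are simply given), only one lexeme, talo 'house', is modelled in two cases and two numbers, and accuracy is taken to mean that the gold vector nearest to a predicted vector (by cosine similarity) belongs to the target form. The helper names (`add`, `cosine`, `generate`, `nn_accuracy`) are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

Vector = List[float]
Cell = Tuple[str, str]  # (case, number)


def add(*vectors: Vector) -> Vector:
    """Element-wise sum of equally long vectors."""
    return [sum(components) for components in zip(*vectors)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def generate(lexeme: str, cell: Cell,
             lexemes: Dict[str, Vector],
             features: Dict[str, Vector],
             interactions: Optional[Dict[Cell, Vector]] = None) -> Vector:
    """Build the vector of an inflected form by summing the lexeme vector,
    one vector per inflectional feature and, if given, an interaction
    vector for this particular case-number combination."""
    case, number = cell
    parts = [lexemes[lexeme], features[case], features[number]]
    if interactions is not None:
        parts.append(interactions[cell])
    return add(*parts)


def nn_accuracy(predicted: Dict[str, Vector], gold: Dict[str, Vector]) -> float:
    """Share of forms whose predicted vector is closest (by cosine) to that
    form's own gold vector among all gold vectors."""
    hits = 0
    for form, vec in predicted.items():
        nearest = max(gold, key=lambda other: cosine(vec, gold[other]))
        hits += int(nearest == form)
    return hits / len(predicted)


# Toy 2-d vectors (made up; stand-ins for 300-d fasttext vectors).
LEXEMES = {"talo": [2.0, 2.0]}
FEATURES = {"nom": [0.0, 0.0], "ine": [1.0, 1.0],
            "sg": [1.0, -1.0], "pl": [-1.0, 1.0]}
# Case-by-number interaction: number shifts the meaning differently per case.
INTERACTIONS = {("nom", "sg"): [1.2, -1.2], ("nom", "pl"): [-1.2, 1.2],
                ("ine", "sg"): [-1.2, 1.2], ("ine", "pl"): [1.2, -1.2]}
CELLS = {"talo": ("nom", "sg"), "talot": ("nom", "pl"),
         "talossa": ("ine", "sg"), "taloissa": ("ine", "pl")}

# Stand-in "gold" vectors, constructed here so that they contain the interaction.
GOLD = {f: generate("talo", c, LEXEMES, FEATURES, INTERACTIONS)
        for f, c in CELLS.items()}
MAIN_EFFECTS = {f: generate("talo", c, LEXEMES, FEATURES)
                for f, c in CELLS.items()}
WITH_INTERACTIONS = {f: generate("talo", c, LEXEMES, FEATURES, INTERACTIONS)
                     for f, c in CELLS.items()}

print(nn_accuracy(MAIN_EFFECTS, GOLD))       # 0.5
print(nn_accuracy(WITH_INTERACTIONS, GOLD))  # 1.0
```

Because the toy gold vectors contain a case-dependent number effect, the main-effects predictions for the two inessive forms land closest to each other's gold vectors, while adding the interaction vectors recovers every form. Further interactions (for example with possessive suffixes or clitics) would each contribute one more summand in the same way. The 50% and 100% figures are properties of this toy only and say nothing about the 76% and 89% reported for the real model.
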
Analyses of the model’s errors showed that 7.5% of them are due to overabundance, that is, a single paradigm cell being realized by more than one word form (and hence are not true errors), and that 16.5% involved exchanges of semantically highly similar stems (lexemes). Our results indicate, first, that the semantics of Finnish noun inflection are more intricate than assumed thus far, and second, that these intricacies can be captured with surprisingly high accuracy by a simple generating model based on imputed semantic vectors for lexemes, inflectional features, and interactions of inflectional features.
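
The error analysis can be pictured as labelling each prediction by the gold form that ended up nearest to it. This is a hypothetical sketch, not the paper's procedure: the helper names (`Form`, `classify`, `tally`, `accuracy`), the rule that a lexeme exchange keeps the target's paradigm cell, and the lenient score that accepts overabundant variants are all assumptions. The Finnish word forms are real (for instance omenoiden and omenien, two genitive plurals of omena 'apple'), but the (target, nearest neighbour) pairs are invented.

```python
from collections import Counter
from typing import Dict, List, NamedTuple, Tuple


class Form(NamedTuple):
    lexeme: str
    cell: str  # paradigm cell: the bundle of inflectional features


def classify(target: str, nearest: str, forms: Dict[str, Form]) -> str:
    """Label one prediction, given the target form and the form whose gold
    vector was nearest to the predicted vector."""
    if nearest == target:
        return "correct"
    t, n = forms[target], forms[nearest]
    if t.lexeme == n.lexeme and t.cell == n.cell:
        # Overabundance: another form realizing the same cell.
        return "overabundance"
    if t.lexeme != n.lexeme and t.cell == n.cell:
        # Same features, different stem; stem similarity is NOT checked here.
        return "lexeme exchange"
    return "other error"


def tally(pairs: List[Tuple[str, str]], forms: Dict[str, Form]) -> Counter:
    """Count outcome labels over (target, nearest neighbour) pairs."""
    return Counter(classify(t, n, forms) for t, n in pairs)


def accuracy(counts: Counter, lenient: bool = False) -> float:
    """Strict accuracy counts only exact hits; the lenient variant also
    accepts overabundant variants of the target's cell."""
    total = sum(counts.values())
    good = counts["correct"] + (counts["overabundance"] if lenient else 0)
    return good / total


FORMS = {
    "omenoiden": Form("omena", "GEN.PL"),
    "omenien": Form("omena", "GEN.PL"),
    "kissalla": Form("kissa", "ADE.SG"),
    "koiralla": Form("koira", "ADE.SG"),
    "kissat": Form("kissa", "NOM.PL"),
}
# Invented (target, nearest neighbour) pairs, not data from the paper.
PAIRS = [("omenoiden", "omenoiden"), ("omenoiden", "omenien"),
         ("kissalla", "koiralla"), ("kissalla", "kissat")]

COUNTS = tally(PAIRS, FORMS)
print(dict(COUNTS))
print(accuracy(COUNTS), accuracy(COUNTS, lenient=True))  # 0.25 0.5
```

Keeping overabundance as its own label, rather than silently counting it as correct, makes it possible to report both a strict figure and one that excludes errors which are not true errors.
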
About the journal:
The Mental Lexicon is an interdisciplinary journal that provides an international forum for research that bears on the representation and processing of words in the mind and brain. We encourage both the submission of original research and reviews of significant new developments in the understanding of the mental lexicon. The journal publishes work that includes, but is not limited to, the following: models of the representation of words in the mind; computational models of lexical access and production; experimental investigations of lexical processing; neurolinguistic studies of lexical impairment; functional neuroimaging and lexical representation in the brain; lexical development across the lifespan; lexical processing in second language acquisition; the bilingual mental lexicon; lexical and morphological structure across languages; formal models of lexical structure; corpus research on the lexicon; and new experimental paradigms and statistical techniques for mental lexicon research.