How vector space models disambiguate adjectives: A perilous but valid enterprise

Yearbook of the German Cognitive Linguistics Association. Pub Date: 2022-11-01. DOI: 10.1515/gcla-2022-0002

Mariana Montes, D. Geeraerts


Citations: 1



Abstract

The present study is part of a larger research project developing computational tools for large-scale corpus-based semantic analyses. One such tool represents semantic structure with vector space models (VSMs). The paper shows that this tool and the models built require a deeper understanding, especially with a view to how its results relate to cognitive theories of meaning. Although token-based VSMs are increasingly used in corpus-based cognitive semantics, we believe it is insufficiently appreciated how alternative parameter settings deal with a range of semantic issues, such as granularity of meaning, prototypicality of the domain of application and interaction with syntactic patterns. For the purpose of this paper, we will focus on only one of those issues, viz. the prototypicality of the domain of application, presenting the results of three of our case studies on the Dutch adjectives heilzaam, hoekig, hachelijk and geldig. The models presented are built from a 520-million-word corpus of contemporary Dutch and Flemish newspapers and by varying parameters such as window size, part-of-speech and frequency thresholds in the selection of features. The resulting VSMs are evaluated through visual analytics: although multidimensional, they can be reduced to 2D and represented in scatterplots where more similar tokens appear closer to each other. The color-coding with manual sense tags employed here makes it possible to compare the groupings provided by human annotators with those of the computational models in a way consistent with the cognitive approach to meaning and categorization.
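To make the idea of a token-based VSM more concrete, the following is a minimal sketch, not the authors' pipeline. It assumes raw co-occurrence counts as features, whereas the models in the paper may weight and select features differently. The function names (`token_vectors`, `cosine`), the parameters `window` and `min_freq`, and the English toy sentences are all invented for illustration and do not come from the paper's corpus. The sketch builds one context vector per occurrence of a target adjective, using a window size and a frequency threshold, and compares occurrences by cosine similarity.

```python
from collections import Counter
from math import sqrt


def token_vectors(sentences, target, window=2, min_freq=1):
    """Build one sparse context vector per occurrence (token) of `target`.

    `window` is the number of words taken on each side of the target;
    `min_freq` is a corpus-frequency threshold for admitting a feature.
    Both are illustrative stand-ins for the parameters named in the abstract.
    """
    # Corpus-wide word frequencies, used for the frequency threshold.
    freq = Counter(word for sent in sentences for word in sent)
    vectors = []
    for sent in sentences:
        for i, word in enumerate(sent):
            if word != target:
                continue
            left = sent[max(0, i - window):i]
            right = sent[i + 1:i + 1 + window]
            context = [w for w in left + right
                       if w != target and freq[w] >= min_freq]
            vectors.append(Counter(context))
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Invented toy data (hypothetical, English for readability).
sentences = [
    ["the", "sharp", "knife", "cut", "bread"],
    ["a", "sharp", "knife", "cut", "rope"],
    ["a", "sharp", "rise", "in", "prices"],
]
vecs = token_vectors(sentences, "sharp", window=2)

# Tokens that share context words end up more similar to each other.
print(cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]))
```

In a fuller setup, the pairwise similarities would be turned into distances and passed to a dimensionality reduction technique to obtain the 2D scatterplots the abstract mentions; that step is left out here because it depends on choices the abstract does not specify.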

