How vector space models disambiguate adjectives: A perilous but valid enterprise
Authors: Mariana Montes, D. Geeraerts
Journal: Yearbook of the German Cognitive Linguistics Association
Published: 2022-11-01
DOI: https://doi.org/10.1515/gcla-2022-0002
Abstract: The present study is part of a larger research project developing computational tools for large-scale corpus-based semantic analyses. One such tool represents semantic structure with vector space models (VSMs). The paper shows that this tool and the models built with it require a deeper understanding, especially with a view to how their results relate to cognitive theories of meaning. Although token-based VSMs are increasingly used in corpus-based cognitive semantics, we believe it is insufficiently appreciated how alternative parameter settings deal with a range of semantic issues, such as granularity of meaning, prototypicality of the domain of application and interaction with syntactic patterns. For the purpose of this paper, we will focus on only one of those issues, viz. the prototypicality of the domain of application, presenting the results of three of our case studies on the Dutch adjectives heilzaam, hoekig, hachelijk and geldig. The models presented are built from a 520-million-word corpus of contemporary Dutch and Flemish newspapers by varying parameters such as window size, part-of-speech and frequency thresholds in the selection of features. The resulting VSMs are evaluated through visual analytics: although multidimensional, they can be reduced to 2D and represented in scatterplots where more similar tokens appear closer to each other. The color-coding with manual sense tags employed here makes it possible to compare the groupings provided by human annotators with those of the computational models in a way consistent with the cognitive approach to meaning and categorization.
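To make the idea of a token-based VSM and its window-size parameter concrete, the following is a minimal sketch, not the authors' implementation. It assumes raw co-occurrence counts as features (the abstract does not specify a weighting scheme), uses four invented English sentences with the adjective "valid" rather than the Dutch corpus, and omits the reduction to 2D. The function names `token_vector`, `cosine` and `similarity_matrix` and the sense labels are hypothetical.

```python
from collections import Counter
from math import sqrt


def token_vector(tokens, position, window):
    """Count the context words within `window` positions of one target token."""
    lo = max(0, position - window)
    hi = min(len(tokens), position + window + 1)
    return Counter(tokens[i] for i in range(lo, hi) if i != position)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[word] for word, count in u.items() if word in v)
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Invented toy occurrences with made-up manual sense tags (assumption:
# these are illustrative only and do not come from the paper's data).
TARGET = "valid"
OCCURRENCES = [
    ("document", "the ticket is valid for one month".split()),
    ("document", "your ticket is valid for one day".split()),
    ("reasoning", "she made a valid argument about meaning".split()),
    ("reasoning", "he made a valid point about meaning".split()),
]


def similarity_matrix(window):
    """Pairwise cosine similarities between the token vectors of all occurrences."""
    vectors = [
        token_vector(tokens, tokens.index(TARGET), window)
        for _, tokens in OCCURRENCES
    ]
    return [[cosine(a, b) for b in vectors] for a in vectors]


if __name__ == "__main__":
    # Varying the window size changes which tokens count as similar.
    for window in (1, 3):
        sims = similarity_matrix(window)
        print(f"window={window}")
        for (sense, _), row in zip(OCCURRENCES, sims):
            print(f"  {sense:<10}", " ".join(f"{s:.2f}" for s in row))
```

In this toy setting, occurrences that share a sense tag share more context words than occurrences with different tags, and the size of that gap depends on the window. A fuller pipeline would convert the similarities to distances and pass them to a dimensionality reduction technique to obtain the color-coded 2D scatterplot the abstract describes.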