The Discriminatory Power of Lexical Context for Alternations: An Information-theoretic Exploration

Journal of Research Design and Statistics in Linguistics and Communication Science. Pub Date: 2019-08-29. DOI: 10.1558/jrds.38227

S. Gries


Citations: 2

Abstract

This paper makes a very exploratory, tentative, and thinking-aloud kind of suggestion for the corpus-based analysis of alternation data. I start from the observation that studies of alternations/choices in particular in corpus linguistics have become increasingly sophisticated in terms of the statistical methods they employ and the number of predictors they involve. While the predictors employed come from many different levels of linguistic analysis – phonology, morphosyntax, semantics, pragmatics/discoursal, textual, psycholinguistic, sociolinguistic, and others – they are usually contextual in nature, meaning they characterize the context of the choice the language user needs to make or has just made. However, one aspect of the context seems to be crucially underutilized when it comes to modeling speakers’ choices: the lexical context. In this paper, I build on recent work in computational psycholinguistics to: (a) define a lexical-distribution prototype of each of the (typically, but not necessarily, two) alternants of an alternation; and (b) compute the degree to which each instance of the alternation in question diverges from each of the prototypes. Then, (c) the values that all choices score on the divergences from each of the prototypes are entered as predictors to all others in statistical models to, minimally, serve as a variable that controls for whatever information is contained in the lexical context of an instance of speaker’s choice. I exemplify the approach and its sometimes amazing predictive power on the basis of a choice between near synonyms, two morphosyntactic alternations (preposition stranding vs. pied-piping and of- vs. s-genitives), and a distinction between the functions of well
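
Steps (a) to (c) of the abstract can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the paper's implementation: the abstract does not name the divergence measure, so Kullback-Leibler divergence with add-one smoothing is assumed here, and the toy genitive contexts, the function names, and the `alpha` parameter are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy data: (chosen alternant, context words around the choice).
data = [
    ("s",  ["the", "president", "speech"]),
    ("s",  ["my", "brother", "car"]),
    ("of", ["the", "roof", "the", "house"]),
    ("of", ["the", "end", "the", "story"]),
]
vocab = sorted({w for _, ctx in data for w in ctx})

def prototype(contexts, vocab, alpha=1.0):
    """Step (a): smoothed word distribution over all contexts of one alternant."""
    counts = Counter(w for ctx in contexts for w in ctx)
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def instance_distribution(context, vocab):
    """Relative frequencies of the in-vocabulary words of a single instance."""
    counts = Counter(w for w in context if w in vocab)
    n = sum(counts.values())
    return {w: c / n for w, c in counts.items()}

def kl_divergence(p, q):
    """D_KL(p || q) in bits; q must be nonzero wherever p is (assumed measure)."""
    return sum(pw * math.log2(pw / q[w]) for w, pw in p.items() if pw > 0)

# Step (a): one lexical-distribution prototype per alternant.
prototypes = {
    alt: prototype([ctx for lab, ctx in data if lab == alt], vocab)
    for alt in {lab for lab, _ in data}
}

# Step (b): divergence of every instance from every prototype.
rows = []
for label, ctx in data:
    p = instance_distribution(ctx, vocab)
    rows.append((label, {alt: kl_divergence(p, q) for alt, q in prototypes.items()}))

# Step (c): these divergences would be added as predictor columns
# alongside the other predictors of a regression model.
for label, divs in rows:
    print(label, {alt: round(d, 3) for alt, d in sorted(divs.items())})
```

In this sketch each instance also contributes to its own prototype, which inflates how well the divergences separate the alternants; a real analysis would presumably need to compute the prototypes without the instance being scored, or on held-out data.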

