Compound or phrase or in between? Testing linguistic criteria for compoundhood in English
Patrick Ziering, Lonneke van der Plas
Word Structure, published 2020-07-01
DOI: 10.3366/word.2020.0169 (https://doi.org/10.3366/word.2020.0169)
Citations: 0
Abstract
In this paper, we present an empirical study of the definition of compounds in English, the graded nature of the phenomenon, and its correlation with the commonly used linguistic criteria for compoundhood. We create a resource that includes a diverse set of nominal compounds identified by two trained, independent annotators in sentences from the proceedings of the European Parliament. In addition, the annotators rate the compoundhood of each identified compound and the applicability of six prominent linguistic criteria of compoundhood to each item. By comparing the two annotators' annotations, we show how contested the definition of compounds is in practice, and we show the graded nature of compoundhood. By measuring the correlation between compoundhood and the six linguistic criteria using machine learning techniques, we show that some linguistic criteria are stronger predictors of compoundhood than others.