Characterizing Dialect Groups: Distance and Informativeness Associated with Linguistic Features
Gotzon Aurrekoetxea, E. Clua, Aitor Iglesias, I. Usobiaga, M. Salicrú
Zeitschrift für Dialektologie und Linguistik, pp. 307-326, 2020. DOI: 10.25162/zdl-2020-0011
Abstract
Starting from a distance that captures similarities and differences among populations, dialect classification makes it possible to establish the borders between varieties and to identify transition zones (border populations). The high cost of conducting and processing surveys largely limits the sample size, the number of localities surveyed, and how often fieldwork can be repeated to track dialect variation over time. Although other data-gathering methods have recently been developed, for researchers who prefer face-to-face surveys we introduce a method for selecting the subset of the most informative linguistic items. To maximize the similarity between the classification obtained with the selected subset and the one obtained with the complete set of items, we define a similarity measure that highlights redundancy between items (the Simple Matching Coefficient), group the items by similarity (K-means), and finally choose the most representative items from the representation obtained with Ward's method, in proportion to the size of each subgroup. This exploratory study illustrates the methodology with the Basque-language data of the Bourciez Corpus.
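The item-selection pipeline summarized in the abstract can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the authors' implementation. We assume each item records one categorical variant per locality with no missing answers, and we encode an item as a 0/1 vector over locality pairs (1 when both localities share a variant), so the Simple Matching Coefficient between two items is the share of locality pairs on which they agree. K-means uses a deterministic farthest-first start, and the Ward's-method step is replaced by a simpler stand-in that ranks the members of each group by their summed SMC to the group. The names `pair_signature`, `kmeans`, `allocate` and `select_items` are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence


def pair_signature(answers: Sequence[str]) -> List[int]:
    """Encode one item as a 0/1 vector over all locality pairs:
    1 if the two localities gave the same variant (assumed encoding)."""
    return [int(answers[i] == answers[j])
            for i, j in combinations(range(len(answers)), 2)]


def smc(a: Sequence[float], b: Sequence[float]) -> float:
    """Simple Matching Coefficient: share of positions where a and b agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def sqdist(a: Sequence[float], b: Sequence[float]) -> float:
    # For two 0/1 signatures of length n this equals n * (1 - SMC).
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: List[List[int]], k: int, iters: int = 50) -> List[int]:
    """Plain K-means with a deterministic farthest-first initialisation."""
    assert 1 <= k <= len(vectors)
    seeds = [0]
    while len(seeds) < k:
        seeds.append(max(range(len(vectors)),
                         key=lambda i: min(sqdist(vectors[i], vectors[s]) for s in seeds)))
    centroids = [[float(x) for x in vectors[s]] for s in seeds]
    labels: List[int] = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sqdist(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def allocate(sizes: List[int], budget: int) -> List[int]:
    """Split `budget` picks across groups in proportion to their sizes
    (largest-remainder rounding, exact integer arithmetic)."""
    total = sum(sizes)
    assert 0 <= budget <= total
    counts = [budget * s // total for s in sizes]
    remainders = [budget * s % total for s in sizes]
    order = sorted(range(len(sizes)), key=lambda i: -remainders[i])
    for i in order[: budget - sum(counts)]:
        counts[i] += 1
    return counts


def select_items(data: Dict[str, Sequence[str]], k: int, budget: int) -> List[str]:
    """Pick `budget` items: cluster items by redundancy, allocate picks
    proportionally to group size, then keep the most central members."""
    names = list(data)
    sigs = [pair_signature(data[n]) for n in names]
    labels = kmeans(sigs, k)
    groups = [[i for i, lab in enumerate(labels) if lab == c] for c in range(k)]
    groups = [g for g in groups if g]
    chosen: List[str] = []
    for group, n_pick in zip(groups, allocate([len(g) for g in groups], budget)):
        # Stand-in for the Ward step (our substitution): rank members by
        # their summed SMC to the whole group, most redundant first.
        ranked = sorted(group,
                        key=lambda i: sum(smc(sigs[i], sigs[j]) for j in group),
                        reverse=True)
        chosen.extend(names[i] for i in ranked[:n_pick])
    return chosen


# Toy data (invented for illustration): 4 localities, 4 items.
data = {
    "item_a": ["x", "x", "y", "y"],
    "item_b": ["p", "p", "q", "q"],  # same split of localities as item_a
    "item_c": ["m", "n", "m", "n"],
    "item_d": ["u", "v", "u", "v"],  # same split of localities as item_c
}
print(select_items(data, k=2, budget=2))  # ['item_a', 'item_c']
```

Because the squared Euclidean distance between two 0/1 signatures counts the locality pairs on which they disagree, K-means on these vectors groups items whose classifications of the localities largely coincide, which is the redundancy the SMC is meant to expose. The proportional allocation keeps large groups of related items from being under-represented in the reduced item set.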