The validity of mixed-effects regression for analysing linguistic distance matrices: a simulation study
John L.A. Huisman, Roeland van Hout
Taal en Tongval Language Variation in the Low Countries, published 2023-09-01
DOI: 10.5117/tet2023.1.004.huis (https://doi.org/10.5117/tet2023.1.004.huis)
Abstract
Recent work in dialectometry has proposed linear mixed-effects regression (LMER) for analysing full distance matrices. While the outcomes are promising, their validity still needs to be confirmed, because LMER is not yet an established method for analysing distance matrices. The current contribution provides a supporting framework for this approach by testing its validity on a series of simulated datasets. We analysed the generated data using LMER and compared its performance to that of the well-established multiple regression on distance matrices (MRM) approach. We find that the LMER results are on par with, and sometimes even exceed, those obtained from MRM. Because it can include random effects, LMER is a more powerful tool than MRM for examining a linguistic area as a whole, with all pairwise comparisons included. This makes it an ideal candidate for the big data analyses that are becoming more prevalent with the ongoing digitisation of large dialect databases.
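To make the setup concrete, the following is a minimal sketch of the kind of analysis the abstract describes. It is not the authors' simulation design. It simulates a small linguistic distance matrix driven by geographic distance plus a per-site offset, unfolds it into one row per site pair, and runs an MRM-style permutation test on the regression slope. All function names, parameter names, and parameter values (`simulate_distances`, `slope`, `site_sd`, and so on) are assumptions chosen for illustration. The mixed-effects side of the comparison is not implemented here; it would, under the same assumptions, take the long-format pairs and add random effects for the sites in each pair.

```python
import numpy as np


def simulate_distances(n_sites=30, intercept=1.0, slope=0.5,
                       site_sd=0.3, noise_sd=0.2, seed=0):
    """Simulate geographic and linguistic distance matrices (illustrative values)."""
    rng = np.random.default_rng(seed)
    coords = rng.uniform(0, 10, size=(n_sites, 2))
    # Pairwise Euclidean distance between site coordinates.
    geo = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Per-site offset: the kind of structure a site-level random effect targets.
    site = rng.normal(0, site_sd, n_sites)
    # Symmetric pair-level noise.
    noise = np.triu(rng.normal(0, noise_sd, (n_sites, n_sites)), k=1)
    noise = noise + noise.T
    ling = intercept + slope * geo + site[:, None] + site[None, :] + noise
    np.fill_diagonal(ling, 0.0)
    return geo, ling


def to_long(geo, ling):
    """Unfold the upper triangle into one row per site pair (i < j)."""
    i, j = np.triu_indices_from(geo, k=1)
    return i, j, geo[i, j], ling[i, j]


def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    x_c = x - x.mean()
    return float(x_c @ (y - y.mean()) / (x_c @ x_c))


def mrm_permutation_test(geo, ling, n_perm=999, seed=1):
    """MRM-style test: permute sites (rows and columns together) of the response."""
    rng = np.random.default_rng(seed)
    i, j = np.triu_indices_from(geo, k=1)
    observed = ols_slope(geo[i, j], ling[i, j])
    extreme = 0
    for _ in range(n_perm):
        p = rng.permutation(geo.shape[0])
        permuted = ling[np.ix_(p, p)]  # keeps the matrix structure intact
        if abs(ols_slope(geo[i, j], permuted[i, j])) >= abs(observed):
            extreme += 1
    # Two-sided permutation p-value, counting the observed statistic itself.
    return observed, (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    geo, ling = simulate_distances()
    slope, p_value = mrm_permutation_test(geo, ling, n_perm=199)
    print(f"estimated slope: {slope:.3f}, permutation p-value: {p_value:.3f}")
```

Permuting whole sites rather than individual pairs is what distinguishes this test from an ordinary regression test: the pairs that share a site are not independent, and the site-level permutation respects that dependence.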