Probabilistic corpus-based dialectometry

Christoph Wolk, Benedikt Szmrecsanyi
Journal of Linguistic Geography, vol. 6, no. 1, pp. 56-75. Published 2018-04-01. DOI: 10.1017/jlg.2018.6 (https://doi.org/10.1017/jlg.2018.6)
Citations: 3

Abstract

Researchers in dialectometry have begun to explore measurements based on fundamentally quantitative metrics, often sourced from dialect corpora, as an alternative to the traditional signals derived from dialect atlases. This change of data type amplifies an existing issue in the classical paradigm, namely that locations may vary in coverage and that this affects the distance measurements: pairs involving a location with lower coverage suffer from greater noise and therefore imprecision. We propose a method for increasing robustness using generalized additive modeling, a statistical technique that allows leveraging the spatial arrangement of the data. The technique is applied to data from the British English dialect corpus FRED; the results are evaluated regarding their interpretability and according to several quantitative metrics. We conclude that data availability is an influential covariate in corpus-based dialectometry and beyond, and recommend that researchers be aware of this issue and of methods to alleviate it.
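
The coverage problem described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' method and does not involve generalized additive modeling; it only shows why a pair that includes a sparsely covered location yields a noisier distance. Everything in it is an assumption made for illustration: the feature rate, the corpus sizes, the use of frequencies normalized per 10,000 words, the absolute difference as the distance, and a Poisson approximation to the sampling of feature counts.

```python
import math
import random


def poisson(lam, rng):
    """Draw one Poisson-distributed count using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def normalized_frequency(true_rate, n_words, rng):
    """Simulate an observed feature frequency per 10,000 words of running text."""
    count = poisson(true_rate * n_words, rng)
    return count / n_words * 10_000


def mean_distance(n_words_a, n_words_b, true_rate=0.005, runs=2000, seed=1):
    """Average absolute frequency difference between two locations.

    Both locations share the same true rate (an assumption of this sketch),
    so any measured distance is pure sampling noise.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        a = normalized_frequency(true_rate, n_words_a, rng)
        b = normalized_frequency(true_rate, n_words_b, rng)
        total += abs(a - b)
    return total / runs


# Hypothetical corpus sizes: two well-covered locations versus a pair
# in which one location contributes only 2,000 words.
well_covered = mean_distance(50_000, 50_000)
poorly_covered = mean_distance(2_000, 50_000)

print(f"mean spurious distance, both locations well covered: {well_covered:.2f}")
print(f"mean spurious distance, one location poorly covered: {poorly_covered:.2f}")
```

Because the two simulated locations are linguistically identical, the ideal distance is zero in both cases. The pair with the small subcorpus nevertheless shows a markedly larger average distance, which is the kind of coverage-driven imprecision the abstract says a spatially informed model is meant to reduce.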