Drawing areal information from a corpus of noisy dialect data

Alfred Lameli, Elvira Glaser, Philipp Stöckle
Journal of Linguistic Geography, vol. 8, no. 1, pp. 31-48. Published 2020-04-01.
DOI: https://doi.org/10.1017/jlg.2020.4
Citations: 3

Abstract

This article is an analysis of linguistic survey data representing German dialects in Switzerland in 1933/34 based on the so-called Wenker sentences. The data are impressionistic in terms of applied phonetic transcriptions, which were produced by non-specialists using the Latin alphabet. Due to the lack of pre-defined standardization, the phonetic transcriptions are very heterogeneous. From a technical perspective, this leads to very noisy data, which is why the validity of the Wenker data in general and the Swiss Wenker data in particular has been questioned. Using methods from computational linguistics, we compare, for the first time, Wenker data with linguistic data collected at virtually the same time by linguistics professionals. Direct comparison with a sample from the published atlas of German-speaking Switzerland (SDS) reveals that despite the noisiness of the data, they nevertheless provide reliable information, e.g., in terms of the spatial structuring of Swiss dialects. The study is thus a successful pilot for other corpus-based studies dealing with unstructured Wenker data in other regions.