Drawing areal information from a corpus of noisy dialect data

Journal of Linguistic Geography. Pub date: 2020-04-01. DOI: 10.1017/jlg.2020.4

Alfred Lameli, Elvira Glaser, Philipp Stöckle

Volume 8, issue 1, pages 31-48. Publication type: journal article.

Citations: 3

Abstract

This article is an analysis of linguistic survey data representing German dialects in Switzerland in 1933/34 based on the so-called Wenker sentences. The data are impressionistic in terms of applied phonetic transcriptions, which were produced by non-specialists using the Latin alphabet. Due to the lack of pre-defined standardization, the phonetic transcriptions are very heterogeneous. From a technical perspective, this leads to very noisy data, which is why the validity of the Wenker data in general and the Swiss Wenker data in particular has been questioned. Using methods from computational linguistics, we compare, for the first time, Wenker data with linguistic data collected at virtually the same time by linguistics professionals. Direct comparison with a sample from the published atlas of German-speaking Switzerland (SDS) reveals that despite the noisiness of the data, they nevertheless provide reliable information, e.g., in terms of the spatial structuring of Swiss dialects. The study is thus a successful pilot for other corpus-based studies dealing with unstructured Wenker data in other regions.
