Lörres, Möppes, and the Swiss. (Re)Discovering regional patterns in anonymous social media data

Christoph Purschke, Dirk Hovy
Journal of Linguistic Geography, vol. 7(1), pp. 113-134. Published 2019-10-01. Journal article.
DOI: 10.1017/jlg.2019.10 (https://doi.org/10.1017/jlg.2019.10)
Citations: 6

Abstract

We study regional similarities and differences in language use on an anonymous mobile chat application in the German-speaking area. We apply a neural network to 2.3 million online conversations to automatically learn representations of words and cities. These representations, based on linguistic usage, capture regional distinctions in a high-dimensional vector space that can be clustered and visualized to reveal patterns in the data. We find that the resulting regional patterns are closely linked to the traditional division of German dialects, even though most of the conversations are written in standard German. The maps correspond to traditional dialect divisions and to language-external spatial structures, with a few notable exceptions that can be explained by external factors. Our method also supports two qualitative analyses: discovering geographically pertinent words at various regional levels, and creating style profiles for specific regional groups based on various linguistic resources. Our results strongly suggest the existence of region-specific patterns of language use (“digital regiolects”) that represent distinctive strategies of linguistic stylization with respect to linguistic resources and topics. As a methodological contribution, we show how linguistic theory can drive the application and direction of neural network-based representation learning, and how its judicious application provides the basis for qualitative analysis of large-scale data collections.
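The abstract does not state which clustering algorithm is applied to the learned city vectors or how geographically pertinent words are selected. As a minimal sketch, assuming every city and every word already has a vector from some embedding model, the pure-Python code below groups cities with average-linkage agglomerative clustering on cosine distance and ranks words by cosine similarity to a city vector. The algorithm choice, the helper names (cosine, average_linkage, top_words) and all city/word labels and vectors are hypothetical toy values for illustration only; they are not the authors' method, data or results.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def average_linkage(vectors, k):
    """Agglomerative clustering with average linkage on cosine distance.

    Starts with one cluster per item and repeatedly merges the two clusters
    whose members have the smallest mean pairwise distance, until k remain.
    """
    clusters = [[name] for name in sorted(vectors)]

    def dist(c1, c2):
        # Average linkage: mean cosine distance over all cross-cluster pairs.
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(1 - cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > k:
        # combinations() yields i < j, so deleting index j keeps index i valid.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


def top_words(city_vec, word_vectors, n=2):
    """Return the n words whose vectors are most cosine-similar to city_vec."""
    ranked = sorted(word_vectors,
                    key=lambda w: cosine(city_vec, word_vectors[w]),
                    reverse=True)
    return ranked[:n]


# Hypothetical toy embeddings (made up for illustration, not study data).
city_vectors = {
    "city_A": [0.9, 0.1, 0.0],
    "city_B": [0.8, 0.2, 0.1],
    "city_C": [0.1, 0.9, 0.2],
    "city_D": [0.2, 0.8, 0.1],
    "city_E": [0.0, 0.1, 0.9],
}
word_vectors = {
    "word_x": [1.0, 0.0, 0.0],
    "word_y": [0.0, 1.0, 0.0],
    "word_z": [0.0, 0.0, 1.0],
}

if __name__ == "__main__":
    print(average_linkage(city_vectors, 3))
    print(top_words(city_vectors["city_A"], word_vectors))
```

With these toy vectors the cities fall into three groups, and each city's top-ranked word is the one pointing in the same direction of the vector space. Average linkage is used here only because it is simple to write without external libraries; a real analysis might instead use a library implementation such as Ward clustering.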