Tuomo Hiippala, Tuomas Väisänen, T. Toivonen, O. Järv
Mapping the languages of Twitter in Finland
Neuphilologische Mitteilungen, pp. 12-44. Published 2020-11-24. DOI: 10.51814/NM.99996
Twitter is a popular social media platform for scholarly research, because the user-generated content on the platform can include geographic and temporal information. We collect a corpus of 38 million Twitter messages with two million geographical coordinates to map the languages used across Finland at the level of regions and municipalities. To cope with the high volume of social media data, we use automatic language identification and place of residence detection. We estimate the linguistic richness and diversity of users and locations using measures developed in ecology and information science. The analyses reveal a rich, multilingual environment that varies geographically and temporally, particularly between coastal, rural and urban areas. The results, which underline the mutual benefits of collaboration between linguists and geographers, provide a more fine-grained, accurate and comprehensive view of the languages used on Twitter in Finland than was previously available.
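The abstract does not spell out which richness and diversity measures or which residence rule are used. As a minimal sketch under stated assumptions, the pipeline below takes richness to be the number of distinct languages and diversity to be the Shannon index (an information-theoretic measure widely used in ecology), and detects a user's place of residence with a simple "most frequent municipality" rule. The record format `(user_id, municipality, language_code)` and every function name are hypothetical, and language codes are assumed to come from a language identifier run beforehand.

```python
import math
from collections import Counter, defaultdict

# Hypothetical record format: (user_id, municipality, language_code).
# Assumption: language codes come from an automatic language identifier,
# and municipalities from each message's geographic coordinates.


def detect_home(posts):
    """Assign each user the municipality they posted from most often.

    Assumed, simplified place-of-residence rule for illustration only.
    Ties go to the municipality seen first for that user.
    """
    per_user = defaultdict(Counter)
    for user, municipality, _lang in posts:
        per_user[user][municipality] += 1
    return {user: counts.most_common(1)[0][0] for user, counts in per_user.items()}


def language_counts(posts, homes=None):
    """Count messages per language in each municipality.

    By default a message counts where it was posted; if `homes` is given,
    it counts toward the author's detected home municipality instead.
    """
    by_area = defaultdict(Counter)
    for user, municipality, lang in posts:
        area = homes[user] if homes is not None else municipality
        by_area[area][lang] += 1
    return by_area


def richness(counts):
    """Richness: the number of distinct languages observed."""
    return sum(1 for n in counts.values() if n > 0)


def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln p_i) over language proportions p_i."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)
```

For example, if one user posts in Finnish and English from Helsinki and in Finnish from Espoo, their home resolves to Helsinki. Counted by posting location, Helsinki has a richness of 2 and a Shannon index of ln 2 (about 0.69); counted by home, the Espoo message is attributed to Helsinki as well. The Shannon index rises both with the number of languages and with how evenly messages are spread across them, which is why it complements plain richness.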