Analyzing the unrestricted web: The Finnish Corpus of Online Registers

IF 0.5 · Zone 3 (Literature) · LANGUAGE & LINGUISTICS
Valtteri Skantsi, Veronika Laippala
Nordic Journal of Linguistics · Journal Article · Published 2023-03-13
DOI: https://doi.org/10.1017/s0332586523000021
Citations: 1

Abstract

This article introduces the Finnish Corpus of Online Registers (FinCORE), which represents the full range of registers – situationally defined text varieties such as news and blogs – on the Finnish Internet. The extreme range of language use found online has challenged the study of registers. It has been unclear what registers the entire Internet includes and whether they can be defined well enough to allow for their analysis or classification, as previous studies have focused on restricted sets of registers and on English. FinCORE features 10,754 texts from the unrestricted web, manually annotated for their register using a scheme originally established for the Corpus of Online Registers of English (CORE). We present the FinCORE registers and compare them to CORE. Finally, we show that the FinCORE registers are sufficiently well-defined to allow for their automatic identification, thus opening novel possibilities for both linguistics and web-as-corpus research. FinCORE is published under an open license.
Source journal: CiteScore 1.20 · Self-citation rate 20.00% · Articles published 22