The CELI corpus: Design and linguistic annotation of a new online learner corpus
Stefania Spina, Irene Fioravanti, Luciana Forti, Fabio Zanda
Second Language Research, published 2023-10-30. DOI: 10.1177/02676583231176370
This article introduces the CELI corpus, a new learner corpus of written Italian consisting of ca. 600,000 tokens, evenly distributed across the CEFR (Common European Framework of Reference for Languages) proficiency levels B1, B2, C1 and C2. The texts are drawn from the language certification exams administered worldwide by the University for Foreigners of Perugia. The corpus contains rich metadata on text-related and learner-related variables. Among other things, it expands the domain of learner corpora by being freely available online to the research community and by focusing on a target language other than English. The article also presents and evaluates the POS-tagging procedure, thus contributing to best practices in learner corpus annotation.
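Evaluating a POS tagger on learner text usually means comparing its output with a manually checked gold standard. The following is a minimal sketch of one common measure: token-level accuracy, broken down by CEFR level. It rests on stated assumptions. The (level, gold tag, predicted tag) input format, the function name `accuracy_by_level`, and the tag labels are hypothetical and illustrative, and the article's own evaluation procedure and metrics may differ.

```python
from collections import defaultdict


def accuracy_by_level(tokens):
    """Token-level POS-tagging accuracy per CEFR level.

    `tokens` is a hypothetical list of (level, gold_tag, predicted_tag)
    triples; this is an illustrative format, not the CELI corpus's
    actual annotation format.
    """
    correct = defaultdict(int)  # correctly tagged tokens per level
    total = defaultdict(int)    # all evaluated tokens per level
    for level, gold, predicted in tokens:
        total[level] += 1
        if gold == predicted:
            correct[level] += 1
    # Share of correctly tagged tokens for each level seen in the input.
    return {level: correct[level] / total[level] for level in total}


# Illustrative toy data (not taken from the corpus).
sample = [
    ("B1", "NOUN", "NOUN"),
    ("B1", "VERB", "AUX"),
    ("C2", "ADJ", "ADJ"),
    ("C2", "DET", "DET"),
]
print(accuracy_by_level(sample))  # {'B1': 0.5, 'C2': 1.0}
```

Reporting accuracy per level rather than only overall can show whether tagging quality varies with learner proficiency, which is relevant for a corpus stratified by CEFR level.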
Journal description:
Second Language Research is a high-quality, international, peer-reviewed journal, currently ranked among the top 20 journals in its field by Thomson Scientific (formerly ISI). SLR publishes theoretical and experimental papers on second language acquisition and second language performance, and follows a rigorous double-blind review policy in which the identities of reviewers and authors are concealed from each other.