The University of Pittsburgh English Language Institute Corpus (PELIC)
Ben Naismith, Na-Rae Han, Alan Juffs
International Journal of Learner Corpus Research, published 2022-03-08
https://doi.org/10.1075/ijlcr.21002.nai
This report introduces the University of Pittsburgh English Language Institute Corpus (PELIC; Juffs et al., 2020), a publicly available 4.2-million-word learner corpus of written texts. Collected over seven years in the University of Pittsburgh’s Intensive English Program, these texts were produced by more than 1,100 students with diverse linguistic backgrounds and proficiency levels. Unlike most learner corpora, which are cross-sectional, PELIC is longitudinal, offering greater opportunities for tracking development in a natural classroom setting. This potential is illustrated in an overview of the research conducted to date with these data. The report also describes PELIC’s creation and contents, including how the texts have been managed to facilitate natural language processing. Overall, the corpus contributes to learner corpus research by adding to the pool of freely and publicly available learner corpora, and it is supplemented by a set of Python tools and tutorials for accessing these data.