Building a Large Annotated Corpus of English: The Penn Treebank

Comput. Linguistics. Pub Date: 1993-06-01. DOI: 10.21236/ada273556

Mitchell P. Marcus, Beatrice Santorini, Mary Ann Marcinkiewicz


Citations: 8621

Abstract




As a result of this grant, the researchers have now published on CD-ROM a corpus of over 4 million words of running text annotated with part-of-speech (POS) tags, with over 3 million words of that material assigned skeletal grammatical structure. This material now includes a fully hand-parsed version of the classic Brown corpus. About one half of the papers at the ACL Workshop on Using Large Text Corpora this past summer were based on the materials generated by this grant.


Source journal

Comput. Linguistics

Self-citation rate

0.00%

Publication count