Corpus Compilation and Exploitation in Language Documentation Projects
U. Mosel
The Oxford Handbook of Endangered Languages, published 2018-08-08
DOI: https://doi.org/10.1093/OXFORDHB/9780190610029.013.14
This chapter analyzes the specific characteristics of corpora of endangered languages from a corpus-linguistic perspective. It starts by defining the central notions of corpus and text and then investigates how the heterogeneous corpora of language documentation may fit into a general typology of corpora. The third section looks at the genres and registers that, for methodological and theoretical reasons, are typical of language documentation, whereas the fourth section deals with the structure of corpora and how texts of a particular content, genre, or register can be accessed in archives. The fifth section describes the format of the texts, which are typically annotated audio and video recordings, and covers metadata, transcription, orthography, translation, glossing, and syntactic annotation. The sixth section shows how annotated corpora can be analyzed for grammatical and lexical research. The last section summarizes the specific features of language documentation corpora.