A Corpus Too Small: Uses of Text Data in a Hupa-English Bilingual Dictionary
J. Spence. International Journal of Lexicography. Published 2021-05-25. DOI: 10.1093/IJL/ECAB006 (https://doi.org/10.1093/IJL/ECAB006)
Although corpus-driven methods have led to a revolution in the way lexicographers of some languages approach their work, text corpora for many less-studied languages are too small for such methods to be used reliably. Hupa, a Native American language of northwestern California, is one such language. Nonetheless, the Hupa Online Dictionary and Texts website relies heavily on its small text corpus to support development of the dictionary component. The corpus is especially important as a way to address Hupa’s complex and productive polysynthetic morphology, both derivational and inflectional, with words attested in the corpus providing the empirical basis for creating new entries and expanding the coverage of existing ones. It also provides a ready source of example sentences in context, figurative uses of language that might not come to light through elicitation, and aspects of linguistic variation that dictionary normalization tends to obscure. Thus, while corpus-driven lexicography may not be a realistic possibility at this point, corpus-based lexicography (Tognini-Bonelli 2001) is certainly within reach.
About the journal:
The International Journal of Lexicography was launched in 1988. Interdisciplinary as well as international, it is concerned with all aspects of lexicography, including issues of design, compilation and use, and with dictionaries of all languages, though the chief focus is on dictionaries of the major European languages - monolingual and bilingual, synchronic and diachronic, pedagogical and encyclopedic. The Journal recognizes the vital role of lexicographical theory and research, and of developments in related fields such as computational linguistics, and welcomes contributions in these areas.