A Corpus Too Small: Uses of Text Data in a Hupa-English Bilingual Dictionary

IF 0.8 | Zone 2, Literature | LANGUAGE & LINGUISTICS

International Journal of Lexicography Pub Date : 2021-05-25 DOI:10.1093/IJL/ECAB006

J. Spence


Citations: 2

Abstract


A Corpus Too Small: Uses of Text Data in a Hupa-English Bilingual Dictionary

Although corpus-driven methods have led to a revolution in the way lexicographers of some languages approach their work, text corpora for many less-studied languages are too small for such methods to be used reliably. Hupa, a Native American language of northwestern California, is one such language. Nonetheless, the Hupa Online Dictionary and Texts website relies heavily on its small text corpus to support development of the dictionary component. The corpus is especially important as a way to address Hupa’s complex and productive polysynthetic morphology, both derivational and inflectional, with words attested in the corpus providing the empirical basis for creating new entries and expanding the coverage of existing ones. It also provides a ready source of example sentences in context, figurative uses of language that might not come to light through elicitation, and aspects of linguistic variation that dictionary normalization tends to obscure. Thus, while corpus-driven lexicography may not be a realistic possibility at this point, corpus-based lexicography (Tognini-Bonelli 2001) is certainly within reach.


Source journal

International Journal of Lexicography Multiple-

CiteScore

1.90

Self-citation rate

20.00%

Articles published

About the journal: The International Journal of Lexicography was launched in 1988. Interdisciplinary as well as international, it is concerned with all aspects of lexicography, including issues of design, compilation and use, and with dictionaries of all languages, though the chief focus is on dictionaries of the major European languages: monolingual and bilingual, synchronic and diachronic, pedagogical and encyclopedic. The Journal recognizes the vital role of lexicographical theory and research, and of developments in related fields such as computational linguistics, and welcomes contributions in these areas.