Peculiarities of Avestan Manuscripts for Computational Linguistics

J. Lang. Technol. Comput. Linguistics, Pub Date: 2012-07-01, DOI: 10.21248/jlcl.27.2012.161

Thomas Jügel


Citations: 1

Abstract

This paper discusses several computational tools for creating a stemma of Avestan manuscripts, such as a letter similarity matrix, a morphological expander, and co-occurrence networks. After a short introduction to Avestan and Avestan manuscripts and a presentation of the Avestan peculiarities that concern the creation of stemmata, the operability of the above-mentioned tools for this text corpus will be discussed. Finally, I will give a brief outlook on the complexity of a database structure for Avestan texts.

Introduction

The Avesta, as represented by the edition of GELDNER (1886-96), appears to be a sort of Bible containing several books or chapters, cf. SKJÆRVØ’s “sacred book of the Zoroastrians” (2009: 44); and, indeed, in Middle Iranian times (i.e., before 600 AD) there existed a kind of text corpus, rather than ‘a book’, of holy texts (CANTERA 2004). However, GELDNER’s edition disguises the actual texts of the manuscripts, because what we have today is not a book but a collection of ceremonies attested in various manuscripts.

Avestan is the term for an Old Iranian language and, as such, a member of the Indo-European language family. The actual name of the language is not known to us. The name ‘Avestan’ is taken from Middle Persian texts, which refer to their religious text corpus as the “abestā(g)”. When manuscripts containing these religious texts came to the attention of European scholars, they were referred to as “Avesta” and the language as “Avestan”.

Avestan is known to us in two varieties, called “Old Avestan” and “Young Avestan”. They are so called because they display two different chronological layers of Avestan. However, they also differ in some linguistic respects, so that they represent two different dialects of the same language (e.g., the genitive singular of xratu- “wisdom” is xratə̄uš in Old Avestan but xraθβō in Young Avestan; for further examples see DE VAAN 2003: 8ff.).

The Avestan manuscripts (henceforth MS) can be sorted into several groups; the main grouping is: 1) the ‘Pahlavi-MSs’, and 2) the ‘Sade-MSs’. The Pahlavi-MSs contain the Avestan text plus its translation and commentaries, generally in Middle Persian, but there are translations into Sanskrit, Gujarati and/or New Persian as well. The Sade-MSs (i.e., the “pure” MSs) contain only ritual instructions in Middle Persian, etc., besides the Avestan text. The Pahlavi-MSs served as exegetical texts written for scholarly use only. The Sade-MSs, by contrast, were for daily use in the ceremonies. These different purposes had an influence on the copying process (cf. Section 1).

The aforementioned grouping can be made at first glance at a MS because of the various writings these MSs do or do not contain. Besides the grouping into Pahlavi- and Sade-MSs, the MSs are further classified by ceremony. There are four of them: the Yasna Rapihwin, Vīsprad, Yašt, and Vīdēvdād ceremony. Depending on the season or on the deity who is invoked, there are further differences in what is otherwise the same


