J. Lang. Technol. Comput. Linguistics: Latest Articles, Page 6

The Encoding of Avestan - Problems and Solutions

J. Lang. Technol. Comput. Linguistics Pub Date : 2012-07-01 DOI: 10.21248/jlcl.27.2012.160

J. Gippert

Abstract: ‘Avestan’ is the name of the ritual language of Zoroastrianism, which was the state religion of the Iranian empire in Achaemenid, Arsacid and Sasanid times, covering a time span of more than 1200 years. It is named after the ‘Avesta’, i.e., the collection of holy scriptures that form the basis of the religion which was allegedly founded by Zarathushtra, also known as Zoroaster, around the beginning of the first millennium B.C. Together with Vedic Sanskrit, Avestan represents one of the most archaic witnesses of the Indo-Iranian branch of the Indo-European languages, which makes it especially interesting for historical-comparative linguistics. This is why the texts of the Avesta were among the first objects of electronic corpus building undertaken in the framework of Indo-European studies, leading to the establishment of the TITUS database (‘Thesaurus indogermanischer Text- und Sprachmaterialien’). Today, the complete Avestan corpus is available, together with elaborate search functions and an extended version of the subcorpus of the so-called ‘Yasna’, which documents a large share of the attested variant readings. Right from the beginning of their computational work on the Avesta, the compilers had to cope with the fact that its texts have been transmitted in a special script written from right to left, which was also used for printing them in the scholarly editions still in use today. It goes without saying that there was no way in the middle of the 1980s to encode the Avestan scriptures exactly as they are found in the manuscripts. Instead, we had to rely upon transcriptional devices that were dictated by the restrictions of character encoding as provided by the computer systems used.
As the problems we had to face in this respect and the solutions we could apply are typical for the development of computational work on ancient languages, it seems worthwhile to sketch them out here.
1 The Avestan script and its transcription
1.1 Early western approaches to the Avestan script and its transcription
The Avestan script has been known to western scholarship since the 17th century, when the first accounts of the religion of the ‘Parsees’, i.e., Zoroastrians living in India and Iran, were published. The first notable description of the script is found in the travel report by Jean Chardin, who sojourned in Iran in 1673–7; in the 1711 edition of his report, the author provides an ‘alphabet of the ancient Persians’, together with a lithographed table contrasting the characters of the Avestan script with their Perso-Arabic equivalents; cf. the extract illustrated in Fig. 1.

Citations: 0

Old Lithuanian Reference Corpus (SLIEKKAS) and Automated Grammatical Annotation

J. Lang. Technol. Comput. Linguistics Pub Date : 2012-07-01 DOI: 10.21248/jlcl.27.2012.165

Jolanta Gelumbeckaite, Mindaugas Sinkunas, Vytautas Zinkevicius

Citations: 4

Digitalisierung historischer Glossare zur automatisierten Vorannotation von Textkorpora am Beispiel des Altdeutschen [Digitizing historical glossaries for the automated pre-annotation of text corpora, using Old German as an example]

J. Lang. Technol. Comput. Linguistics Pub Date : 2012-07-01 DOI: 10.21248/jlcl.27.2012.162

Roland Mittmann

Citations: 2

Manuelle Abgleichung bei automatisierter Vorannotation: Das Tagging grammatischer Kategorien im Referenzkorpus Altdeutsch [Manual reconciliation in automated pre-annotation: tagging grammatical categories in the Old German Reference Corpus]

J. Lang. Technol. Comput. Linguistics Pub Date : 2012-07-01 DOI: 10.21248/jlcl.27.2012.163

S. Linde

Citations: 1

The Annotation of Morphology, Syntax and Information Structure in a Multilayered Diachronic Corpus

J. Lang. Technol. Comput. Linguistics Pub Date : 2011-07-01 DOI: 10.21248/jlcl.26.2011.143

Kristin Bech, K. Eide

Citations: 5

Slate - A Tool for Creating and Maintaining Annotated Corpora

J. Lang. Technol. Comput. Linguistics Pub Date : 2011-07-01 DOI: 10.21248/jlcl.26.2011.149

D. Kaplan, R. Iida, K. Nishina, T. Tokunaga

Citations: 25

More, Faster: Accelerated Corpus Annotation with Statistical Taggers

J. Lang. Technol. Comput. Linguistics Pub Date : 2011-07-01 DOI: 10.21248/jlcl.26.2011.154

Arne Skjærholt

Citations: 6

Korpuslinguistik in der linguistischen Lehre: Erfolge und Misserfolge [Corpus linguistics in linguistics teaching: successes and failures]

J. Lang. Technol. Comput. Linguistics Pub Date : 2011-07-01 DOI: 10.21248/jlcl.26.2011.141

Noah Bubenhofer

Citations: 3

Musisque Deoque: Text Retrieval on Critical Editions

J. Lang. Technol. Comput. Linguistics Pub Date : 2011-07-01 DOI: 10.21248/jlcl.26.2011.152

M. Manca, L. Spinazzè, P. Mastandrea, L. Tessarolo, Federico Boschetti

Abstract: The Musisque Deoque Project (MQDQ) aims at creating a digital archive of Latin poetry, from its origins to the late Italian Renaissance, equipped with a critical apparatus and various exegetical and linguistic information. The project focuses on the study of synchronic and diachronic intertextuality as illustrated, e.g., in Cicu (2005). For this reason, we pay close attention to formal and material aspects of the text that actually played a relevant role in the poetical tradition. The fixed text of printed critical editions, which aims at a reconstruction as close as possible to the lost originals, provides just a snapshot of the tradition, which is intrinsically dynamic, and gives the modern reader a distorted image of what an ancient text in fact was. The fully searchable digital collections currently available are based on traditional critical editions, which are, as we just said, authoritarian texts; this authoritarianism is emphasized by the conversion from printed text to database, because the critical apparatus is usually cut away and the reader has no way to check a variant different from the one the editor put in the main text, often dubitanter, simply because he had to choose a variant. Limiting lexical searches to the editor's choices unavoidably leads to both false positives and false negatives, which need to be verified against the printed critical editions.
False positives are due to possibly wrong emendations made by modern and contemporary scholars, which the text retrieval systems return among the genuine occurrences, whereas false negatives are the likely variants excluded by editors prejudiced against specific linguistic and stylistic phenomena (such as short-range repetition, systematically emended by philologists of the last centuries). The purpose of Musisque Deoque is to overcome these limitations by retrieving not only the keywords found in the reference edition, but also the variants recorded in the critical apparatus. In this way, further knowledge about the path these texts have travelled, from the ancient works through the subsequent ages up to Humanism and the Renaissance, can emerge.

Citations: 5

From Old Texts to Modern Spellings: An Experiment in Automatic Normalisation

J. Lang. Technol. Comput. Linguistics Pub Date : 2011-07-01 DOI: 10.21248/jlcl.26.2011.147

Iris Hendrickx, Rita Marquilhas

Citations: 33