J. Lang. Technol. Comput. Linguistics: Latest Articles

The Encoding of Avestan - Problems and Solutions
J. Lang. Technol. Comput. Linguistics Pub Date : 2012-07-01 DOI: 10.21248/jlcl.27.2012.160
J. Gippert
{"title":"The Encoding of Avestan - Problems and Solutions","authors":"J. Gippert","doi":"10.21248/jlcl.27.2012.160","DOIUrl":"https://doi.org/10.21248/jlcl.27.2012.160","url":null,"abstract":"‘Avestan’ is the name of the ritual language of Zoroastrianism, which was the state religion of the Iranian empire in Achaemenid, Arsacid and Sasanid times, covering a time span of more than 1200 years. It is named after the ‘Avesta’, i.e., the collection of holy scriptures that form the basis of the religion which was allegedly founded by Zarathushtra, also known as Zoroaster, by about the beginning of the first millennium B.C. Together with Vedic Sanskrit, Avestan represents one of the most archaic witnesses of the Indo-Iranian branch of the Indo-European languages, which makes it especially interesting for historical-comparative linguistics. This is why the texts of the Avesta were among the first objects of electronic corpus building that were undertaken in the framework of Indo-European studies, leading to the establishment of the TITUS database (‘Thesaurus indogermanischer Text- und Sprachmaterialien’). Today, the complete Avestan corpus is available, together with elaborate search functions and an extended version of the subcorpus of the so-called ‘Yasna’, which covers a great deal of the attestation of variant readings. Right from the beginning of their computational work concerning the Avesta, the compilers had to cope with the fact that the texts contained in it have been transmitted in a special script written from right to left, which was also used for printing them in the scholarly editions used until today. It goes without saying that there was no way in the middle of the 1980s to encode the Avestan scriptures exactly as they are found in the manuscripts. Instead, we had to rely upon transcriptional devices that were dictated by the restrictions of character encoding as provided by the computer systems used. As the problems we had to face in this respect and the solutions we could apply are typical for the development of computational work on ancient languages, it seems worthwhile to sketch them out here. 1 The Avestan script and its transcription 1.1 Early western approaches to the Avestan script and its transcription The Avestan script has been known to western scholarship since the 17th century when the first accounts of the religion of the ‘Parsees’, i.e., Zoroastrians living in India and Iran, were published. The first notable description of the script is found in the travel report by JEAN CHARDIN who sojourned in Iran in 1673–7; in the 1711 edition of his report, the author provides an ‘alphabet of the ancient Persians’, together with a lithographed table contrasting the characters of the Avestan script with their Perso-Arabian equivalents; cf. the extract illustrated in Fig. 1.","PeriodicalId":402489,"journal":{"name":"J. Lang. Technol. Comput. Linguistics","volume":"117 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120936084","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Old Lithuanian Reference Corpus (SLIEKKAS) and Automated Grammatical Annotation
J. Lang. Technol. Comput. Linguistics Pub Date : 2012-07-01 DOI: 10.21248/jlcl.27.2012.165
Jolanta Gelumbeckaite, Mindaugas Sinkunas, Vytautas Zinkevicius
{"title":"Old Lithuanian Reference Corpus (SLIEKKAS) and Automated Grammatical Annotation","authors":"Jolanta Gelumbeckaite, Mindaugas Sinkunas, Vytautas Zinkevicius","doi":"10.21248/jlcl.27.2012.165","DOIUrl":"https://doi.org/10.21248/jlcl.27.2012.165","url":null,"abstract":"","PeriodicalId":402489,"journal":{"name":"J. Lang. Technol. Comput. Linguistics","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121882748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Digitalisierung historischer Glossare zur automatisierten Vorannotation von Textkorpora am Beispiel des Altdeutschen [Digitising historical glossaries for the automated pre-annotation of text corpora, using Old German as an example]
J. Lang. Technol. Comput. Linguistics Pub Date : 2012-07-01 DOI: 10.21248/jlcl.27.2012.162
Roland Mittmann
{"title":"Digitalisierung historischer Glossare zur automatisierten Vorannotation von Textkorpora am Beispiel des Altdeutschen","authors":"Roland Mittmann","doi":"10.21248/jlcl.27.2012.162","DOIUrl":"https://doi.org/10.21248/jlcl.27.2012.162","url":null,"abstract":"In the pre-digital age, glossaries were indispensable for locating words and word forms within texts. Today their data can be merged automatically with the corresponding texts in order to enrich the texts with further information. For the digitisation of the glossaries that this requires, a manual approach is the most effective, given the historical typography and the often ambiguous markup of information. Depending on the structure of the glossary and on the type and density of transmission of the text concerned, different challenges and problems arise. These are illustrated using the digitisation of the glossaries for Old High German and Old Saxon as an example.","PeriodicalId":402489,"journal":{"name":"J. Lang. Technol. Comput. Linguistics","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121508683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Manuelle Abgleichung bei automatisierter Vorannotation: Das Tagging grammatischer Kategorien im Referenzkorpus Altdeutsch [Manual reconciliation in automated pre-annotation: tagging grammatical categories in the Old German Reference Corpus]
J. Lang. Technol. Comput. Linguistics Pub Date : 2012-07-01 DOI: 10.21248/jlcl.27.2012.163
S. Linde
{"title":"Manuelle Abgleichung bei automatisierter Vorannotation: Das Tagging grammatischer Kategorien im Referenzkorpus Altdeutsch","authors":"S. Linde","doi":"10.21248/jlcl.27.2012.163","DOIUrl":"https://doi.org/10.21248/jlcl.27.2012.163","url":null,"abstract":"","PeriodicalId":402489,"journal":{"name":"J. Lang. Technol. Comput. Linguistics","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117204346","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
The Annotation of Morphology, Syntax and Information Structure in a Multilayered Diachronic Corpus
J. Lang. Technol. Comput. Linguistics Pub Date : 2011-07-01 DOI: 10.21248/jlcl.26.2011.143
Kristin Bech, K. Eide
{"title":"The Annotation of Morphology, Syntax and Information Structure in a Multilayered Diachronic Corpus","authors":"Kristin Bech, K. Eide","doi":"10.21248/jlcl.26.2011.143","DOIUrl":"https://doi.org/10.21248/jlcl.26.2011.143","url":null,"abstract":"","PeriodicalId":402489,"journal":{"name":"J. Lang. Technol. Comput. Linguistics","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126189651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Slate - A Tool for Creating and Maintaining Annotated Corpora
J. Lang. Technol. Comput. Linguistics Pub Date : 2011-07-01 DOI: 10.21248/jlcl.26.2011.149
D. Kaplan, R. Iida, K. Nishina, T. Tokunaga
{"title":"Slate - A Tool for Creating and Maintaining Annotated Corpora","authors":"D. Kaplan, R. Iida, K. Nishina, T. Tokunaga","doi":"10.21248/jlcl.26.2011.149","DOIUrl":"https://doi.org/10.21248/jlcl.26.2011.149","url":null,"abstract":"Recent research trends of the last five years show that richly annotated corpora inspire novel research. These richly annotated corpora are indispensable for progressing research, but also more difficult to manage and maintain due to increasing complexity – what is needed is a way to manage the annotation project in its entirety. However, annotation project management has received little attention, with tools predominately focusing on single document annotation. Therefore, we define a list of corpus creation and management needs for annotation systems, and then introduce our multi-purpose annotation and management system Slate to address these needs through use of a case study, showing how project management is essential to creating good corpora.","PeriodicalId":402489,"journal":{"name":"J. Lang. Technol. Comput. Linguistics","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133184397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 25
More, Faster: Accelerated Corpus Annotation with Statistical Taggers
J. Lang. Technol. Comput. Linguistics Pub Date : 2011-07-01 DOI: 10.21248/jlcl.26.2011.154
Arne Skjærholt
{"title":"More, Faster: Accelerated Corpus Annotation with Statistical Taggers","authors":"Arne Skjærholt","doi":"10.21248/jlcl.26.2011.154","DOIUrl":"https://doi.org/10.21248/jlcl.26.2011.154","url":null,"abstract":"We present our experiments with annotating a Latin corpus using an assisted annotation procedure where the corpus to be annotated is preannotated by a statistical tagger. This assisted procedure gives a notable reduction in annotator error compared to the unassisted annotation of previous annotation efforts, even with a huge tagset (1 000 tags) and modest tagger accuracy due to limited training data and domain effects.","PeriodicalId":402489,"journal":{"name":"J. Lang. Technol. Comput. Linguistics","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115842633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Korpuslinguistik in der linguistischen Lehre: Erfolge und Misserfolge [Corpus linguistics in linguistics teaching: successes and failures]
J. Lang. Technol. Comput. Linguistics Pub Date : 2011-07-01 DOI: 10.21248/jlcl.26.2011.141
Noah Bubenhofer
{"title":"Korpuslinguistik in der linguistischen Lehre: Erfolge und Misserfolge","authors":"Noah Bubenhofer","doi":"10.21248/jlcl.26.2011.141","DOIUrl":"https://doi.org/10.21248/jlcl.26.2011.141","url":null,"abstract":"Although it is indispensable for linguistics teaching at universities to introduce students to the theory and methods of corpus linguistics, teachers face a number of problems in doing so, because students' technical and methodological know-how is often very heterogeneous. It also proves important to get students enthusiastic about corpus-linguistic work by introducing them to attractive illustrative material. In the following, I use several examples to show which topics in semantics, text linguistics, discourse analysis and cultural analysis can sensibly be treated with corpus-linguistic methods. In addition, I try to derive users' needs regarding the methods and tools of corpus linguistics from usage patterns of my online introduction to corpus linguistics.","PeriodicalId":402489,"journal":{"name":"J. Lang. Technol. Comput. Linguistics","volume":"96 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125911521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Musisque Deoque: Text Retrieval on Critical Editions
J. Lang. Technol. Comput. Linguistics Pub Date : 2011-07-01 DOI: 10.21248/jlcl.26.2011.152
M. Manca, L. Spinazzè, P. Mastandrea, L. Tessarolo, Federico Boschetti
{"title":"Musisque Deoque: Text Retrieval on Critical Editions","authors":"M. Manca, L. Spinazzè, P. Mastandrea, L. Tessarolo, Federico Boschetti","doi":"10.21248/jlcl.26.2011.152","DOIUrl":"https://doi.org/10.21248/jlcl.26.2011.152","url":null,"abstract":"The Musisque Deoque Project (MQDQ) aims at creating a digital archive of Latin poetry, from its origins to the late Italian Renaissance, equipped with critical apparatus and various exegetical and linguistic information. This project is focused on the study of synchronical and diachronical intertextuality as illustrated, e.g., in Cicu (2005). For this reason, we give strong attention to formal and material aspects of the text that actually played a relevant role in the poetical tradition. The fixed text of printed critical editions, aimed at the reconstruction as close as possible to the lost originals, provides just a snapshot of the tradition, which is intrinsically dynamic, and gives to the modern reader a distorted image of what an ancient text was in fact. Fully searchable digital collections currently available are based on traditional critical editions, which are, as we just said, authoritarian texts; this authoritarianism is emphasized by the conversion from printed text to database, because usually the critical apparatus is cut away and there is no way for the reader to check a variant different from the one the editor put in the main text, often dubitanter, simply because he had to choose a variant. Limiting lexical searches to editor’s choices drives unavoidably both to false positives and false negatives, which need to be verified back on printed critical editions. False positives are due to possibly wrong emendations made by modern and contemporary scholars, provided by the text retrieval systems among the genuine occurrences, whereas false negatives are the likely variants excluded by editors biased by prejudices against specific linguistic and stylistic phenomena (such as the short-term repetition, systematically emended by philologists of the last centuries). The purpose of Musisque Deoque is to overcome these limitations, retrieving not only the word keys quoted in the reference edition, but also the variants lying in the critical apparatus. In this way, further knowledge on the accomplished itinerary – from ancient operas during the subsequent ages until the Humanism and the Renaissance – can emerge.","PeriodicalId":402489,"journal":{"name":"J. Lang. Technol. Comput. Linguistics","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128400867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
From Old Texts to Modern Spellings: An Experiment in Automatic Normalisation
J. Lang. Technol. Comput. Linguistics Pub Date : 2011-07-01 DOI: 10.21248/jlcl.26.2011.147
Iris Hendrickx, Rita Marquilhas
{"title":"From Old Texts to Modern Spellings: An Experiment in Automatic Normalisation","authors":"Iris Hendrickx, Rita Marquilhas","doi":"10.21248/jlcl.26.2011.147","DOIUrl":"https://doi.org/10.21248/jlcl.26.2011.147","url":null,"abstract":"We aim to tackle the problem of spelling variations in a corpus of personal Portuguese letters from the 16th to the 20th century. We investigated the extent to which the task of normalising Portuguese spelling can be accomplished automatically. We adapted VARD2 (Baron and Rayson, 2008), a statistical tool for normalising spelling, for use with the Portuguese language and studied its performance over four different time periods. Our results showed that VARD2 performed best on the older letters and worst on the most modern ones. In an extrinsic evaluation, we measured the usefulness of automatic normalisation for the linguistic task of automatic POS-tagging and showed that automatic normalisation of spelling helps improve the performance of the POS-tagger.","PeriodicalId":402489,"journal":{"name":"J. Lang. Technol. Comput. Linguistics","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133797225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 33