Lit. Linguistic Comput.: Latest articles, page 5

FarsiSpell: A spell-checking system for Persian using a large monolingual corpus

Lit. Linguistic Comput. Pub Date : 2014-04-01 DOI: 10.1093/llc/fqt008

Tayebeh Mosavi Miangah

Abstract: In recent years, the wide availability of language resources in various forms, together with rapid advances in computer technology and programming skills, has led researchers in linguistics and computer science to cooperate on problems in computational linguistics and natural language processing. Building large monolingual and bilingual corpora in digital form and storing them in computer memory has enabled linguists and language engineers to explore techniques for processing information automatically, with the help of computer programs, without having to collect and analyze data manually. One of the main applications of monolingual corpora is the development of automatic spell-checking systems. In such systems, a large monolingual corpus can serve as the database in place of a monolingual dictionary. The present study sets out to demonstrate the effectiveness of a large monolingual corpus of Persian in improving the output quality of a spell-checker developed for this language. In this spelling correction system, the three phases of error detection, suggestion generation, and suggestion ranking are performed in three separate stages. An experiment was carried out to evaluate the performance of the spell-checking system.

Citations: 15
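The three stages named in the FarsiSpell abstract (error detection, suggestion generation, suggestion ranking) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: detection is plain vocabulary lookup in the corpus, candidates are all corpus words within one edit of the input, and ranking is by corpus frequency. The function names and the toy English corpus are hypothetical; for Persian, the `alphabet` argument would hold Persian letters.

```python
from collections import Counter


def build_frequencies(corpus_text):
    """Count word frequencies; the corpus stands in for a dictionary."""
    return Counter(corpus_text.lower().split())


def is_error(word, freqs):
    """Stage 1 (assumed rule): a word is an error if the corpus never contains it."""
    return word not in freqs


def suggest(word, freqs, alphabet):
    """Stage 2 (assumed rule): corpus words reachable by one edit."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in alphabet]
    inserts = [l + c + r for l, r in splits for c in alphabet]
    candidates = deletes + transposes + replaces + inserts
    return {w for w in candidates if w in freqs}


def rank(candidates, freqs):
    """Stage 3 (assumed rule): most frequent first, ties broken alphabetically."""
    return sorted(candidates, key=lambda w: (-freqs[w], w))


def check(word, freqs, alphabet):
    """Run the three stages in order; return [] for words found in the corpus."""
    if not is_error(word, freqs):
        return []
    return rank(suggest(word, freqs, alphabet), freqs)


if __name__ == "__main__":
    freqs = build_frequencies("the cat sat on the mat the cat ran")
    print(check("hat", freqs, "abcdefghijklmnopqrstuvwxyz"))
```

Keeping the stages as separate functions mirrors the abstract's description of three separate stages and lets each rule be swapped out independently, for example replacing frequency ranking with a context-aware score.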

Document dissimilarity within and across languages: A benchmarking study

Lit. Linguistic Comput. Pub Date : 2014-04-01 DOI: 10.1093/LLC/FQT002

R. Forsyth, S. Sharoff

Abstract: Quantifying the similarity or dissimilarity between documents is an important task in authorship attribution, information retrieval, plagiarism detection, text mining, and many other areas of linguistic computing. Numerous similarity indices have been devised and used, but relatively little attention has been paid to calibrating such indices against externally imposed standards, mainly because of the difficulty of establishing agreed reference levels of inter-text similarity. The present article introduces a multi-register corpus gathered for this purpose, in which each text has been located in a similarity space based on ratings by human readers. This provides a resource for testing similarity measures derived from computational text-processing against reference levels derived from human judgement, i.e. external to the texts themselves. We describe the results of a benchmarking study in five different languages in which some widely used measures perform comparatively poorly. In particular, several alternative correlational measures (Pearson r, Spearman rho, tetrachoric correlation) consistently outperform cosine similarity on our data. A method of using what we call 'anchor texts' to extend this method from monolingual inter-text similarity-scoring to inter-text similarity-scoring across languages is also proposed and tested.

Citations: 35
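The Forsyth and Sharoff abstract compares cosine similarity with correlational measures such as Pearson r and Spearman rho. The sketch below shows how three of these measures relate when computed on two feature vectors; it is an illustration only, not the authors' benchmarking code. The function names are hypothetical, and the choice of feature (for example, word-frequency counts) is an assumption. Pearson r is cosine similarity applied to mean-centred vectors, and Spearman rho is Pearson r applied to ranks.

```python
import math


def cosine(x, y):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson(x, y):
    """Pearson r: cosine similarity after subtracting each vector's mean."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])


def ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rho: Pearson r computed on the ranks of the values."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    doc_a = [10, 8, 6, 1]  # hypothetical feature counts for one text
    doc_b = [9, 9, 5, 2]   # hypothetical feature counts for another text
    print(cosine(doc_a, doc_b), pearson(doc_a, doc_b), spearman(doc_a, doc_b))
```

Because count vectors are non-negative, cosine similarity on them cannot fall below zero, whereas the mean-centring in Pearson r allows negative scores. These functions raise `ZeroDivisionError` for an all-zero vector (and, for the correlations, a constant one), which a real implementation would need to handle.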

Under the Workbench: An analysis of the use and preservation of MONK text mining research software

Lit. Linguistic Comput. Pub Date : 2014-04-01 DOI: 10.1093/llc/fqt014

H. Green

Citations: 4

Towards a digital geography of Hispanic Baroque art

Lit. Linguistic Comput. Pub Date : 2013-12-01 DOI: 10.1093/llc/fqt050

Juan-Luis Suárez, Fernando Sancho-Caparrini, É. Ortega, Javier de la Rosa Pérez, Natalia Caldas, D. Brown

Citations: 10

Distant Listening to Gertrude Stein's 'Melanctha': Using Similarity Analysis in a Discovery Paradigm to Analyze Prosody and Author Influence

Lit. Linguistic Comput. Pub Date : 2013-12-01 DOI: 10.1093/llc/fqt040

Tanya E. Clement, D. Tcheng, L. Auvil, Boris Capitanu, João Barbosa

Citations: 7

A data-centred 'virtual laboratory' for the humanities: Designing the Australian Humanities Networked Infrastructure (HuNI) service

Lit. Linguistic Comput. Pub Date : 2013-12-01 DOI: 10.1093/llc/fqt064

Toby Burrows

Citations: 5

Linked data driven multilingual access to diverse Japanese Ukiyo-e databases by generating links dynamically

Lit. Linguistic Comput. Pub Date : 2013-12-01 DOI: 10.1093/LLC/FQT058

Biligsaikhan Batjargal, T. Kuyama, Fuminori Kimura, Akira Maeda

Citations: 5

Geo-Temporal Interpretation of Archival Collections with Neatline

Lit. Linguistic Comput. Pub Date : 2013-12-01 DOI: 10.1093/llc/fqt043

B. Nowviskie, David McClure, Wayne Graham, Adam Soroka, J. Boggs, E. Rochester

Citations: 15

Beyond the tree of texts: Building an empirical model of scribal variation through graph analysis of texts and stemmata

Lit. Linguistic Comput. Pub Date : 2013-12-01 DOI: 10.1093/llc/fqt032

T. Andrews, Caroline Macé

Abstract: Stemmatology, or the reconstruction of the transmission history of texts, is a field that stands particularly to gain from digital methods. Many scholars already take stemmatic approaches that rely heavily on computational analysis of the collated text (e.g. Robinson and O’Hara 1996; Salemans 2000; Heikkila 2005; Windram et al. 2008 among many others). Although there is great value in computationally assisted stemmatology, providing as it does a reproducible result and allowing access to the relevant methodological process in related fields such as evolutionary biology, computational stemmatics is not without its critics. The current state of the art effectively forces scholars either to adopt a preconceived judgment of the significance of textual differences (the Lachmannian or neo-Lachmannian approach, and the weighted phylogenetic approach) or to make no judgment at all (the unweighted phylogenetic approach). Some basis for judgment of the significance of variation is sorely needed for medieval text criticism in particular. By this, we mean that there is a need for a statistical empirical profile of the text-genealogical significance of the different sorts of variation in different sorts of medieval texts. The rules that apply to copies of Greek and Latin classics may not apply to copies of medieval Dutch story collections; the practices of copying authoritative texts such as the Bible will most likely have been different from the practices of copying the Lives of local saints and other commonly adapted texts. It is nevertheless imperative that we have a consistent, flexible, and analytically tractable model for capturing these phenomena of transmission.

In this article, we present a computational model that captures most of the phenomena of text variation, and a method for analysis of one or more stemma hypotheses against the variation model. We apply this method to three ‘artificial traditions’ (i.e. texts copied under laboratory conditions by scholars to study the properties of text variation) and four genuine medieval traditions whose transmission history is known or deduced in varying degrees. Although our findings are necessarily limited by the small number of texts at our disposal, we demonstrate here some of the wide variety of calculations that can be made using our model. Certain of our results call sharply into question the utility of excluding ‘trivial’ variation such as orthographic and spelling changes from stemmatic analysis.

Citations: 40

Investigating the relatedness of the endangered Dogon languages

Lit. Linguistic Comput. Pub Date : 2013-12-01 DOI: 10.1093/llc/fqt061

Steven Moran, Jelena Prokic

Abstract: In this article we apply up-to-date methods of quantitative language comparison, inspired by algorithms successfully applied in bioinformatics to decode DNA and determine the genetic relatedness of humans, to language data in an attempt to shed light on the current situation of a family of languages called Dogon, which are spoken in Mali, West Africa. Our aim is to determine the linguistic subgroupings of these languages, which we believe will shed light on their prehistory, highlight the linguistic diversity of these groups, and may ultimately inform studies on the cultural boundaries of these languages.

Originally published as: Moran, Steven; Prokić, Jelena (2013). Investigating the Relatedness of the Endangered Dogon Languages. Literary and Linguistic Computing, 28(4):676-691. Also posted at the Zurich Open Repository and Archive (ZORA), University of Zurich: https://doi.org/10.5167/uzh-84673

Citations: 5