{"title":"FarsiSpell: A spell-checking system for Persian using a large monolingual corpus","authors":"Tayebeh Mosavi Miangah","doi":"10.1093/llc/fqt008","DOIUrl":"https://doi.org/10.1093/llc/fqt008","url":null,"abstract":"In recent years, great availability of various language resources in different forms as well as rapid development of computer technology and programming skills have made researchers in the fields of linguistics and computer science cooperate in solving different problems of computational linguistics and natural language processing. Building large monolingual as well as bilingual corpora in digital forms and storing them in computer memories has enabled linguists and language engineers to automatically explore techniques for processing information with the help of various computer programs without any need to manually collect and analyze data. One of the main applications of monolingual corpora can be seen in developing automatic spell-checking systems. In such systems, a large monolingual corpus can function as a database instead of a monolingual dictionary. In the present study, it has been tried to demonstrate the effectiveness of a large monolingual corpus of Persian in improving the output quality of a spell-checker developed for this language. In the present spelling correction system, the three phases of error detection, making suggestions, and ranking suggestions are performed in three separate stages. An experiment was carried out to evaluate the performance of the spell-checking system.","PeriodicalId":235034,"journal":{"name":"Lit. Linguistic Comput.","volume":"217 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116759523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Document dissimilarity within and across languages: A benchmarking study","authors":"R. Forsyth, S. Sharoff","doi":"10.1093/LLC/FQT002","DOIUrl":"https://doi.org/10.1093/LLC/FQT002","url":null,"abstract":"Quantifying the similarity or dissimilarity between documents is an important task in authorship attribution, information retrieval, plagiarism detection, text mining, and many other areas of linguistic computing. Numerous similarity indices have been devised and used, but relatively little attention has been paid to calibrating such indices against externally imposed standards, mainly because of the difficulty of establishing agreed reference levels of inter-text similarity. The present article introduces a multi-register corpus gathered for this purpose, in which each text has been located in a similarity space based on ratings by human readers. This provides a resource for testing similarity measures derived from computational text-processing against reference levels derived from human judgement, i.e. external to the texts themselves. We describe the results of a benchmarking study in five different languages in which some widely used measures perform comparatively poorly. In particular, several alternative correlational measures (Pearson r, Spearman rho, tetrachoric correlation) consistently outperform cosine similarity on our data. A method of using what we call 'anchor texts' to extend this method from monolingual inter-text similarity-scoring to inter-text similarity-scoring across languages is also proposed and tested.","PeriodicalId":235034,"journal":{"name":"Lit. Linguistic Comput.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132704694","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Under the Workbench: An analysis of the use and preservation of MONK text mining research software","authors":"H. Green","doi":"10.1093/llc/fqt014","DOIUrl":"https://doi.org/10.1093/llc/fqt014","url":null,"abstract":"","PeriodicalId":235034,"journal":{"name":"Lit. Linguistic Comput.","volume":"795 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123004179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards a digital geography of Hispanic Baroque art","authors":"Juan-Luis Suárez, Fernando Sancho-Caparrini, É. Ortega, Javier de la Rosa Pérez, Natalia Caldas, D. Brown","doi":"10.1093/llc/fqt050","DOIUrl":"https://doi.org/10.1093/llc/fqt050","url":null,"abstract":"In this article we propose an approach to the study of art history based on geography of Hispanic Baroque art by digital means that showcase the multiplicity of possible places of art. Our study advances four elements of a digital geography of art (communities, semantic maps, areas, and flows), a methodology that can be expanded in future Digital Humanities research.","PeriodicalId":235034,"journal":{"name":"Lit. Linguistic Comput.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124284059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distant Listening to Gertrude Stein's 'Melanctha': Using Similarity Analysis in a Discovery Paradigm to Analyze Prosody and Author Influence","authors":"Tanya E. Clement, D. Tcheng, L. Auvil, Boris Capitanu, João Barbosa","doi":"10.1093/llc/fqt040","DOIUrl":"https://doi.org/10.1093/llc/fqt040","url":null,"abstract":"","PeriodicalId":235034,"journal":{"name":"Lit. Linguistic Comput.","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132517662","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A data-centred 'virtual laboratory' for the humanities: Designing the Australian Humanities Networked Infrastructure (HuNI) service","authors":"Toby Burrows","doi":"10.1093/llc/fqt064","DOIUrl":"https://doi.org/10.1093/llc/fqt064","url":null,"abstract":"","PeriodicalId":235034,"journal":{"name":"Lit. Linguistic Comput.","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132015468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Linked data driven multilingual access to diverse Japanese Ukiyo-e databases by generating links dynamically","authors":"Biligsaikhan Batjargal, T. Kuyama, Fuminori Kimura, Akira Maeda","doi":"10.1093/LLC/FQT058","DOIUrl":"https://doi.org/10.1093/LLC/FQT058","url":null,"abstract":"","PeriodicalId":235034,"journal":{"name":"Lit. Linguistic Comput.","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115562513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Geo-Temporal Interpretation of Archival Collections with Neatline","authors":"B. Nowviskie, David McClure, Wayne Graham, Adam Soroka, J. Boggs, E. Rochester","doi":"10.1093/llc/fqt043","DOIUrl":"https://doi.org/10.1093/llc/fqt043","url":null,"abstract":"","PeriodicalId":235034,"journal":{"name":"Lit. Linguistic Comput.","volume":"428 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116515669","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Beyond the tree of texts: Building an empirical model of scribal variation through graph analysis of texts and stemmata","authors":"T. Andrews, Caroline Macé","doi":"10.1093/llc/fqt032","DOIUrl":"https://doi.org/10.1093/llc/fqt032","url":null,"abstract":"Stemmatology, or the reconstruction of the transmission history of texts, is a field that stands particularly to gain from digital methods. Many scholars already take stemmatic approaches that rely heavily on computational analysis of the collated text (e.g. Robinson and O’Hara 1996; Salemans 2000; Heikkila 2005; Windram et al. 2008 among many others). Although there is great value in computationally assisted stemmatology, providing as it does a reproducible result and allowing access to the relevant methodological process in related fields such as evolutionary biology, computational stemmatics is not without its critics. The current state-of-the-art effectively forces scholars to choose between a preconceived judgment of the significance of textual differences (the Lachmannian or neo-Lachmannian approach, and the weighted phylogenetic approach) or to make no judgment at all (the unweighted phylogenetic approach). Some basis for judgment of the significance of variation is sorely needed for medieval text criticism in particular. By this, we mean that there is a need for a statistical empirical profile of the text-genealogical significance of the different sorts of variation in different sorts of medieval texts. The rules that apply to copies of Greek and Latin classics may not apply to copies of medieval Dutch story collections; the practices of copying authoritative texts such as the Bible will most likely have been different from the practices of copying the Lives of local saints and other commonly adapted texts. It is nevertheless imperative that we have a consistent, flexible, and analytically tractable model for capturing these phenomena of transmission. In this article, we present a computational model that captures most of the phenomena of text variation, and a method for analysis of one or more stemma hypotheses against the variation model. We apply this method to three ‘artificial traditions’ (i.e. texts copied under laboratory conditions by scholars to study the properties of text variation) and four genuine medieval traditions whose transmission history is known or deduced in varying degrees. Although our findings are necessarily limited by the small number of texts at our disposal, we demonstrate here some of the wide variety of calculations that can be made using our model. Certain of our results call sharply into question the utility of excluding ‘trivial’ variation such as orthographic and spelling changes from stemmatic analysis.","PeriodicalId":235034,"journal":{"name":"Lit. Linguistic Comput.","volume":"80 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131456082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Investigating the relatedness of the endangered Dogon languages","authors":"Steven Moran, Jelena Prokic","doi":"10.1093/llc/fqt061","DOIUrl":"https://doi.org/10.1093/llc/fqt061","url":null,"abstract":"In this article we apply up-to-date methods of quantitative language comparison, inspired by algorithms successfully applied in bioinformatics to decode DNA and determine the genetic relatedness of humans, to language data in an attempt to shed light on the current situation of a family of languages called Dogon, which are spoken in Mali, West Africa. Our aim is to determine the linguistic subgroupings of these languages, which we believe will shed light on their prehistory, highlight the linguistic diversity of these groups and which may ultimately inform studies on the cultural boundaries of these languages. Posted at the Zurich Open Repository and Archive, University of Zurich. ZORA URL: https://doi.org/10.5167/uzh-84673. Originally published at: Moran, Steven; Prokić, Jelena (2013). Investigating the Relatedness of the Endangered Dogon Languages. Literary and Linguistic Computing, 28(4):676-691.","PeriodicalId":235034,"journal":{"name":"Lit. Linguistic Comput.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132080048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}