Lit. Linguistic Comput.: Latest Articles

FarsiSpell: A spell-checking system for Persian using a large monolingual corpus
Lit. Linguistic Comput. Pub Date : 2014-04-01 DOI: 10.1093/llc/fqt008
Tayebeh Mosavi Miangah
Abstract: In recent years, great availability of various language resources in different forms as well as rapid development of computer technology and programming skills have made researchers in the fields of linguistics and computer science cooperate in solving different problems of computational linguistics and natural language processing. Building large monolingual as well as bilingual corpora in digital forms and storing them in computer memories has enabled linguists and language engineers to automatically explore techniques for processing information with the help of various computer programs without any need to manually collect and analyze data. One of the main applications of monolingual corpora can be seen in developing automatic spell-checking systems. In such systems, a large monolingual corpus can function as a database instead of a monolingual dictionary. In the present study, it has been tried to demonstrate the effectiveness of a large monolingual corpus of Persian in improving the output quality of a spell-checker developed for this language. In the present spelling correction system, the three phases of error detection, making suggestions, and ranking suggestions are performed in three separate stages. An experiment was carried out to evaluate the performance of the spell-checking system.
Citations: 15
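The FarsiSpell abstract names three phases (error detection, making suggestions, ranking suggestions) and a corpus used in place of a dictionary, but it does not say how each phase is implemented. The following is a minimal sketch under stated assumptions, not the paper's method: detection is a lookup in the corpus vocabulary, suggestions are corpus words one edit away, and ranking is by corpus frequency. The function names and the toy English corpus are hypothetical.

```python
from collections import Counter


def build_frequencies(corpus_text):
    """Count word frequencies in a whitespace-tokenised corpus (assumed tokenisation)."""
    return Counter(corpus_text.lower().split())


def edits1(word, alphabet):
    """All strings one deletion, transposition, substitution, or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in alphabet]
    inserts = [a + c + b for a, b in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)


def check(word, freqs):
    """Return ranked suggestions, or an empty list if the word is attested."""
    word = word.lower()
    # Phase 1 (detection): a word seen in the corpus is treated as correct.
    if word in freqs:
        return []
    # Phase 2 (suggestion): corpus words within one edit of the input.
    # The alphabet is taken from the corpus itself, so no script is hard-coded.
    alphabet = sorted({ch for w in freqs for ch in w})
    candidates = [c for c in edits1(word, alphabet) if c in freqs]
    # Phase 3 (ranking): most frequent first, ties broken alphabetically.
    return sorted(candidates, key=lambda c: (-freqs[c], c))


# Toy corpus, purely illustrative.
freqs = build_frequencies("the cat sat on the mat the cat ate the rat")
print(check("cet", freqs))
print(check("hat", freqs))
```

Keeping the three phases as separate steps mirrors the staged design the abstract describes; a real system would need a far larger corpus and a tokeniser suited to Persian.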
Document dissimilarity within and across languages: A benchmarking study
Lit. Linguistic Comput. Pub Date : 2014-04-01 DOI: 10.1093/LLC/FQT002
R. Forsyth, S. Sharoff
Abstract: Quantifying the similarity or dissimilarity between documents is an important task in authorship attribution, information retrieval, plagiarism detection, text mining, and many other areas of linguistic computing. Numerous similarity indices have been devised and used, but relatively little attention has been paid to calibrating such indices against externally imposed standards, mainly because of the difficulty of establishing agreed reference levels of inter-text similarity. The present article introduces a multi-register corpus gathered for this purpose, in which each text has been located in a similarity space based on ratings by human readers. This provides a resource for testing similarity measures derived from computational text-processing against reference levels derived from human judgement, i.e. external to the texts themselves. We describe the results of a benchmarking study in five different languages in which some widely used measures perform comparatively poorly. In particular, several alternative correlational measures (Pearson r, Spearman rho, tetrachoric correlation) consistently outperform cosine similarity on our data. A method of using what we call 'anchor texts' to extend this method from monolingual inter-text similarity-scoring to inter-text similarity-scoring across languages is also proposed and tested.
Citations: 35
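The abstract above contrasts cosine similarity with correlational measures such as Pearson r and Spearman rho. As an illustration only, the sketch below computes all three on word-frequency vectors. The feature representation (raw counts over the joint vocabulary of two texts) and all function names are assumptions, not taken from the article, and tetrachoric correlation is omitted.

```python
import math
from collections import Counter


def freq_vectors(text_a, text_b):
    """Word-count vectors for two texts over their joint vocabulary."""
    ca, cb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    vocab = sorted(set(ca) | set(cb))
    return [ca[w] for w in vocab], [cb[w] for w in vocab]


def cosine(x, y):
    """Cosine of the angle between x and y (undefined for all-zero vectors)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson(x, y):
    """Pearson r, computed as the cosine of the mean-centred vectors."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rho: Pearson r applied to the rank-transformed vectors."""
    return pearson(ranks(x), ranks(y))


# Two count vectors with opposite trends but all-positive entries.
x, y = [1, 2, 3], [3, 2, 1]
print(cosine(x, y), pearson(x, y), spearman(x, y))
```

On the two vectors at the end, cosine stays well above zero because every count is positive, while Pearson and Spearman report a perfect negative relationship. That difference comes from mean-centring and is one way the measures can diverge; whether it explains the article's benchmark results is not something the abstract states.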
Under the Workbench: An analysis of the use and preservation of MONK text mining research software
Lit. Linguistic Comput. Pub Date : 2014-04-01 DOI: 10.1093/llc/fqt014
H. Green
Citations: 4
Towards a digital geography of Hispanic Baroque art
Lit. Linguistic Comput. Pub Date : 2013-12-01 DOI: 10.1093/llc/fqt050
Juan-Luis Suárez, Fernando Sancho-Caparrini, É. Ortega, Javier de la Rosa Pérez, Natalia Caldas, D. Brown
Abstract: In this article we propose an approach to the study of art history based on geography of Hispanic Baroque art by digital means that showcase the multiplicity of possible places of art. Our study advances four elements of a digital geography of art (communities, semantic maps, areas, and flows), a methodology that can be expanded in future Digital Humanities research.
Citations: 10
Distant Listening to Gertrude Stein's 'Melanctha': Using Similarity Analysis in a Discovery Paradigm to Analyze Prosody and Author Influence
Lit. Linguistic Comput. Pub Date : 2013-12-01 DOI: 10.1093/llc/fqt040
Tanya E. Clement, D. Tcheng, L. Auvil, Boris Capitanu, João Barbosa
Citations: 7
A data-centred 'virtual laboratory' for the humanities: Designing the Australian Humanities Networked Infrastructure (HuNI) service
Lit. Linguistic Comput. Pub Date : 2013-12-01 DOI: 10.1093/llc/fqt064
Toby Burrows
Citations: 5
Linked data driven multilingual access to diverse Japanese Ukiyo-e databases by generating links dynamically
Lit. Linguistic Comput. Pub Date : 2013-12-01 DOI: 10.1093/LLC/FQT058
Biligsaikhan Batjargal, T. Kuyama, Fuminori Kimura, Akira Maeda
Citations: 5
Geo-Temporal Interpretation of Archival Collections with Neatline
Lit. Linguistic Comput. Pub Date : 2013-12-01 DOI: 10.1093/llc/fqt043
B. Nowviskie, David McClure, Wayne Graham, Adam Soroka, J. Boggs, E. Rochester
Citations: 15
Beyond the tree of texts: Building an empirical model of scribal variation through graph analysis of texts and stemmata
Lit. Linguistic Comput. Pub Date : 2013-12-01 DOI: 10.1093/llc/fqt032
T. Andrews, Caroline Macé
Abstract: Stemmatology, or the reconstruction of the transmission history of texts, is a field that stands particularly to gain from digital methods. Many scholars already take stemmatic approaches that rely heavily on computational analysis of the collated text (e.g. Robinson and O’Hara 1996; Salemans 2000; Heikkila 2005; Windram et al. 2008 among many others). Although there is great value in computationally assisted stemmatology, providing as it does a reproducible result and allowing access to the relevant methodological process in related fields such as evolutionary biology, computational stemmatics is not without its critics. The current state-of-the-art effectively forces scholars to choose between a preconceived judgment of the significance of textual differences (the Lachmannian or neo-Lachmannian approach, and the weighted phylogenetic approach) or to make no judgment at all (the unweighted phylogenetic approach). Some basis for judgment of the significance of variation is sorely needed for medieval text criticism in particular. By this, we mean that there is a need for a statistical empirical profile of the text-genealogical significance of the different sorts of variation in different sorts of medieval texts. The rules that apply to copies of Greek and Latin classics may not apply to copies of medieval Dutch story collections; the practices of copying authoritative texts such as the Bible will most likely have been different from the practices of copying the Lives of local saints and other commonly adapted texts. It is nevertheless imperative that we have a consistent, flexible, and analytically tractable model for capturing these phenomena of transmission.
In this article, we present a computational model that captures most of the phenomena of text variation, and a method for analysis of one or more stemma hypotheses against the variation model. We apply this method to three ‘artificial traditions’ (i.e. texts copied under laboratory conditions by scholars to study the properties of text variation) and four genuine medieval traditions whose transmission history is known or deduced in varying degrees. Although our findings are necessarily limited by the small number of texts at our disposal, we demonstrate here some of the wide variety of calculations that can be made using our model. Certain of our results call sharply into question the utility of excluding ‘trivial’ variation such as orthographic and spelling changes from stemmatic analysis.
Citations: 40
Investigating the relatedness of the endangered Dogon languages
Lit. Linguistic Comput. Pub Date : 2013-12-01 DOI: 10.1093/llc/fqt061
Steven Moran, Jelena Prokic
Abstract: In this article we apply up-to-date methods of quantitative language comparison, inspired by algorithms successfully applied in bioinformatics to decode DNA and determine the genetic relatedness of humans, to language data in an attempt to shed light on the current situation of a family of languages called Dogon, which are spoken in Mali, West Africa. Our aim is to determine the linguistic subgroupings of these languages, which we believe will shed light on their prehistory, highlight the linguistic diversity of these groups and which may ultimately inform studies on the cultural boundaries of these languages.
Originally published as: Moran, Steven; Prokić, Jelena (2013). Investigating the Relatedness of the Endangered Dogon Languages. Literary and Linguistic Computing, 28(4):676-691. Also posted at the Zurich Open Repository and Archive (ZORA), University of Zurich: https://doi.org/10.5167/uzh-84673
Citations: 5