Ligt: An LLOD-Native Vocabulary for Representing Interlinear Glossed Text as RDF

C. Chiarcos, Maxim Ionov
{"title":"Ligt: An LLOD-Native Vocabulary for Representing Interlinear Glossed Text as RDF","authors":"C. Chiarcos, Maxim Ionov","doi":"10.4230/OASIcs.LDK.2019.3","DOIUrl":null,"url":null,"abstract":"The paper introduces Ligt, a native RDF vocabulary for representing linguistic examples as text with interlinear glosses (IGT) in a linked data formalism. Interlinear glossing is a notation used in various fields of linguistics to provide readers with a way to understand linguistic phenomena and to provide corpus data when documenting endangered languages. This data is usually provided with morpheme-by-morpheme correspondence which is not supported by any established vocabularies for representing linguistic corpora or automated annotations. Interlinear Glossed Text can be stored and exchanged in several formats specifically designed for the purpose, but these differ in their designs and concepts, and they are tied to particular tools, so the reusability of the annotated data is limited. To improve interoperability and reusability, we propose to convert such glosses to a tool-independent representation well-suited for the Web of Data, i.e., a representation in RDF. Beyond establishing structural (format) interoperability by means of a common data representation, our approach also allows using shared vocabularies and terminology repositories available from the (Linguistic) Linked Open Data cloud. We describe the core vocabulary and the converters that use this vocabulary to convert IGT in a format of various widely-used tools into RDF. Ultimately, a Linked Data representation will facilitate the accessibility of language data from less-resourced language varieties within the (Linguistic) Linked Open Data cloud, as well as enable novel ways to access and integrate this information with (L)LOD dictionary data and other types of lexical-semantic resources. In a longer perspective, data currently only available through these formats will become more visible and reusable and contribute to the development of a truly multilingual (semantic) web. 2012 ACM Subject Classification Information systems → Graph-based database models; Computing methodologies → Language resources; Computing methodologies → Knowledge representation and reasoning","PeriodicalId":377119,"journal":{"name":"International Conference on Language, Data, and Knowledge","volume":"20 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Conference on Language, Data, and Knowledge","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.4230/OASIcs.LDK.2019.3","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 10

Abstract

The paper introduces Ligt, a native RDF vocabulary for representing linguistic examples as text with interlinear glosses (IGT) in a linked data formalism. Interlinear glossing is a notation used in various fields of linguistics to provide readers with a way to understand linguistic phenomena and to provide corpus data when documenting endangered languages. This data is usually provided with morpheme-by-morpheme correspondence which is not supported by any established vocabularies for representing linguistic corpora or automated annotations. Interlinear Glossed Text can be stored and exchanged in several formats specifically designed for the purpose, but these differ in their designs and concepts, and they are tied to particular tools, so the reusability of the annotated data is limited. To improve interoperability and reusability, we propose to convert such glosses to a tool-independent representation well-suited for the Web of Data, i.e., a representation in RDF. Beyond establishing structural (format) interoperability by means of a common data representation, our approach also allows using shared vocabularies and terminology repositories available from the (Linguistic) Linked Open Data cloud. We describe the core vocabulary and the converters that use this vocabulary to convert IGT in a format of various widely-used tools into RDF. Ultimately, a Linked Data representation will facilitate the accessibility of language data from less-resourced language varieties within the (Linguistic) Linked Open Data cloud, as well as enable novel ways to access and integrate this information with (L)LOD dictionary data and other types of lexical-semantic resources. In a longer perspective, data currently only available through these formats will become more visible and reusable and contribute to the development of a truly multilingual (semantic) web. 2012 ACM Subject Classification Information systems → Graph-based database models; Computing methodologies → Language resources; Computing methodologies → Knowledge representation and reasoning
light:一种llod原生词汇表,用于将行间有光泽文本表示为RDF
本文介绍了light,一个原生RDF词汇表,用于在关联数据形式主义中将语言示例表示为具有行间注释(IGT)的文本。行间注释法是一种用于语言学各个领域的符号,它为读者提供了一种理解语言现象的方法,并在记录濒危语言时提供了语料库数据。这些数据通常以语素对语素的对应关系提供,任何已建立的表示语料库或自动注释的词汇表都不支持这种对应关系。Interlinear gloss Text可以以专门为此目的设计的几种格式存储和交换,但是这些格式在设计和概念上有所不同,并且它们与特定的工具相关联,因此注释数据的可重用性受到限制。为了提高互操作性和可重用性,我们建议将这些注释转换为非常适合于数据Web的独立于工具的表示,即RDF表示。除了通过公共数据表示建立结构(格式)互操作性之外,我们的方法还允许使用(语言)关联开放数据云提供的共享词汇表和术语库。我们描述了核心词汇表和使用该词汇表将各种广泛使用的工具格式的IGT转换为RDF的转换器。最终,关联数据表示将促进(语言)关联开放数据云中资源较少的语言变体的语言数据的可访问性,并支持以新颖的方式访问和集成这些信息与(L)LOD字典数据和其他类型的词汇语义资源。从更长远的角度来看,目前只能通过这些格式获得的数据将变得更加可见和可重用,并有助于开发真正的多语言(语义)web。2012 ACM主题分类信息系统→基于图的数据库模型;计算方法→语言资源;计算方法→知识表示和推理
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信