International Conference on Language, Data, and Knowledge: Latest Publications

Comparison of Different Orthographies for Machine Translation of Under-Resourced Dravidian Languages

International Conference on Language, Data, and Knowledge Pub Date : 2019-05-20 DOI: 10.4230/OASIcs.LDK.2019.6

Bharathi Raja Chakravarthi, Mihael Arcan, John P. McCrae

Abstract: Under-resourced languages are a significant challenge for statistical approaches to machine translation, and it has recently been shown that using training data from closely related languages can improve machine translation quality for these languages. While languages within the same language family share many properties, many under-resourced languages are written in their own native script, which makes it difficult to take advantage of these similarities. In this paper, we propose to alleviate the problem of different scripts by transcribing the native script into a common representation, i.e., the Latin script or the International Phonetic Alphabet (IPA). In particular, we compare coarse-grained transliteration into the Latin script with fine-grained IPA transliteration. We performed experiments on the English-Tamil, English-Telugu, and English-Kannada translation tasks. Our results show improvements in BLEU, METEOR, and chrF scores from transliteration, and we find that transliteration into the Latin script outperforms the fine-grained IPA transcription.

Citations: 51

A Proposal for a Two-Way Journey on Validating Locations in Unstructured and Structured Data

International Conference on Language, Data, and Knowledge Pub Date : 2019-05-20 DOI: 10.4230/OASIcs.LDK.2019.13

Ilkcan Keles, Omar Qawasmeh, Tabea Tietz, Ludovica Marinucci, Roberto Reda, M. Erp

Citations: 0

OWLC: A Contextual Two-Dimensional Web Ontology Language

International Conference on Language, Data, and Knowledge Pub Date : 2019-05-20 DOI: 10.4230/OASIcs.LDK.2019.2

Sahar Aljalbout, Didier Buchs, G. Falquet

Citations: 3

Interlinking SciGraph and DBpedia Datasets Using Link Discovery and Named Entity Recognition Techniques

International Conference on Language, Data, and Knowledge Pub Date : 2019-05-20 DOI: 10.4230/OASICS.LDK.2019.15

Beyza Yaman, Michele Pasin, M. Freudenberg

Abstract: In recent years we have seen a proliferation of Linked Open Data (LOD) compliant datasets becoming available on the web, leading to more opportunities for data consumers to build smarter applications that integrate data from disparate sources. However, integration is often not easily achievable, since it requires discovering and expressing associations across heterogeneous datasets. The goal of this work is to increase the discoverability and reusability of scholarly data by integrating them with highly interlinked datasets in the LOD cloud. To do so, we applied techniques that (a) improve identity resolution across these two sources using Link Discovery for the structured data (i.e., by annotating Springer Nature (SN) SciGraph entities with links to DBpedia entities), and (b) enrich SN SciGraph unstructured text content (document abstracts) with links to DBpedia entities using Named Entity Recognition (NER). We published the results of this work using standard vocabularies and provided an interactive exploration tool that presents the discovered links with respect to the breadth and depth of the DBpedia classes.

Citations: 11

Metalexicography as Knowledge Graph

International Conference on Language, Data, and Knowledge Pub Date : 2019-05-20 DOI: 10.4230/OASIcs.LDK.2019.19

David Lindemann, Christian Klaes, P. Zumstein

Citations: 0

Opening Digitized Newspapers Corpora: Europeana's Full-Text Data Interoperability Case

International Conference on Language, Data, and Knowledge Pub Date : 2019-05-01 DOI: 10.4230/OASIcs.LDK.2019.22

Nuno Freire, Antoine Isaac, Twan Goosen, D. Broeder, Hugo Manguinhas, V. Charles

Abstract: Cultural heritage institutions hold collections of printed newspapers that are valuable resources for the study of history, linguistics, and other Digital Humanities domains. Effective retrieval of newspaper content based on metadata alone is nearly impossible, which makes retrieval based on (digitized) full text particularly relevant. Europeana, Europe’s Digital Library, is in a position to provide access to large newspaper collections with full-text resources. Full-text corpora are also relevant to Europeana’s objective of promoting the use of cultural heritage resources within research infrastructures. We have derived requirements for aggregating and publishing Europeana’s newspaper full-text corpus in an interoperable way, based on investigations into the specific characteristics of cultural data, the needs of two research infrastructures (CLARIN and EUDAT), and the practices being promoted in the International Image Interoperability Framework (IIIF) community. We have then defined a “full-text profile” for the Europeana Data Model, which is being applied to Europeana’s newspaper corpus.

Citations: 0

Validation Methodology for Expert-Annotated Datasets: Event Annotation Case Study

International Conference on Language, Data, and Knowledge Pub Date : 2019-05-01 DOI: 10.4230/OASIcs.LDK.2019.12

O. Inel, Lora Aroyo

Citations: 11

An Evaluation Dataset for Linked Data Profiling

International Conference on Language, Data, and Knowledge Pub Date : 2017-06-19 DOI: 10.1007/978-3-319-59888-8_1

Andrejs Abele, John P. McCrae, P. Buitelaar

Citations: 2

Multi-label Text Classification Using Semantic Features and Dimensionality Reduction with Autoencoders

International Conference on Language, Data, and Knowledge Pub Date : 2017-06-19 DOI: 10.1007/978-3-319-59888-8_32

Wael Alkhatib, Christoph Rensing, Johannes Silberbauer

Citations: 12

Answering the Hard Questions

International Conference on Language, Data, and Knowledge Pub Date : 2017-06-19 DOI: 10.1007/978-3-319-59888-8_22

Maria Khvalchik, Chanin Pithyaachariyakul, Anagha Kulkarni

Citations: 1