International Conference on Language, Data, and Knowledge: Latest Articles, Page 8

Plenary Debates of the Parliament of Finland as Linked Open Data and in Parla-CLARIN Markup

International Conference on Language, Data, and Knowledge Pub Date: 2021 DOI: 10.4230/OASIcs.LDK.2021.8

Laura Sinikallio, Senka Drobac, Minna Tamper, Rafael Leal, M. Koho, J. Tuominen, Matti La Mela, E. Hyvönen

Abstract: This paper presents a knowledge graph created by transforming the plenary debates of the Parliament of Finland (1907–) into Linked Open Data (LOD). The data, totaling over 900 000 speeches, with automatically created semantic annotations and rich ontology-based metadata, are published in a Linked Open Data Service and are used via a SPARQL API and as data dumps. The speech data is part of larger LOD publication FinnParla that also includes prosopographical data about the politicians. The data is being used for studying parliamentary language and culture in Digital Humanities in several universities. To serve a wider variety of users, the entirety of this data was also produced using Parla-CLARIN markup. We present the first publication of all Finnish parliamentary debates as data. Technical novelties in our approach include the use of both Parla-CLARIN and an RDF schema developed for representing the speeches, integration of the data to a new Parliament of Finland Ontology for deeper data analyses, and enriching the data with a variety of external national and international data sources.

Citations: 19

Towards the Detection and Formal Representation of Semantic Shifts in Inflectional Morphology

International Conference on Language, Data, and Knowledge Pub Date: 2019 DOI: 10.4230/OASIcs.LDK.2019.21

Dagmar Gromann, Thierry Declerck

Abstract: Semantic shifts caused by derivational morphemes is a common subject of investigation in language modeling, while inflectional morphemes are frequently portrayed as semantically more stable. This study is motivated by the previously established observation that inflectional morphemes can be just as variable as derivational ones. For instance, the English plural “-s” can turn the fabric silk into the garments of a jockey, silks. While humans know that silk in this sense has no plural, it takes more for machines to arrive at this conclusion. Frequently utilized computational language resources, such as WordNet, or models for representing computational lexicons, like OntoLex-Lemon, have no descriptive mechanism to represent such inflectional semantic shifts. To investigate this phenomenon, we extract word pairs of different grammatical number from WordNet that feature additional senses in the plural and evaluate their distribution in vector space, i.e., pre-trained word2vec and fastText embeddings. We then propose an extension of OntoLex-Lemon to accommodate this phenomenon that we call inflectional morpho-semantic variation to provide a formal representation accessible to algorithms, neural networks, and agents. While the exact scope of the problem is yet to be determined, this first dataset shows that it is not negligible.
2012 ACM Subject Classification: Information systems

Citations: 6

Graph-Based Annotation Engineering: Towards a Gold Corpus for Role and Reference Grammar

International Conference on Language, Data, and Knowledge Pub Date: 2019 DOI: 10.4230/OASIcs.LDK.2019.9

C. Chiarcos, Christian Fäth

Abstract: This paper describes the application of annotation engineering techniques for the construction of a corpus for Role and Reference Grammar (RRG). RRG is a semantics-oriented formalism for natural language syntax popular in comparative linguistics and linguistic typology, and predominantly applied for the description of non-European languages which are less-resourced in terms of natural language processing. Because of its crosslinguistic applicability and its conjoint treatment of syntax and semantics, RRG also represents a promising framework for research challenges within natural language processing. At the moment, however, these have not been explored as no RRG corpus data is publicly available. While RRG annotations cannot be easily derived from any single treebank in existence, we suggest that they can be reliably inferred from the intersection of syntactic and semantic annotations as represented by, for example, the Universal Dependencies (UD) and PropBank (PB), and we demonstrate this for the English Web Treebank, a 250,000 token corpus of various genres of English internet text. The resulting corpus is a gold corpus for future experiments in natural language processing in the sense that it is built on existing annotations which have been created manually. A technical challenge in this context is to align UD and PB annotations, to integrate them in a coherent manner, and to distribute and to combine their information on RRG constituent and operator projections. For this purpose, we describe a framework for flexible and scalable annotation engineering based on flexible, unconstrained graph transformations of sentence graphs by means of SPARQL Update.
2012 ACM Subject Classification: Computing methodologies → Language resources; Information systems → Semantic web description languages; Computing methodologies → Natural language processing; Computing methodologies → Lexical semantics

Citations: 9

Towards Learning Terminological Concept Systems from Multilingual Natural Language Text

International Conference on Language, Data, and Knowledge Pub Date: 2021 DOI: 10.4230/OASIcs.LDK.2021.22

Lennart Wachowiak, Christian Lang, B. Heinisch, Dagmar Gromann

Citations: 1

Functional Representation of Technical Artefacts in Ontology-Terminology Models

International Conference on Language, Data, and Knowledge Pub Date: 2019 DOI: 10.4230/OASIcs.LDK.2019.5

L. Giacomini

Citations: 0

Universal Dependencies for Multilingual Open Information Extraction

International Conference on Language, Data, and Knowledge Pub Date: 2021 DOI: 10.4230/OASIcs.LDK.2021.24

Massinissa Atmani, Mathieu Lafourcade

Citations: 0

Can Computational Meta-Documentary Linguistics Provide for Accountability and Offer an Alternative to "Reproducibility" in Linguistics?

International Conference on Language, Data, and Knowledge Pub Date: 2019 DOI: 10.4230/OASIcs.LDK.2019.26

T. Weber

Citations: 8

The JeuxDeMots Project (Invited Talk)

International Conference on Language, Data, and Knowledge Pub Date: 2021 DOI: 10.4230/OASIcs.LDK.2021.1

Mathieu Lafourcade

Citations: 0

A Computational Simulation of Children's Language Acquisition (Crazy New Idea)

International Conference on Language, Data, and Knowledge Pub Date: 2021 DOI: 10.4230/OASIcs.LDK.2021.4

Ben Ambridge

Citations: 0

Crowd-Sourcing A High-Quality Dataset for Metaphor Identification in Tweets

International Conference on Language, Data, and Knowledge Pub Date: 2019 DOI: 10.4230/OASIcs.LDK.2019.10

Omnia Zayed, John P. McCrae, P. Buitelaar

Abstract: Metaphor is one of the most important elements of human communication, especially in informal settings such as social media. There have been a number of datasets created for metaphor identification, however, this task has proven difficult due to the nebulous nature of metaphoricity. In this paper, we present a crowd-sourcing approach for the creation of a dataset for metaphor identification, that is able to rapidly achieve large coverage over the different usages of metaphor in a given corpus while maintaining high accuracy. We validate this methodology by creating a set of 2,500 manually annotated tweets in English, for which we achieve inter-annotator agreement scores over 0.8, which is higher than other reported results that did not limit the task. This methodology is based on the use of an existing classifier for metaphor in order to assist in the identification and the selection of the examples for annotation, in a way that reduces the cognitive load for annotators and enables quick and accurate annotation. We selected a corpus of both general language tweets and political tweets relating to Brexit and we compare the resulting corpus on these two domains. As a result of this work, we have published the first dataset of tweets annotated for metaphors, which we believe will be invaluable for the development, training and evaluation of approaches for metaphor identification in tweets.

Citations: 6