International Conference on Language, Data, and Knowledge: Latest Publications, Page 9

Supporting the Annotation Experience Through CorEx and Word Mover's Distance

International Conference on Language, Data, and Knowledge Pub Date : 1900-01-01 DOI: 10.4230/OASIcs.LDK.2021.12

Stefania Pecore

Citations: 0

Name Variants for Improving Entity Discovery and Linking

International Conference on Language, Data, and Knowledge Pub Date : 1900-01-01 DOI: 10.4230/OASIcs.LDK.2019.14

A. Weichselbraun, P. Kuntschik, Adrian M. P. Braşoveanu

Abstract: Identifying all names that refer to a particular set of named entities is a challenging task, as quite often we need to consider many features that include a lot of variation, such as abbreviations, aliases, hypocorism, multilingualism or partial matches. Each entity type can also have specific rules for name variances: people's names can include titles, country and branch names are sometimes removed from organization names, while locations are often plagued by the issue of nested entities. The lack of a clear strategy for collecting, processing and computing name variants significantly lowers the recall of tasks such as Named Entity Linking and Knowledge Base Population, since name variances are frequently used in all kinds of textual content. This paper proposes several strategies to address these issues. Recall can be improved by combining knowledge repositories and by computing additional variances based on algorithmic approaches. Heuristics and machine learning methods then analyze the generated name variances and mark ambiguous names to increase precision. An extensive evaluation demonstrates the effects of integrating these methods into a new Named Entity Linking framework and confirms that systematically considering name variances yields significant performance improvements.
2012 ACM Subject Classification: Information systems → Incomplete data; Information systems → Inconsistent data; Information systems → Extraction, transformation and loading; Information systems → Entity resolution

Citations: 12

Relevance Feedback Search Based on Automatic Annotation and Classification of Texts

International Conference on Language, Data, and Knowledge Pub Date : 1900-01-01 DOI: 10.4230/OASIcs.LDK.2021.18

Rafael Leal, Joonas Kesäniemi, M. Koho, E. Hyvönen

Abstract: The idea behind Relevance Feedback Search (RFBS) is to build search queries as an iterative and interactive process in which they are gradually refined based on the results of the previous search round. This can be helpful in situations where the end user cannot easily formulate their information needs at the outset as a well-focused query, or more generally as a way to filter and focus search results. This paper concerns (1) a framework that integrates keyword extraction and unsupervised classification into the RFBS paradigm and (2) the application of this framework to the legal domain as a use case. We focus on the Natural Language Processing (NLP) methods underlying the framework and application, where an automatic annotation tool is used for extracting document keywords as ontology concepts, which are then transformed into word embeddings to form vectorial representations of the texts. An unsupervised classification system that employs similar techniques is also used in order to classify the documents into broad thematic classes. This classification functionality is evaluated using two different datasets. As the use case, we describe an application perspective in the semantic portal LawSampo – Finnish Legislation and Case Law on the Semantic Web. This online demonstrator uses a dataset of 82 145 sections in 3725 statutes of Finnish legislation and another dataset that comprises 13 470 court decisions.

Citations: 2

Get! Mimetypes! Right! (Crazy New Idea)

International Conference on Language, Data, and Knowledge Pub Date : 1900-01-01 DOI: 10.4230/OASIcs.LDK.2021.5

C. Chiarcos

Citations: 0

Linking Discourse Marker Inventories

International Conference on Language, Data, and Knowledge Pub Date : 1900-01-01 DOI: 10.4230/OASIcs.LDK.2021.40

C. Chiarcos, Maxim Ionov

Citations: 4

Towards a Corpus of Historical German Plays with Emotion Annotations

International Conference on Language, Data, and Knowledge Pub Date : 1900-01-01 DOI: 10.4230/OASIcs.LDK.2021.9

Thomas Schmidt, Katrin Dennerlein, Christian Wolff

Abstract: In this paper, we present first work-in-progress annotation results of a project investigating computational methods of emotion analysis for historical German plays around 1800. We report on the development of an annotation scheme focussing on the annotation of emotions that are important from a literary studies perspective for this time span, as well as on the annotation process we have developed. We annotate emotions expressed or attributed by characters of the plays in the written texts. The scheme consists of 13 hierarchically structured emotion concepts as well as the source (who experiences or attributes the emotion) and target (who or what the emotion is directed towards). We have conducted the annotation of five example plays of our corpus with two annotators per play and report on annotation distributions and agreement statistics. We were able to collect over 6,500 emotion annotations and identified a fair agreement for most concepts around a κ-value of 0.4. We discuss how we plan to improve annotator consistency and continue our work. The results also have implications for similar projects in the context of Digital Humanities.

Citations: 7
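
The agreement statistic reported in the abstract above (a κ-value around 0.4 with two annotators per play) can be illustrated with a small worked example. This is a minimal sketch that assumes Cohen's kappa for two annotators; the abstract does not say which κ variant the authors used, and the function name, label names, and toy data below are hypothetical and not taken from the paper.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumption: one categorical label per item and per annotator.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of items on which both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label proportions,
    # summed over all labels.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical toy annotations (not data from the paper).
annotator_1 = ["joy", "anger", "joy", "fear", "joy", "anger"]
annotator_2 = ["joy", "joy", "joy", "fear", "anger", "anger"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

In this toy case the annotators agree on four of six items, but part of that agreement is expected by chance, so kappa comes out noticeably lower than the raw agreement rate.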

CoNLL-Merge: Efficient Harmonization of Concurrent Tokenization and Textual Variation

International Conference on Language, Data, and Knowledge Pub Date : 1900-01-01 DOI: 10.4230/OASIcs.LDK.2019.7

C. Chiarcos, Niko Schenk

Abstract: The proper detection of tokens in running text represents the initial processing step in modular NLP pipelines. But strategies for defining these minimal units can differ, and conflicting analyses of the same text seriously limit the integration of subsequent linguistic annotations into a shared representation. As a solution, we introduce CoNLL Merge, a practical tool for harmonizing TSV-related data models, as they occur, e.g., in multi-layer corpora with non-sequential, concurrent tokenizations, but also in ensemble combinations in Natural Language Processing. CoNLL Merge works unsupervised, requires no manual intervention or external data sources, and comes with a flexible API for fully automated merging routines, validity and sanity checks. Users can choose from several merging strategies, and either preserve a reference tokenization (with possible losses of annotation granularity), create a common tokenization layer consisting of minimal shared subtokens (loss-less in terms of annotation granularity, destructive against a reference tokenization), or present tokenization clashes (loss-less and non-destructive, but introducing empty tokens as place-holders for unaligned elements). We demonstrate the applicability of the tool on two use cases from natural language processing and computational philology.
2012 ACM Subject Classification: Applied computing → Format and notation; Applied computing → Document management and text processing; Applied computing → Annotation; Software and its engineering → Interoperability

Citations: 0
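
One of the merging strategies named in the abstract above, a common tokenization layer built from minimal shared subtokens, can be sketched in a few lines. This is an illustration of the general idea only, not the CoNLL-Merge implementation or its API: the function names are hypothetical, and the sketch assumes both tokenizations cover exactly the same character sequence, whereas the tool described in the abstract also handles textual variation.

```python
def boundaries(tokens):
    """Return the set of character offsets at which tokens end."""
    offsets, pos = set(), 0
    for tok in tokens:
        pos += len(tok)
        offsets.add(pos)
    return offsets


def minimal_subtokens(tokens_a, tokens_b):
    """Split the text at every boundary found in either tokenization.

    Assumption (hypothetical simplification): concatenating either token
    list yields the same string, with whitespace already stripped.
    """
    text_a, text_b = "".join(tokens_a), "".join(tokens_b)
    if text_a != text_b:
        raise ValueError("this sketch assumes both tokenizations cover identical text")
    # The union of boundaries gives the finest segmentation that both
    # tokenizations can be mapped onto without splitting a subtoken.
    cuts = sorted(boundaries(tokens_a) | boundaries(tokens_b))
    result, start = [], 0
    for end in cuts:
        result.append(text_a[start:end])
        start = end
    return result


# Two conflicting tokenizations of the same string.
tokens_a = ["ice-", "cream"]
tokens_b = ["ice", "-cream"]
merged = minimal_subtokens(tokens_a, tokens_b)
print(merged)
```

Every original token is a concatenation of one or more resulting subtokens, which is why annotation granularity is preserved, while neither input tokenization survives unchanged.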

Discrepancies Between Database- and Pragmatically Driven NLG: Insights from QUD-Based Annotations

International Conference on Language, Data, and Knowledge Pub Date : 1900-01-01 DOI: 10.4230/OASIcs.LDK.2021.32

C. Hesse, Maurice Langner, Anton Benz, R. Klabunde

Citations: 1

AAA4LLL - Acquisition, Annotation, Augmentation for Lively Language Learning

International Conference on Language, Data, and Knowledge Pub Date : 1900-01-01 DOI: 10.4230/OASIcs.LDK.2021.29

Bartholomäus Wloka, W. Winiwarter

Citations: 2

TatWordNet: A Linguistic Linked Open Data-Integrated WordNet Resource for Tatar

International Conference on Language, Data, and Knowledge Pub Date : 1900-01-01 DOI: 10.4230/OASIcs.LDK.2021.16

A. Kirillovich, M. Shaekhov, A. Galieva, O. Nevzorova, D. Ilvovsky, Natalia V. Loukachevitch

Citations: 1