International Conference on Language, Data, and Knowledge: Latest Publications

APiCS-Ligt: Towards Semantic Enrichment of Interlinear Glossed Text
International Conference on Language, Data, and Knowledge Pub Date : 1900-01-01 DOI: 10.4230/OASIcs.LDK.2021.27
Maxim Ionov
{"title":"APiCS-Ligt: Towards Semantic Enrichment of Interlinear Glossed Text","authors":"Maxim Ionov","doi":"10.4230/OASIcs.LDK.2021.27","DOIUrl":"https://doi.org/10.4230/OASIcs.LDK.2021.27","url":null,"abstract":"This paper presents APiCS-Ligt, an LLOD version of a collection of interlinear glossed linguistic examples from APiCS, the Atlas of Pidgin and Creole Language Structures. Interlinear glossed text (IGT) plays an important role in typological and theoretical linguistic research, especially with understudied and endangered languages: It provides a way to understand linguistic phenomena without necessarily knowing the source language which is crucial for these languages since native speakers are not always easily accessible. Previously, we presented Ligt, RDF vocabulary created for representing interlinear glosses in text segments. In this paper, we present our conversion of the APiCS IGT dataset into this model and describe our efforts in linking linguistic annotations to an external ontology to add semantic representation. 2012 ACM Subject Classification Information systems → Graph-based database models; Computing methodologies → Language resources; Computing methodologies → Knowledge representation and reasoning","PeriodicalId":377119,"journal":{"name":"International Conference on Language, Data, and Knowledge","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132572577","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Exploring Causal Relationships Among Emotional and Topical Trajectories in Political Text Data
International Conference on Language, Data, and Knowledge Pub Date : 1900-01-01 DOI: 10.4230/OASIcs.LDK.2021.38
Andreas Baumann, Klaus Hofmann, B. Kern, Anna Marakasova, J. Neidhardt, Tanja Wissik
{"title":"Exploring Causal Relationships Among Emotional and Topical Trajectories in Political Text Data","authors":"Andreas Baumann, Klaus Hofmann, B. Kern, Anna Marakasova, J. Neidhardt, Tanja Wissik","doi":"10.4230/OASIcs.LDK.2021.38","DOIUrl":"https://doi.org/10.4230/OASIcs.LDK.2021.38","url":null,"abstract":"We explore relationships between dynamics of emotion (arousal and valence) and topical stability in political discourse in two diachronic corpora of Austrian German. In doing so, we assess interactions among emotional and topical dynamics related to political parties as well as interactions between two different domains of discourse: debates in the parliament and journalistic media. Methodologically, we employ unsupervised techniques, time-series clustering and Granger-causal modeling to detect potential interactions. We find that emotional and topical dynamics in the media are only rarely a reflex of dynamics in parliamentary discourse.","PeriodicalId":377119,"journal":{"name":"International Conference on Language, Data, and Knowledge","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134105950","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Encoder-Attention-Based Automatic Term Recognition (EA-ATR)
International Conference on Language, Data, and Knowledge Pub Date : 1900-01-01 DOI: 10.4230/OASIcs.LDK.2021.23
Sampritha H. Manjunath, John P. Mccrae
{"title":"Encoder-Attention-Based Automatic Term Recognition (EA-ATR)","authors":"Sampritha H. Manjunath, John P. Mccrae","doi":"10.4230/OASIcs.LDK.2021.23","DOIUrl":"https://doi.org/10.4230/OASIcs.LDK.2021.23","url":null,"abstract":"Automated Term Recognition (ATR) is the task of finding terminology from raw text. It involves designing and developing techniques for the mining of possible terms from the text and filtering these identified terms based on their scores calculated using scoring methodologies like frequency of occurrence and then ranking the terms. Current approaches often rely on statistics and regular expressions over part-of-speech tags to identify terms, but this is error-prone. We propose a deep learning technique to improve the process of identifying a possible sequence of terms. We improve the term recognition by using Bidirectional Encoder Representations from Transformers (BERT) based embeddings to identify which sequence of words is a term. This model is trained on Wikipedia titles. We assume all Wikipedia titles to be the positive set, and random n-grams generated from the raw text as a weak negative set. The positive and negative set will be trained using the Embed, Encode, Attend and Predict (EEAP) formulation using BERT as embeddings. The model will then be evaluated against different domain-specific corpora like GENIA – annotated biological terms and Krapivin – scientific papers from the computer science domain. 
2012 ACM Subject Classification Information systems → Top-k retrieval in databases; Computing methodologies → Information extraction; Computing methodologies → Neural networks","PeriodicalId":377119,"journal":{"name":"International Conference on Language, Data, and Knowledge","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133416276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Representing the Under-Represented: a Dataset of Post-Colonial, and Migrant Writers
International Conference on Language, Data, and Knowledge Pub Date : 1900-01-01 DOI: 10.4230/OASIcs.LDK.2021.7
M. Stranisci, V. Patti, R. Damiano
{"title":"Representing the Under-Represented: a Dataset of Post-Colonial, and Migrant Writers","authors":"M. Stranisci, V. Patti, R. Damiano","doi":"10.4230/OASIcs.LDK.2021.7","DOIUrl":"https://doi.org/10.4230/OASIcs.LDK.2021.7","url":null,"abstract":"In today’s media and in the Web of Data, non-Western people still suffer a lack of representation. In our work, we address this issue by presenting a pipeline for collecting and semantically encoding Wikipedia biographies of writers who are under-represented due to their non-Western origins, or their legal status in a country. The two main components of the ontology will be described, together with a framework for mapping textual biographies to their corresponding semantic representations. A description of the data set, and some examples of biographical texts conversion to the Ontology Classes, will be provided. 2012 ACM Subject Classification Information systems → Ontologies","PeriodicalId":377119,"journal":{"name":"International Conference on Language, Data, and Knowledge","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126659800","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
The Secret to Popular Chinese Web Novels: A Corpus-Driven Study
International Conference on Language, Data, and Knowledge Pub Date : 1900-01-01 DOI: 10.4230/OASIcs.LDK.2019.24
Yi-Ju Lin, S. Hsieh
{"title":"The Secret to Popular Chinese Web Novels: A Corpus-Driven Study","authors":"Yi-Ju Lin, S. Hsieh","doi":"10.4230/OASIcs.LDK.2019.24","DOIUrl":"https://doi.org/10.4230/OASIcs.LDK.2019.24","url":null,"abstract":"What is the secret to writing popular novels? The issue is an intriguing one among researchers from various fields. The goal of this study is to identify the linguistic features of several popular web novels as well as how the textual features found within and the overall tone interact with the genre and themes of each novel. Apart from writing style, non-textual information may also reveal details behind the success of web novels. Since web fiction has become a major industry with top writers making millions of dollars and their stories adapted into published books, determining essential elements of “publishable” novels is of importance. The present study further examines how non-textual information, namely, the number of hits, shares, favorites, and comments, may contribute to several features of the most popular published and unpublished web novels. Findings reveal that keywords, function words, and lexical diversity of a novel are highly related to its genres and writing style while dialogue proportion shows the narration voice of the story. In addition, relatively shorter sentences are found in these novels. The data also reveal that the number of favorites and comments serve as significant predictors for the number of shares and hits of unpublished web novels, respectively; however, the number of hits and shares of published web novels is more unpredictable. 
2012 ACM Subject Classification","PeriodicalId":377119,"journal":{"name":"International Conference on Language, Data, and Knowledge","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130392584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
The Shortcomings of Language Tags for Linked Data When Modeling Lesser-Known Languages
International Conference on Language, Data, and Knowledge Pub Date : 1900-01-01 DOI: 10.4230/OASIcs.LDK.2019.4
Frances Gillis-Webber, Sabine Tittel
{"title":"The Shortcomings of Language Tags for Linked Data When Modeling Lesser-Known Languages","authors":"Frances Gillis-Webber, Sabine Tittel","doi":"10.4230/OASIcs.LDK.2019.4","DOIUrl":"https://doi.org/10.4230/OASIcs.LDK.2019.4","url":null,"abstract":"In recent years, the modeling of data from linguistic resources with Resource Description Framework (RDF), following the Linked Data paradigm and using the OntoLex-Lemon vocabulary, has become a prevalent method to create datasets for a multilingual web of data. An important aspect of data modeling is the use of language tags to mark lexicons, lexemes, word senses, etc. of a linguistic dataset. However, attempts to model data from lesser-known languages show significant shortcomings with the authoritative list of language codes by ISO 639: for many lesser-known languages spoken by minorities and also for historical stages of languages, language codes, the basis of language tags, are simply not available. This paper discusses these shortcomings based on the examples of three such languages, i.e., two varieties of click languages of Southern Africa together with Old French, and suggests solutions for the issues identified. 
2012 ACM Subject Classification Computing methodologies → Language resources; Information systems → Dictionaries; Information systems → Semantic web description languages; Information systems → Graph-based database models; Information systems → Resource Description Framework (RDF); Software and its engineering → Interoperability; Information systems → Multilingual and cross-lingual retrieval; Computing methodologies → Information extraction; Computing methodologies → Artificial intelligence","PeriodicalId":377119,"journal":{"name":"International Conference on Language, Data, and Knowledge","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128531208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Enriching a Lexical Resource for French Verbs with Aspectual Information
International Conference on Language, Data, and Knowledge Pub Date : 1900-01-01 DOI: 10.4230/OASIcs.LDK.2021.10
Anna Kupsc, P. Haas, Rafael Marín, Antonio Balvet
{"title":"Enriching a Lexical Resource for French Verbs with Aspectual Information","authors":"Anna Kupsc, P. Haas, Rafael Marín, Antonio Balvet","doi":"10.4230/OASIcs.LDK.2021.10","DOIUrl":"https://doi.org/10.4230/OASIcs.LDK.2021.10","url":null,"abstract":"The paper presents a syntactico-semantic lexicon of over a thousand French verbs. It has been created by manually adding lexical aspect features to verb frames from TreeLex [16]. We present how the original syntactic resource has been adapted to the current project, our aspect assignment procedure and an overview of the resulting lexical resource.","PeriodicalId":377119,"journal":{"name":"International Conference on Language, Data, and Knowledge","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133240793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Bridging the Gap Between Ontology and Lexicon via Class-Specific Association Rules Mined from a Loosely-Parallel Text-Data Corpus
International Conference on Language, Data, and Knowledge Pub Date : 1900-01-01 DOI: 10.4230/OASIcs.LDK.2021.33
Basil Ell, Mohammad Fazleh Elahi, P. Cimiano
{"title":"Bridging the Gap Between Ontology and Lexicon via Class-Specific Association Rules Mined from a Loosely-Parallel Text-Data Corpus","authors":"Basil Ell, Mohammad Fazleh Elahi, P. Cimiano","doi":"10.4230/OASIcs.LDK.2021.33","DOIUrl":"https://doi.org/10.4230/OASIcs.LDK.2021.33","url":null,"abstract":"There is a well-known lexical gap between content expressed in the form of natural language (NL) texts and content stored in an RDF knowledge base (KB). For tasks such as Information Extraction (IE), this gap needs to be bridged from NL to KB, so that facts extracted from text can be represented in RDF and can then be added to an RDF KB. For tasks such as Natural Language Generation, this gap needs to be bridged from KB to NL, so that facts stored in an RDF KB can be verbalized and read by humans. In this paper we propose LexExMachina, a new methodology that induces correspondences between lexical elements and KB elements by mining class-specific association rules. As an example of such an association rule, consider the rule that predicts that if the text about a person contains the token \"Greek\", then this person has the relation nationality to the entity Greece. Another rule predicts that if the text about a settlement contains the token \"Greek\", then this settlement has the relation country to the entity Greece. Such a rule can help in question answering, as it maps an adjective to the relevant KB terms, and it can help in information extraction from text. We propose and empirically investigate a set of 20 types of class-specific association rules together with different interestingness measures to rank them. 
We apply our method on a loosely-parallel text-data corpus that consists of data from DBpedia and texts from Wikipedia, and evaluate and provide empirical evidence for the utility of the rules for Question Answering.","PeriodicalId":377119,"journal":{"name":"International Conference on Language, Data, and Knowledge","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130731840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
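As a rough illustration of the class-specific association rules described in the LexExMachina abstract above, the sketch below scores one rule of the form "class = Person and text contains the token 'greek' implies nationality = Greece". The toy corpus, the function name `rule_stats`, and the choice of support and confidence as the measures are assumptions made for this example; the abstract does not say which of the paper's 20 rule types or which interestingness measures this would correspond to.

```python
# Toy, made-up corpus: (entity class, tokens in its text, KB facts about it).
# None of this data is taken from DBpedia or Wikipedia.
corpus = [
    ("Person", {"greek", "philosopher"}, {("nationality", "Greece")}),
    ("Person", {"greek", "poet"}, {("nationality", "Greece")}),
    ("Person", {"greek", "sailor"}, {("nationality", "Italy")}),
    ("Settlement", {"greek", "port"}, {("country", "Greece")}),
]


def rule_stats(corpus, cls, token, fact):
    """Score the rule: entity is of class `cls` and its text contains `token` => `fact`.

    Returns (support, confidence), where support counts entities matching
    both sides of the rule and confidence is support divided by the number
    of entities matching the left-hand side.
    """
    antecedent = 0  # entities of the class whose text contains the token
    both = 0        # ...that also carry the predicted fact
    for entity_cls, tokens, facts in corpus:
        if entity_cls == cls and token in tokens:
            antecedent += 1
            if fact in facts:
                both += 1
    confidence = both / antecedent if antecedent else 0.0
    return both, confidence


person_rule = rule_stats(corpus, "Person", "greek", ("nationality", "Greece"))
settlement_rule = rule_stats(corpus, "Settlement", "greek", ("country", "Greece"))
```

Conditioning on the class is what lets the same token map to `nationality` for persons and to `country` for settlements, as in the abstract's two examples.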
Cross-Dictionary Linking at Sense Level with a Double-Layer Classifier
International Conference on Language, Data, and Knowledge Pub Date : 1900-01-01 DOI: 10.4230/OASIcs.LDK.2019.20
R. Saurí, Louis Mahon, Irene Russo, Mironas Bitinis
{"title":"Cross-Dictionary Linking at Sense Level with a Double-Layer Classifier","authors":"R. Saurí, Louis Mahon, Irene Russo, Mironas Bitinis","doi":"10.4230/OASIcs.LDK.2019.20","DOIUrl":"https://doi.org/10.4230/OASIcs.LDK.2019.20","url":null,"abstract":"We present a system for linking dictionaries at the sense level, which is part of a wider programme aiming to extend current lexical resources and to create new ones by automatic means. One of the main challenges of the sense linking task is the existence of non one-to-one mappings among senses. Our system handles this issue by addressing the task as a binary classification problem using standard Machine Learning methods, where each sense pair is classified independently from the others. In addition, it implements a second, statistically-based classification layer to also model the dependence existing among sense pairs, namely, the fact that a sense in one dictionary that is already linked to a sense in the other dictionary has a lower probability of being linked to a further sense. The resulting double-layer classifier achieves global Precision and Recall scores of 0.91 and 0.80, respectively. 2012 ACM Subject Classification Computing methodologies→ Lexical semantics; Computing methodologies → Language resources; Computing methodologies → Supervised learning by classification","PeriodicalId":377119,"journal":{"name":"International Conference on Language, Data, and Knowledge","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124247032","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Translation-Based Dictionary Alignment for Under-Resourced Bantu Languages
International Conference on Language, Data, and Knowledge Pub Date : 1900-01-01 DOI: 10.4230/OASIcs.LDK.2019.17
Thomas Eckart, Sonja E. Bosch, Dirk Goldhahn, U. Quasthoff, B. Klimek
{"title":"Translation-Based Dictionary Alignment for Under-Resourced Bantu Languages","authors":"Thomas Eckart, Sonja E. Bosch, Dirk Goldhahn, U. Quasthoff, B. Klimek","doi":"10.4230/OASIcs.LDK.2019.17","DOIUrl":"https://doi.org/10.4230/OASIcs.LDK.2019.17","url":null,"abstract":"Despite a large number of active speakers, most Bantu languages can be considered as under- or less-resourced languages. This includes especially the current situation of lexicographical data, which is highly unsatisfactory concerning the size, quality and consistency in format and provided information. Unfortunately, this does not only hold for the amount and quality of data for monolingual dictionaries, but also for their lack of interconnection to form a network of dictionaries. Current endeavours to promote the use of Bantu languages in primary and secondary education in countries like South Africa show the urgent need for high-quality digital dictionaries. This contribution describes a prototypical implementation for aligning Xhosa, Zimbabwean Ndebele and Kalanga language dictionaries based on their English translations using simple string matching techniques and via WordNet URIs. The RDF-based representation of the data using the Bantu Language Model (BLM) and – partial – references to the established WordNet dataset supported this process significantly. 
2012 ACM Subject Classification Information systems → Resource Description Framework (RDF); Computing methodologies → Phonology / morphology; Information systems → Dictionaries","PeriodicalId":377119,"journal":{"name":"International Conference on Language, Data, and Knowledge","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124030241","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
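The alignment idea in the abstract above, linking entries of two dictionaries through their shared English translations with simple string matching, can be sketched as follows. This is a minimal illustration under stated assumptions: the `normalize` and `align` helpers, the lowercase and whitespace normalization, and the placeholder lemma identifiers are all hypothetical, and the sketch omits the WordNet URI matching and the RDF/BLM representation that the paper mentions.

```python
from collections import defaultdict


def normalize(gloss):
    """Lowercase an English translation and collapse its whitespace."""
    return " ".join(gloss.lower().split())


def align(dict_a, dict_b):
    """Link lemmas of two dictionaries that share an English translation.

    Both arguments map a lemma to a list of English translations.
    Returns a sorted list of (lemma_a, lemma_b, shared_translation).
    """
    # Index dictionary B by normalized English translation.
    index = defaultdict(set)
    for lemma_b, glosses in dict_b.items():
        for gloss in glosses:
            index[normalize(gloss)].add(lemma_b)

    # Look up every translation of dictionary A in that index.
    links = set()
    for lemma_a, glosses in dict_a.items():
        for gloss in glosses:
            key = normalize(gloss)
            for lemma_b in index.get(key, ()):
                links.add((lemma_a, lemma_b, key))
    return sorted(links)


# Placeholder entries: the lemma identifiers are not real words.
dict_a = {"lemma_a1": ["dog"], "lemma_a2": ["Person", "human being"]}
dict_b = {"lemma_b1": ["Dog "], "lemma_b2": ["person"], "lemma_b3": ["tree"]}
links = align(dict_a, dict_b)
```

Exact matching on normalized strings is the simplest possible choice; it misses near-synonyms and inflected translations, which is presumably where a resource such as WordNet would help.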