HISTORIAE, History of Socio-Cultural Transformation as Linguistic Data Science. A Humanities Use Case
F. Armaselu, E. Apostol, Anas Fahad Khan, Chaya Liebeskind, Barbara McGillivray, Ciprian-Octavian Truică, G. Oleškevičienė
International Conference on Language, Data, and Knowledge (LDK)
DOI: https://doi.org/10.4230/OASIcs.LDK.2021.34
Abstract: The paper proposes an interdisciplinary approach, including methods from disciplines such as the history of concepts, linguistics, natural language processing (NLP) and the Semantic Web, to create a comparative framework for detecting semantic change in multilingual historical corpora and generating diachronic ontologies as linguistic linked open data (LLOD). Initiated as a use case (UC4.2.1) within the COST Action Nexus Linguarum, the European network for Web-centred linguistic data science, the study will explore emerging trends in knowledge extraction, analysis and representation from linguistic data science, and apply the devised methodology to datasets in the humanities to trace the evolution of concepts from the domain of socio-cultural transformation. The paper describes the main elements of the methodological framework and the preliminary planning of the intended workflow.
2012 ACM Subject Classification: Computing methodologies → Semantic networks; Computing methodologies → Ontology engineering; Computing methodologies → Temporal reasoning; Computing methodologies → Lexical semantics; Computing methodologies → Language resources; Computing methodologies → Information extraction

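The core operation in corpus-based semantic change detection is comparing a word's contexts across time periods. As an illustrative sketch only (not the HISTORIAE methodology, which is still being devised), one can measure drift as one minus the cosine similarity between a word's co-occurrence vectors in two time-sliced corpora; the toy "periods" below are invented examples:

```python
from collections import Counter
from math import sqrt

def context_vector(corpus, target, window=2):
    """Count words co-occurring within +/-window of target across sentences."""
    vec = Counter()
    for sent in corpus:
        for i, tok in enumerate(sent):
            if tok == target:
                lo, hi = max(0, i - window), min(len(sent), i + window + 1)
                for j in range(lo, hi):
                    if j != i:
                        vec[sent[j]] += 1
    return vec

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Toy periods: "cell" shifts from biological to technological contexts.
period_1 = [["the", "cell", "membrane", "divides"],
            ["a", "cell", "under", "the", "microscope"]]
period_2 = [["charge", "your", "cell", "phone"],
            ["a", "cell", "phone", "signal"]]

drift = 1 - cosine(context_vector(period_1, "cell"),
                   context_vector(period_2, "cell"))
```

On realistic corpora this count-based scheme is usually replaced by aligned diachronic embeddings, but the drift score (0 = stable, 1 = fully shifted contexts) carries the same intuition.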
Explainable Zero-Shot Topic Extraction Using a Common-Sense Knowledge Graph
Ismail Harrando, Raphael Troncy
DOI: https://doi.org/10.4230/OASIcs.LDK.2021.17
Abstract: Pre-trained word embeddings constitute an essential building block for many NLP systems and applications, notably when labeled data is scarce. However, since they compress word meanings into a fixed-dimensional representation, their use usually lacks interpretability beyond a measure of similarity and linear analogies, which do not always reflect real-world word relatedness; this can be important for many NLP applications. In this paper, we propose a model which extracts topics from text documents based on the common-sense knowledge available in ConceptNet [24], a semantic concept graph that explicitly encodes real-world relations between words, and without any human supervision. When combining ConceptNet's knowledge graph and graph embeddings, our approach outperforms other baselines in the zero-shot setting, while generating a human-understandable explanation for its predictions through the knowledge graph. We study the importance of some modeling choices and criteria for designing the model, and we demonstrate that it can be used to label data for a supervised classifier to achieve even better performance without relying on any human-annotated training data. We publish the code of our approach at https://github.com/D2KLab/ZeSTE and we provide a user-friendly demo at https://zeste.tools.eurecom.fr/.
2012 ACM Subject Classification: Computing methodologies → Information extraction

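The explainability claim rests on a simple idea: a document is assigned to the topic label whose concept-graph neighbourhood overlaps most with the document's terms, and the overlapping terms themselves are the explanation. A minimal sketch, with a hypothetical hand-built `NEIGHBOURHOOD` dict standing in for ConceptNet neighbourhoods (the real ZeSTE system derives these, with weights, from the actual graph):

```python
# Hypothetical miniature concept neighbourhoods (term -> relatedness weight).
NEIGHBOURHOOD = {
    "sports": {"football": 1.0, "match": 0.8, "team": 0.9, "goal": 0.7},
    "politics": {"election": 1.0, "vote": 0.9, "parliament": 0.8, "party": 0.6},
}

def score_topics(doc_tokens, neighbourhoods):
    """Score each candidate topic by summing the weights of document tokens
    found in the topic's neighbourhood; matched tokens form the explanation."""
    results = {}
    for topic, hood in neighbourhoods.items():
        matched = {t: hood[t] for t in doc_tokens if t in hood}
        results[topic] = (sum(matched.values()), matched)
    return results

doc = ["the", "team", "scored", "a", "goal", "in", "the", "match"]
scores = score_topics(doc, NEIGHBOURHOOD)
best = max(scores, key=lambda t: scores[t][0])
```

Because no labeled documents are needed, adding a new topic is just a matter of naming it and looking up its neighbourhood, which is what makes the approach zero-shot.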
Automatic Detection of Language and Annotation Model Information in CoNLL Corpora
Frank Abromeit, C. Chiarcos
DOI: https://doi.org/10.4230/OASIcs.LDK.2019.23
Abstract: We introduce AnnoHub, an ongoing effort to automatically complement existing language resources with metadata about the languages they cover and the annotation schemes (tagsets) that they apply, to provide a web interface for their curation and evaluation by domain experts, and to publish them as an RDF dataset and as part of the (Linguistic) Linked Open Data (LLOD) cloud. In this paper, we focus on tabular formats with tab-separated values (TSV), a de facto standard for annotated corpora popularized as part of the CoNLL Shared Tasks. By extension, other formats for which a converter to CoNLL and/or TSV formats exists can be processed analogously. We describe our implementation and its evaluation against a sample of 93 corpora from Universal Dependencies v2.3.
2012 ACM Subject Classification: Information systems → Structure and multilingual text search

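Detecting which tagset a CoNLL-style corpus uses can be reduced to collecting the value inventory of an annotation column and matching it against known reference tagsets. This is only an illustrative sketch of that idea, not AnnoHub's implementation; the `TAGSETS` inventories here are abbreviated, hypothetical stand-ins for the curated ones such a system would maintain:

```python
def read_conll_column(tsv_text, col):
    """Collect the set of values in one column of CoNLL-style TSV data,
    skipping comment lines and blank sentence separators."""
    values = set()
    for line in tsv_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) > col:
            values.add(fields[col])
    return values

# Hypothetical, abbreviated reference tagsets for illustration.
TAGSETS = {
    "UD-UPOS": {"NOUN", "VERB", "ADJ", "ADV", "PRON", "DET", "ADP", "PUNCT"},
    "PTB": {"NN", "VB", "JJ", "RB", "PRP", "DT", "IN", "."},
}

def guess_tagset(observed, tagsets):
    """Pick the reference tagset covering the largest share of observed tags."""
    return max(tagsets, key=lambda name: len(observed & tagsets[name]) / len(observed))

sample = "1\tDogs\tdog\tNOUN\n2\tbark\tbark\tVERB\n\n1\tLoudly\tloudly\tADV\n"
tags = read_conll_column(sample, 3)
```

A real detector must also handle ambiguous overlaps between tagsets and decide which column carries the annotation, which is where curation by domain experts comes in.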
A Workbench for Corpus Linguistic Discourse Analysis
J. Krasselt, Matthias Fluor, K. Rothenhäusler, P. Dreesen
DOI: https://doi.org/10.4230/OASIcs.LDK.2021.26
Abstract: In this paper, we introduce the Swiss-AL workbench, an online tool for corpus linguistic discourse analysis. The workbench enables the analysis of Swiss-AL, a multilingual Swiss web corpus with sources from media, politics, industry, science, and civil society. The workbench differs from other corpus analysis tools in three characteristics: (1) easy access and a tidy interface, (2) a focus on visualizations, and (3) a wide range of analysis options, from classic corpus linguistic analysis (e.g., collocation analysis) to more recent NLP approaches (topic modeling and word embeddings). It is designed for researchers of various disciplines, practitioners, and students.
2012 ACM Subject Classification: Computing methodologies → Language resources; Computing methodologies → Discourse, dialogue and pragmatics

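Collocation analysis, the classic technique the workbench offers, typically ranks a node word's neighbours by an association measure such as pointwise mutual information (PMI). A minimal sketch of window-based PMI on a toy corpus (the workbench's own scoring choices are not specified in the abstract, so this is a generic illustration):

```python
from collections import Counter
from math import log2

def pmi_collocations(sentences, node, window=2):
    """Rank words co-occurring with `node` (within +/-window) by PMI:
    log2( p(w, node) / (p(w) * p(node)) ), estimated from token counts."""
    unigrams, pairs, total = Counter(), Counter(), 0
    for sent in sentences:
        total += len(sent)
        unigrams.update(sent)
        for i, tok in enumerate(sent):
            if tok == node:
                for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                    if j != i:
                        pairs[sent[j]] += 1
    p_node = unigrams[node] / total
    scores = {w: log2((c / total) / ((unigrams[w] / total) * p_node))
              for w, c in pairs.items()}
    return sorted(scores.items(), key=lambda kv: -kv[1])

sentences = [
    ["climate", "change", "policy"],
    ["global", "climate", "change"],
    ["climate", "change", "debate"],
    ["the", "policy", "debate"],
    ["global", "trade"],
]
ranked = pmi_collocations(sentences, "climate")
```

Production tools additionally apply frequency thresholds and significance tests, since raw PMI overrates rare words.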
Enriching Word Embeddings with Food Knowledge for Ingredient Retrieval
Álvaro Mendes Samagaio, Henrique Lopes Cardoso, David Ribeiro
DOI: https://doi.org/10.4230/OASIcs.LDK.2021.15
Abstract: Smart assistants and recommender systems must deal with large amounts of information coming from different sources and in different formats. This is especially frequent in text data, which presents increased variability and complexity, and is rather common for conversational assistants or chatbots. Moreover, this issue is very evident in the food and nutrition lexicon, where the semantics present increased variability, namely due to hypernyms and hyponyms. This work describes the creation of a set of word embeddings based on the incorporation of information from a food thesaurus, LanguaL, through retrofitting. The ingredients were classified according to three different facet label groups. Retrofitted embeddings seem to properly encode food-specific knowledge, as shown by an increase in accuracy compared to generic embeddings (+23%, +10% and +31% per group). Moreover, a weighting mechanism based on TF-IDF was applied to embedding creation before retrofitting, also bringing an increase in accuracy (+5%, +9% and +5% per group). Finally, the approach has been tested with human users in an ingredient retrieval exercise, with very positive results (77.3% of the volunteer testers preferred this method over a string-based matching algorithm).
2012 ACM Subject Classification: Computing methodologies → Artificial intelligence; Computing methodologies → Knowledge representation and reasoning; Computing methodologies → Lexical

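Retrofitting, the technique the abstract builds on, iteratively pulls each word vector towards its neighbours in a semantic resource (here a food thesaurus) while keeping it close to its original distributional position. A minimal sketch in the style of the standard retrofitting update, with invented toy vectors and a hypothetical thesaurus link, not the paper's LanguaL data:

```python
def retrofit(embeddings, neighbours, iterations=10, alpha=1.0):
    """Iteratively move each vector towards the average of its thesaurus
    neighbours, anchored to the original vector with weight alpha."""
    q = {w: list(v) for w, v in embeddings.items()}
    for _ in range(iterations):
        for w, nbrs in neighbours.items():
            nbrs = [n for n in nbrs if n in q]
            if not nbrs or w not in q:
                continue
            dim = len(q[w])
            # alpha * original vector + sum of current neighbour vectors
            new = [alpha * embeddings[w][d] for d in range(dim)]
            for n in nbrs:
                for d in range(dim):
                    new[d] += q[n][d]
            q[w] = [x / (len(nbrs) + alpha) for x in new]
    return q

# Toy vectors and a hypothetical thesaurus edge linking apple to fruit.
vectors = {"apple": [1.0, 0.0], "fruit": [0.0, 1.0], "car": [5.0, 5.0]}
thesaurus = {"apple": ["fruit"]}
fitted = retrofit(vectors, thesaurus)
```

After retrofitting, linked words end up closer together while unlinked words ("car") are untouched, which is how domain knowledge gets injected without retraining the embeddings.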
Inconsistency Detection in Job Postings
Joana Urbano, M. Couto, Gil Rocha, Henrique Lopes Cardoso
DOI: https://doi.org/10.4230/OASIcs.LDK.2021.25
Abstract: The use of AI in recruitment is growing, and there is AI software that reads job descriptions in order to select the best candidates for these jobs. However, it is not uncommon for these descriptions to contain inconsistencies such as contradictions and ambiguities, which confuse job candidates and fool the AI algorithms. In this paper, we present a model based on natural language processing (NLP), machine learning (ML), and rules to detect these inconsistencies in the description of language requirements and to alert the recruiter to them before the job posting is published. We show that a hybrid model based on ML techniques and a set of domain-specific rules to extract the language details from sentences achieves high performance in the detection of inconsistencies.
2012 ACM Subject Classification: Computing methodologies → Natural language processing; Applied computing → Enterprise ontologies, taxonomies and vocabularies

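The rule component of such a hybrid system can be pictured as extracting (language, proficiency-level) pairs and flagging a language that is mentioned with two different levels. The pattern list, level scale, and example posting below are all hypothetical illustrations, not the paper's actual rules:

```python
import re

# Hypothetical rule inventory: proficiency wording mapped to ordinal ranks.
LEVELS = {"basic": 1, "intermediate": 2, "fluent": 3, "native": 4}
LANGS = {"english", "german", "french"}

def extract_requirements(text):
    """Find (language, level-rank) mentions such as 'fluent English'."""
    found = []
    for level, lang in re.findall(
            r"\b(basic|intermediate|fluent|native)\s+(\w+)", text.lower()):
        if lang in LANGS:
            found.append((lang, LEVELS[level]))
    return found

def inconsistencies(text):
    """Flag languages stated with two different proficiency levels."""
    seen, conflicts = {}, []
    for lang, rank in extract_requirements(text):
        if lang in seen and seen[lang] != rank:
            conflicts.append(lang)
        seen.setdefault(lang, rank)
    return conflicts

posting = "We require fluent English. Basic English is sufficient."
```

The ML side of the real system would handle the many phrasings that such surface patterns miss; the rules then reconcile the extracted details.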
A Twitter Corpus and Lexicon for Abusive Speech Detection in Serbian
Danka Jokic, R. Stanković, Cvetana Krstev, Branislava Šandrih
DOI: https://doi.org/10.4230/OASIcs.LDK.2021.13
Abstract: Abusive speech in social media, including profanities, derogatory and hate speech, has reached the level of a pandemic. A system that could detect such texts would help in making the Internet and social media a better and more respectful virtual space. Research and commercial applications in this area have so far focused mainly on the English language. This paper presents the work on building AbCoSER, the first corpus of abusive speech in Serbian. The corpus consists of 6,436 manually annotated tweets, out of which 1,416 were labelled as containing some kind of abusive speech. Those 1,416 tweets were further sub-classified, for instance as using vulgar, hate speech, or derogatory language. In this paper, we explain the process of data acquisition, annotation, and corpus construction. We also discuss the results of an initial analysis of the annotation quality. Finally, we present the structure of an abusive speech lexicon and its enrichment with abusive triggers extracted from the AbCoSER dataset.
2012 ACM Subject Classification: Computing methodologies → Natural language processing

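A trigger lexicon like the one described is typically consumed by matching tweet tokens against trigger entries and reporting the associated abuse categories. The tiny `TRIGGERS` dict below is a hypothetical English stand-in purely for illustration; the actual AbCoSER lexicon is in Serbian and richer in structure:

```python
# Hypothetical trigger lexicon: trigger word -> abuse category.
TRIGGERS = {"idiot": "derogatory", "trash": "derogatory", "hateful": "hate"}

def flag_tweet(tweet, lexicon):
    """Return the sorted set of abuse categories whose triggers appear in
    the tweet, using simple punctuation-stripped token matching."""
    tokens = [t.strip(".,!?").lower() for t in tweet.split()]
    return sorted({lexicon[t] for t in tokens if t in lexicon})
```

Lexicon lookup alone over-triggers on quoted or ironic uses, which is why the corpus's manual annotations matter for training and evaluating real classifiers.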
A Smell is Worth a Thousand Words: Olfactory Information Extraction and Semantic Processing in a Multilingual Perspective (Invited Talk)
Sara Tonelli
DOI: https://doi.org/10.4230/OASIcs.LDK.2021.2
Abstract: More than any other sense, smell is linked directly to our emotions and our memories. However, smells are intangible and very difficult to preserve, making it hard to effectively identify, consolidate, and promote the wide-ranging role scents and smelling have in our cultural heritage. While some novel approaches have recently been proposed to monitor so-called urban smellscapes and analyse the olfactory dimension of our environments (Quercia et al. [1]), when it comes to smellscapes from the past, little research has been done to keep track of how places, events and people have been described from an olfactory perspective. Fortunately, some key prerequisites for addressing this problem are now in place. In recent years, European cultural heritage institutions have invested heavily in large-scale digitisation: we hold a wealth of object, text and image data which can now be analysed using artificial intelligence. What remains missing is a methodology for the extraction of scent-related information from large amounts of texts, as well as a broader awareness of the wealth of historical olfactory descriptions, experiences and memories contained within the heritage datasets. In this talk, I will describe ongoing activities towards this goal, focused on text mining and semantic processing of olfactory information. I will present the general framework designed to annotate smell events in documents, and some preliminary results on information extraction approaches in a multilingual scenario. I will discuss the main findings and the challenges related to modelling textual descriptions of smells, including the metaphorical use of smell-related terms and the well-known limitations of smell vocabulary in European languages compared to other senses.
2012 ACM Subject Classification: Applied computing → Document analysis; Information systems → Digital libraries and archives

A Data Augmentation Approach for Sign-Language-To-Text Translation In-The-Wild
Fabrizio Nunnari, C. España-Bonet, Eleftherios Avramidis
DOI: https://doi.org/10.4230/OASIcs.LDK.2021.36
Abstract: In this paper, we describe the current main approaches to sign language translation, which use deep neural networks with videos as input and text as output. We highlight that, in our view, their main weakness is the lack of generalization to daily-life contexts. Our goal is to build a state-of-the-art system for the automatic interpretation of sign language in unpredictable video framing conditions. Our main contribution is the shift from image features to landmark positions in order to reduce the size of the input data and facilitate the combination of data augmentation techniques for landmarks. We describe the set of hypotheses behind such a system and the list of experiments that will lead us to their verification.
2012 ACM Subject Classification: Computing methodologies → Machine learning; Human-centered computing → Accessibility technologies; Computing methodologies → Computer graphics

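Once the input is landmark positions rather than pixels, augmentation reduces to geometric transforms on point sets. As a generic sketch of the idea (the paper's concrete augmentation set is left to its experiments), rotating 2D landmarks about their centroid and adding Gaussian jitter simulates varied camera framing:

```python
import math
import random

def augment_landmarks(landmarks, angle_deg=5.0, jitter=0.01, seed=0):
    """Rotate 2D landmark points about their centroid and add Gaussian
    jitter, simulating small camera framing variations."""
    rng = random.Random(seed)
    cx = sum(x for x, _ in landmarks) / len(landmarks)
    cy = sum(y for _, y in landmarks) / len(landmarks)
    a = math.radians(angle_deg)
    out = []
    for x, y in landmarks:
        dx, dy = x - cx, y - cy
        rx = cx + dx * math.cos(a) - dy * math.sin(a) + rng.gauss(0, jitter)
        ry = cy + dx * math.sin(a) + dy * math.cos(a) + rng.gauss(0, jitter)
        out.append((rx, ry))
    return out

original = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
augmented = augment_landmarks(original)
```

Because each landmark is just a coordinate pair, such transforms are cheap to compose (scaling, translation, per-joint noise), which is exactly the flexibility pixel-level augmentation lacks.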
SPARQL Query Recommendation by Example: Assessing the Impact of Structural Analysis on Star-Shaped Queries
A. Adamou, Carlo Allocca, M. d'Aquin, E. Motta
DOI: https://doi.org/10.4230/OASIcs.LDK.2019.1
Abstract: One of the existing query recommendation strategies for unknown datasets is "by example", i.e. based on a query that the user already knows how to formulate on another dataset within a similar domain. In this paper we measure what contribution a structural analysis of the query and the datasets can bring to a recommendation strategy, to go alongside approaches that provide a semantic analysis. Here we concentrate on the case of star-shaped SPARQL queries over RDF datasets. The illustrated strategy performs a least general generalization on the given query, computes the specializations of it that are satisfiable by the target dataset, and organizes them into a graph. It then visits the graph to recommend first the reformulated queries that reflect the original query as closely as possible. This approach does not rely upon a semantic mapping between the two datasets. An implementation as part of the SQUIRE query recommendation library is discussed.
2012 ACM Subject Classification: Information systems → Semantic web description languages
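The first step of the strategy, generalizing a star-shaped query, can be pictured as keeping the shared subject variable and the predicates while abstracting the object constants into fresh variables, then testing which specializations a target dataset can satisfy. The triple-list encoding and the example IRIs below are simplified, hypothetical illustrations rather than SQUIRE's actual representation:

```python
def generalise_star(triples):
    """Generalize a star-shaped pattern: keep the shared subject variable
    and predicates, replace each object constant with a fresh variable."""
    return [(s, p, f"?o{i}") for i, (s, p, _) in enumerate(triples)]

def satisfiable(pattern, data):
    """Check whether a star pattern (one subject variable) matches some
    subject in the dataset, predicate by predicate."""
    subjects = {s for s, _, _ in data}
    for subj in subjects:
        if all(any(s == subj and p == pp for s, p, _ in data)
               for _, pp, _ in pattern):
            return True
    return False

# A star query known to work on a source dataset (toy IRIs).
query = [("?s", "dbo:author", "dbr:SomeAuthor"), ("?s", "rdf:type", "dbo:Book")]
general = generalise_star(query)

# Toy target dataset as a list of triples.
data = [("ex:b1", "dbo:author", "ex:anyone"), ("ex:b1", "rdf:type", "dbo:Book")]
```

The full strategy then re-specializes the generalized pattern with constants from the target dataset and ranks the satisfiable reformulations by closeness to the original query.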