International Conference on Language, Data, and Knowledge: Latest Publications, Page 7

APiCS-Ligt: Towards Semantic Enrichment of Interlinear Glossed Text

International Conference on Language, Data, and Knowledge Pub Date : 1900-01-01 DOI: 10.4230/OASIcs.LDK.2021.27

Maxim Ionov

Citations: 3

Exploring Causal Relationships Among Emotional and Topical Trajectories in Political Text Data

International Conference on Language, Data, and Knowledge Pub Date : 1900-01-01 DOI: 10.4230/OASIcs.LDK.2021.38

Andreas Baumann, Klaus Hofmann, B. Kern, Anna Marakasova, J. Neidhardt, Tanja Wissik

Citations: 0

Encoder-Attention-Based Automatic Term Recognition (EA-ATR)

International Conference on Language, Data, and Knowledge Pub Date : 1900-01-01 DOI: 10.4230/OASIcs.LDK.2021.23

Sampritha H. Manjunath, John P. McCrae

Abstract: Automated Term Recognition (ATR) is the task of finding terminology in raw text. It involves designing and developing techniques for mining possible terms from the text, filtering these candidates based on scores calculated with methods such as frequency of occurrence, and then ranking the terms. Current approaches often rely on statistics and regular expressions over part-of-speech tags to identify terms, but this is error-prone. We propose a deep learning technique to improve the process of identifying a possible sequence of terms. We improve the term recognition by using Bidirectional Encoder Representations from Transformers (BERT) based embeddings to identify which sequence of words is a term. This model is trained on Wikipedia titles. We assume all Wikipedia titles to be the positive set, and random n-grams generated from the raw text as a weak negative set. The positive and negative sets will be trained using the Embed, Encode, Attend and Predict (EEAP) formulation using BERT as embeddings. The model will then be evaluated against different domain-specific corpora such as GENIA (annotated biological terms) and Krapivin (scientific papers from the computer science domain).

2012 ACM Subject Classification: Information systems → Top-k retrieval in databases; Computing methodologies → Information extraction; Computing methodologies → Neural networks
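
The weak-supervision setup described in the abstract above (Wikipedia titles as positives, random n-grams from raw text as weak negatives) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names, the maximum n-gram length of 3, the one-negative-per-positive ratio, and the whitespace tokenization are all assumptions, and the BERT-based EEAP classifier itself is not shown.

```python
import random


def random_ngrams(tokens, count, max_n=3, seed=0):
    """Sample `count` random n-grams of 1..max_n tokens (max_n is an assumption)."""
    if not tokens:
        return []
    rng = random.Random(seed)
    ngrams = []
    for _ in range(count):
        n = rng.randint(1, min(max_n, len(tokens)))
        start = rng.randint(0, len(tokens) - n)
        ngrams.append(" ".join(tokens[start:start + n]))
    return ngrams


def build_training_pairs(titles, raw_text, negatives_per_positive=1, seed=0):
    """Label titles 1 and sampled n-grams 0, dropping n-grams that are titles."""
    positives = sorted({t.lower() for t in titles})
    tokens = raw_text.lower().split()  # whitespace tokenization is an assumption
    candidates = random_ngrams(
        tokens, negatives_per_positive * len(positives), seed=seed
    )
    negatives = [c for c in candidates if c not in positives]
    return [(p, 1) for p in positives] + [(n, 0) for n in negatives]


pairs = build_training_pairs(
    ["neural network", "gene expression"],  # stand-ins for Wikipedia titles
    "the neural network was trained on a large set of gene expression profiles",
)
```

The negatives are only weakly negative: a random n-gram may still be a genuine term that simply has no Wikipedia title.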

Citations: 2

Representing the Under-Represented: a Dataset of Post-Colonial, and Migrant Writers

International Conference on Language, Data, and Knowledge Pub Date : 1900-01-01 DOI: 10.4230/OASIcs.LDK.2021.7

M. Stranisci, V. Patti, R. Damiano

Citations: 5

The Secret to Popular Chinese Web Novels: A Corpus-Driven Study

International Conference on Language, Data, and Knowledge Pub Date : 1900-01-01 DOI: 10.4230/OASIcs.LDK.2019.24

Yi-Ju Lin, S. Hsieh

Abstract: What is the secret to writing popular novels? The issue is an intriguing one among researchers from various fields. The goal of this study is to identify the linguistic features of several popular web novels as well as how the textual features found within and the overall tone interact with the genre and themes of each novel. Apart from writing style, non-textual information may also reveal details behind the success of web novels. Since web fiction has become a major industry with top writers making millions of dollars and their stories adapted into published books, determining essential elements of “publishable” novels is of importance. The present study further examines how non-textual information, namely, the number of hits, shares, favorites, and comments, may contribute to several features of the most popular published and unpublished web novels. Findings reveal that keywords, function words, and lexical diversity of a novel are highly related to its genres and writing style while dialogue proportion shows the narration voice of the story. In addition, relatively shorter sentences are found in these novels. The data also reveal that the number of favorites and comments serve as significant predictors for the number of shares and hits of unpublished web novels, respectively; however, the number of hits and shares of published web novels is more unpredictable.
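
Two of the textual features named in the abstract above, lexical diversity and dialogue proportion, can be illustrated with a short sketch. The abstract does not say how either was computed, so the definitions here are assumptions: lexical diversity as a plain type-token ratio over pre-segmented tokens, and dialogue proportion as the share of characters enclosed in quotation marks. The function names and the default quote characters are hypothetical.

```python
import re


def type_token_ratio(tokens):
    """Distinct tokens divided by total tokens (one common diversity measure)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def dialogue_proportion(text, open_q="“", close_q="”"):
    """Share of characters that fall inside quotation marks (an assumed definition)."""
    if not text:
        return 0.0
    pattern = re.escape(open_q) + r"(.*?)" + re.escape(close_q)
    quoted = re.findall(pattern, text, flags=re.S)
    return sum(len(q) for q in quoted) / len(text)


ttr = type_token_ratio(["a", "b", "a", "c"])  # 3 distinct tokens out of 4
share = dialogue_proportion("“ab”cd")         # 2 quoted characters out of 6
```

For Chinese text, the tokens passed to `type_token_ratio` would first have to come from a word segmenter, and the quote characters would need to match the novel's own punctuation.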

Citations: 3

The Shortcomings of Language Tags for Linked Data When Modeling Lesser-Known Languages

International Conference on Language, Data, and Knowledge Pub Date : 1900-01-01 DOI: 10.4230/OASIcs.LDK.2019.4

Frances Gillis-Webber, Sabine Tittel

Abstract: In recent years, the modeling of data from linguistic resources with Resource Description Framework (RDF), following the Linked Data paradigm and using the OntoLex-Lemon vocabulary, has become a prevalent method to create datasets for a multilingual web of data. An important aspect of data modeling is the use of language tags to mark lexicons, lexemes, word senses, etc. of a linguistic dataset. However, attempts to model data from lesser-known languages show significant shortcomings with the authoritative list of language codes by ISO 639: for many lesser-known languages spoken by minorities and also for historical stages of languages, language codes, the basis of language tags, are simply not available. This paper discusses these shortcomings based on the examples of three such languages, i.e., two varieties of click languages of Southern Africa together with Old French, and suggests solutions for the issues identified.

2012 ACM Subject Classification: Computing methodologies → Language resources; Information systems → Dictionaries; Information systems → Semantic web description languages; Information systems → Graph-based database models; Information systems → Resource Description Framework (RDF); Software and its engineering → Interoperability; Information systems → Multilingual and cross-lingual retrieval; Computing methodologies → Information extraction; Computing methodologies → Artificial intelligence

Citations: 8

Enriching a Lexical Resource for French Verbs with Aspectual Information

International Conference on Language, Data, and Knowledge Pub Date : 1900-01-01 DOI: 10.4230/OASIcs.LDK.2021.10

Anna Kupsc, P. Haas, Rafael Marín, Antonio Balvet

Citations: 0

Bridging the Gap Between Ontology and Lexicon via Class-Specific Association Rules Mined from a Loosely-Parallel Text-Data Corpus

International Conference on Language, Data, and Knowledge Pub Date : 1900-01-01 DOI: 10.4230/OASIcs.LDK.2021.33

Basil Ell, Mohammad Fazleh Elahi, P. Cimiano

Abstract: There is a well-known lexical gap between content expressed in the form of natural language (NL) texts and content stored in an RDF knowledge base (KB). For tasks such as Information Extraction (IE), this gap needs to be bridged from NL to KB, so that facts extracted from text can be represented in RDF and can then be added to an RDF KB. For tasks such as Natural Language Generation, this gap needs to be bridged from KB to NL, so that facts stored in an RDF KB can be verbalized and read by humans. In this paper we propose LexExMachina, a new methodology that induces correspondences between lexical elements and KB elements by mining class-specific association rules. As an example of such an association rule, consider the rule that predicts that if the text about a person contains the token "Greek", then this person has the relation nationality to the entity Greece. Another rule predicts that if the text about a settlement contains the token "Greek", then this settlement has the relation country to the entity Greece. Such a rule can help in question answering, as it maps an adjective to the relevant KB terms, and it can help in information extraction from text. We propose and empirically investigate a set of 20 types of class-specific association rules together with different interestingness measures to rank them.

We apply our method on a loosely-parallel text-data corpus that consists of data from DBpedia and texts from Wikipedia, and evaluate and provide empirical evidence for the utility of the rules for Question Answering.
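
The "Greek" example in the abstract above can be made concrete with a small sketch that scores a class-specific rule by support and confidence. This is an illustration under stated assumptions, not the LexExMachina implementation: the data layout, the function name `rule_stats`, and the toy entities are invented, and support and confidence are just two standard interestingness measures (the abstract does not list which measures or which 20 rule types the paper uses).

```python
def rule_stats(entities, cls, token, predicate, obj):
    """Return (support, confidence) for the rule:
    entity is of class `cls` and its text contains `token` => (predicate, obj)."""
    in_class = [e for e in entities if e["class"] == cls]
    with_token = [e for e in in_class if token in e["tokens"]]
    matching = [e for e in with_token if (predicate, obj) in e["facts"]]
    support = len(matching)
    confidence = support / len(with_token) if with_token else 0.0
    return support, confidence


# Invented toy data standing in for Wikipedia text paired with DBpedia facts.
entities = [
    {"class": "Person", "tokens": {"Greek", "philosopher"},
     "facts": {("nationality", "Greece")}},
    {"class": "Person", "tokens": {"Greek", "poet"},
     "facts": {("nationality", "Greece")}},
    {"class": "Person", "tokens": {"Greek", "scholar"},
     "facts": {("nationality", "Italy")}},
    {"class": "Settlement", "tokens": {"Greek", "island"},
     "facts": {("country", "Greece")}},
]

person_rule = rule_stats(entities, "Person", "Greek", "nationality", "Greece")
settlement_rule = rule_stats(entities, "Settlement", "Greek", "country", "Greece")
```

Conditioning on the class is what lets the same token map to `nationality` for persons and to `country` for settlements.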

Citations: 2

Cross-Dictionary Linking at Sense Level with a Double-Layer Classifier

International Conference on Language, Data, and Knowledge Pub Date : 1900-01-01 DOI: 10.4230/OASIcs.LDK.2019.20

R. Saurí, Louis Mahon, Irene Russo, Mironas Bitinis

Citations: 4

Translation-Based Dictionary Alignment for Under-Resourced Bantu Languages

International Conference on Language, Data, and Knowledge Pub Date : 1900-01-01 DOI: 10.4230/OASIcs.LDK.2019.17

Thomas Eckart, Sonja E. Bosch, Dirk Goldhahn, U. Quasthoff, B. Klimek

Abstract: Despite a large number of active speakers, most Bantu languages can be considered as under- or less-resourced languages. This includes especially the current situation of lexicographical data, which is highly unsatisfactory concerning the size, quality and consistency in format and provided information. Unfortunately, this does not only hold for the amount and quality of data for monolingual dictionaries, but also for their lack of interconnection to form a network of dictionaries. Current endeavours to promote the use of Bantu languages in primary and secondary education in countries like South Africa show the urgent need for high-quality digital dictionaries. This contribution describes a prototypical implementation for aligning Xhosa, Zimbabwean Ndebele and Kalanga language dictionaries based on their English translations using simple string matching techniques and via WordNet URIs. The RDF-based representation of the data using the Bantu Language Model (BLM) and (partial) references to the established WordNet dataset supported this process significantly.

2012 ACM Subject Classification: Information systems → Resource Description Framework (RDF); Computing methodologies → Phonology / morphology; Information systems → Dictionaries
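
Aligning dictionaries through their English translations with simple string matching, as the abstract above describes, might look roughly like the sketch below. It is not the authors' prototype: the entry format, the normalization (lowercasing and whitespace collapsing), and the function names are assumptions, the lemma identifiers are placeholders rather than real Xhosa or Ndebele words, and the WordNet URI route is not shown.

```python
from collections import defaultdict


def normalize(gloss):
    """Lowercase and collapse whitespace so near-identical glosses match."""
    return " ".join(gloss.lower().split())


def align(dict_a, dict_b):
    """Link entries of two dictionaries whose English glosses match exactly
    after normalization. Each dictionary is a list of (lemma, gloss) pairs."""
    index = defaultdict(list)
    for lemma, gloss in dict_b:
        index[normalize(gloss)].append(lemma)
    links = []
    for lemma, gloss in dict_a:
        key = normalize(gloss)
        for other in index.get(key, []):
            links.append((lemma, other, key))
    return links


# Placeholder lemma IDs; real entries would carry the actual headwords.
xhosa = [("xh-1", "Dog"), ("xh-2", "water")]
ndebele = [("nd-1", "dog "), ("nd-2", "fire")]
links = align(xhosa, ndebele)
```

Exact matching on glosses is cheap but brittle: synonyms and multi-word glosses will be missed, which is one reason a shared sense inventory such as WordNet is useful as a second alignment route.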

Citations: 3