Bridging the Gap Between Ontology and Lexicon via Class-Specific Association Rules Mined from a Loosely-Parallel Text-Data Corpus
Basil Ell, Mohammad Fazleh Elahi, P. Cimiano
International Conference on Language, Data, and Knowledge
DOI: 10.4230/OASIcs.LDK.2021.33 (https://doi.org/10.4230/OASIcs.LDK.2021.33)
Citations: 2
Abstract
There is a well-known lexical gap between content expressed as natural language (NL) text and content stored in an RDF knowledge base (KB). For tasks such as Information Extraction (IE), this gap must be bridged from NL to KB, so that facts extracted from text can be represented in RDF and added to an RDF KB. For tasks such as Natural Language Generation, it must be bridged from KB to NL, so that facts stored in an RDF KB can be verbalized and read by humans. In this paper, we propose LexExMachina, a new methodology that induces correspondences between lexical elements and KB elements by mining class-specific association rules. For example, one such rule predicts that if the text about a person contains the token "Greek", then this person has the relation nationality to the entity Greece. Another rule predicts that if the text about a settlement contains the token "Greek", then this settlement has the relation country to the entity Greece. Such rules can help in question answering, because they map an adjective to the relevant KB terms, and in information extraction from text. We propose and empirically investigate a set of 20 types of class-specific association rules, together with different interestingness measures to rank them. We apply our method to a loosely-parallel text-data corpus consisting of data from DBpedia and texts from Wikipedia, and provide empirical evidence for the utility of the rules for Question Answering.
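The "Greek" example can be made concrete with a minimal Python sketch of one possible rule type: (class, token) predicts (predicate, object). This is an illustration under stated assumptions, not the LexExMachina implementation. The toy corpus, the function name `mine_rules`, and the `min_support` threshold are all hypothetical, and support and confidence stand in here for the interestingness measures, which the abstract does not name.

```python
from collections import Counter

# Hypothetical toy corpus (not the paper's DBpedia/Wikipedia data): each entity
# has a class, the set of tokens in its text, and its (predicate, object) facts.
corpus = [
    {"cls": "Person", "tokens": {"greek", "philosopher"},
     "facts": {("nationality", "Greece")}},
    {"cls": "Person", "tokens": {"greek", "poet"},
     "facts": {("nationality", "Greece")}},
    {"cls": "Person", "tokens": {"german", "poet"},
     "facts": {("nationality", "Germany")}},
    {"cls": "Settlement", "tokens": {"greek", "island", "town"},
     "facts": {("country", "Greece")}},
    {"cls": "Settlement", "tokens": {"greek", "village"},
     "facts": {("country", "Greece")}},
]


def mine_rules(corpus, min_support=2):
    """Mine rules of the form (class, token) => (predicate, object).

    Support is the number of entities of the class whose text contains the
    token and that also have the fact. Confidence is support divided by the
    number of entities of the class whose text contains the token.
    """
    token_count = Counter()  # (class, token) -> entity count
    joint_count = Counter()  # (class, token, predicate, object) -> entity count
    for entity in corpus:
        for token in entity["tokens"]:
            token_count[(entity["cls"], token)] += 1
            for pred, obj in entity["facts"]:
                joint_count[(entity["cls"], token, pred, obj)] += 1

    rules = []
    for (cls, token, pred, obj), support in joint_count.items():
        if support >= min_support:
            confidence = support / token_count[(cls, token)]
            rules.append((cls, token, pred, obj, support, confidence))

    # Rank by confidence, then support; break ties alphabetically for determinism.
    rules.sort(key=lambda r: (-r[5], -r[4], r[0], r[1]))
    return rules


rules = mine_rules(corpus)
for cls, token, pred, obj, support, confidence in rules:
    print(f"{cls}: '{token}' => {pred} {obj} "
          f"(support={support}, confidence={confidence:.2f})")
```

Counts are conditioned on the class, so the same token yields a different rule for each class: on this toy data, "greek" maps to nationality Greece for persons and to country Greece for settlements.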