Scalable Rule Learning in Probabilistic Knowledge Bases
Arcchit Jain, Tal Friedman, Ondřej Kuželka, Guy Van den Broeck, Luc De Raedt
Conference on Automated Knowledge Base Construction, 2019-05-02. DOI: https://doi.org/10.24432/C5MW26
Abstract: Knowledge bases (KBs) are becoming increasingly large, sparse and probabilistic. These KBs are typically used for query inference and rule mining, but their efficacy is only as high as their completeness. Efficiently utilizing incomplete KBs remains a major challenge, as current KB completion techniques either do not take into account the inherent uncertainty associated with each KB tuple or do not scale to large KBs. Probabilistic rule learning not only considers the probability of every KB tuple but also tackles KB completion in an explainable way: for any given probabilistic KB, it learns probabilistic first-order rules over its relations to identify interesting patterns. However, current probabilistic rule learning techniques rely on grounding to perform probabilistic inference when evaluating candidate rules, which does not scale to large KBs because the time complexity of inference by grounding is exponential in the size of the KB. In this paper, we present SafeLearner, a scalable solution to probabilistic KB completion that performs probabilistic rule learning using lifted probabilistic inference, a faster approach than grounding. We compare SafeLearner to the state-of-the-art probabilistic rule learner ProbFOIL+ and to its deterministic contemporary AMIE+ on standard probabilistic KBs derived from NELL (Never-Ending Language Learner) and YAGO. Our results demonstrate that SafeLearner scales as well as AMIE+ when learning simple rules and is significantly faster than ProbFOIL+.
{"title":"Classifying entities into an incomplete ontology","authors":"Bhavana Dalvi, William W. Cohen, Jamie Callan","doi":"10.1145/2509558.2509564","DOIUrl":"https://doi.org/10.1145/2509558.2509564","url":null,"abstract":"Exponential growth of unlabeled web-scale datasets, and class hierarchies to represent them, has given rise to new challenges for hierarchical classification. It is costly and time consuming to create a complete ontology of classes to represent entities on the Web. Hence, there is a need for techniques that can do hierarchical classification of entities into incomplete ontologies. In this paper we present Hierarchical Exploratory EM algorithm (an extension of the Exploratory EM algorithm [7]) that takes a seed class hierarchy and seed class instances as input. Our method classifies relevant entities into some of the classes from the seed hierarchy and on its way adds newly discovered classes into the hierarchy. Experiments with subsets of the NELL ontology and text datasets derived from the ClueWeb09 corpus show that our Hierarchical Exploratory EM approach improves seed class F1 by up to 21% when compared to its semi-supervised counterpart.","PeriodicalId":371465,"journal":{"name":"Conference on Automated Knowledge Base Construction","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124915576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A joint model for discovering and linking entities
Michael L. Wick, Sameer Singh, Harshal Pandya, A. McCallum
Conference on Automated Knowledge Base Construction, 2013-10-27. DOI: https://doi.org/10.1145/2509558.2509570
Abstract: Entity resolution, the task of automatically determining which mentions refer to the same real-world entity, is a crucial aspect of knowledge base construction and management. However, performing entity resolution at large scales is challenging because (1) the inference algorithms must cope with unavoidable system scalability issues and (2) the search space grows exponentially in the number of mentions. Current conventional wisdom has been that performing coreference at these scales requires decomposing the problem by first solving the simpler task of entity-linking (matching a set of mentions to a known set of KB entities), and then performing entity discovery as a post-processing step (to identify new entities not present in the KB). However, we argue that this traditional approach is harmful to both entity-linking and overall coreference accuracy. Therefore, we embrace the challenge of jointly modeling entity-linking and entity-discovery as a single entity resolution problem. In order to make progress towards scalability we (1) present a model that reasons over compact hierarchical entity representations, and (2) propose a novel distributed inference architecture that does not suffer from the synchronicity bottleneck which is inherent in map-reduce architectures. We demonstrate that more test-time data actually improves the accuracy of coreference, and show that joint coreference is substantially more accurate than traditional entity-linking, reducing error by 75%.
{"title":"Ontology-aware partitioning for knowledge graph identification","authors":"J. Pujara, Hui Miao, L. Getoor, William W. Cohen","doi":"10.1145/2509558.2509562","DOIUrl":"https://doi.org/10.1145/2509558.2509562","url":null,"abstract":"Knowledge graphs provide a powerful representation of entities and the relationships between them, but automatically constructing such graphs from noisy extractions presents numerous challenges. Knowledge graph identification (KGI) is a technique for knowledge graph construction that jointly reasons about entities, attributes and relations in the presence of uncertain inputs and ontological constraints. Although knowledge graph identification shows promise scaling to knowledge graphs built from millions of extractions, increasingly powerful extraction engines may soon require knowledge graphs built from billions of extractions. One tool for scaling is partitioning extractions to allow reasoning to occur in parallel. We explore approaches which leverage ontological information and distributional information in partitioning. We compare these techniques with hash-based approaches, and show that using a richer partitioning model that incorporates the ontology graph and distribution of extractions provides superior results. Our results demonstrate that partitioning can result in order-of-magnitude speedups without reducing model performance.","PeriodicalId":371465,"journal":{"name":"Conference on Automated Knowledge Base Construction","volume":"232 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131889658","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Universal schema for entity type prediction","authors":"Limin Yao, S. Riedel, A. McCallum","doi":"10.1145/2509558.2509572","DOIUrl":"https://doi.org/10.1145/2509558.2509572","url":null,"abstract":"Categorizing entities by their types is useful in many applications, including knowledge base construction, relation extraction and query intent prediction. Fine-grained entity type ontologies are especially valuable, but typically difficult to design because of unavoidable quandaries about level of detail and boundary cases. Automatically classifying entities by type is challenging as well, usually involving hand-labeling data and training a supervised predictor.\u0000 This paper presents a universal schema approach to fine-grained entity type prediction. The set of types is taken as the union of textual surface patterns (e.g. appositives) and pre-defined types from available databases (e.g. Freebase)---yielding not tens or hundreds of types, but more than ten thousands of entity types, such as financier, criminologist, and musical trio. We robustly learn mutual implication among this large union by learning latent vector embeddings from probabilistic matrix factorization, thus avoiding the need for hand-labeled data. Experimental results demonstrate more than 30% reduction in error versus a traditional classification approach on predicting fine-grained entities types.","PeriodicalId":371465,"journal":{"name":"Conference on Automated Knowledge Base Construction","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129840840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A survey of noise reduction methods for distant supervision
Benjamin Roth, Tassilo Barth, Michael Wiegand, D. Klakow
Conference on Automated Knowledge Base Construction, 2013-10-27. DOI: https://doi.org/10.1145/2509558.2509571
Abstract: We survey recent approaches to noise reduction in distant supervision learning for relation extraction. We group them according to the principles they are based on: at-least-one constraints, topic-based models, or pattern correlations. Besides describing them, we illustrate the fundamental differences and attempt to give an outlook to potentially fruitful further research. In addition, we identify related work in sentiment analysis which could profit from approaches to noise reduction.
{"title":"Exploiting DBpedia for web search results clustering","authors":"M. Schuhmacher, Simone Paolo Ponzetto","doi":"10.1145/2509558.2509574","DOIUrl":"https://doi.org/10.1145/2509558.2509574","url":null,"abstract":"We present a knowledge-rich approach to Web search result clustering which exploits the output of an open-domain entity linker, as well as the types and topical concepts encoded within a wide-coverage ontology. Our results indicate that, thanks to an accurate and compact semantification of the search result snippets, we are able to achieve a competitive performance on a benchmarking dataset for this task.","PeriodicalId":371465,"journal":{"name":"Conference on Automated Knowledge Base Construction","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124387883","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extracting meronyms for a biology knowledge base using distant supervision","authors":"Xiao Ling, Peter Clark, Daniel S. Weld","doi":"10.1145/2509558.2509560","DOIUrl":"https://doi.org/10.1145/2509558.2509560","url":null,"abstract":"Knowledge of objects and their parts, meronym relations, are at the heart of many question-answering systems, but manually encoding these facts is impractical. Past researchers have tried hand-written patterns, supervised learning, and bootstrapped methods, but achieving both high precision and recall has proven elusive. This paper reports on a thorough exploration of distant supervision to learn a meronym extractor for the domain of college biology. We introduce a novel algorithm, generalizing the ``at least one'' assumption of multi-instance learning to handle the case where a fixed (but unknown) percentage of bag members are positive examples. Detailed experiments compare strategies for mention detection, negative example generation, leveraging out-of-domain meronyms, and evaluate the benefit of our multi-instance percentage model.","PeriodicalId":371465,"journal":{"name":"Conference on Automated Knowledge Base Construction","volume":"177 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133883702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Knowledge base population and visualization using an ontology based on semantic roles
Maryam Siahbani, Ravikiran Vadlapudi, M. Whitney, Anoop Sarkar
Conference on Automated Knowledge Base Construction, 2013-10-27. DOI: https://doi.org/10.1145/2509558.2509573
Abstract: This paper extracts facts using "micro-reading" of text in contrast to approaches that extract common-sense knowledge using "macro-reading" methods. Our goal is to extract detailed facts about events from natural language using a predicate-centered view of events (who did what to whom, when and how). We exploit semantic role labels in order to create a novel predicate-centric ontology for entities in our knowledge base. This allows users to find uncommon facts easily. To this end, we tightly couple our knowledge base and ontology to an information visualization system that can be used to explore and navigate events extracted from a large natural language text collection. We use our methodology to create a web-based visual browser of history events in Wikipedia.
{"title":"Using natural language to integrate, evaluate, and optimize extracted knowledge bases","authors":"Doug Downey, Chandra Bhagavatula, A. Yates","doi":"10.1145/2509558.2509569","DOIUrl":"https://doi.org/10.1145/2509558.2509569","url":null,"abstract":"Web Information Extraction (WIE) systems extract billions of unique facts, but integrating the assertions into a coherent knowledge base and evaluating across different WIE techniques remains a challenge. We propose a framework that utilizes natural language to integrate and evaluate extracted knowledge bases (KBs). In the framework, KBs are integrated by exchanging probability distributions over natural language, and evaluated by how well the output distributions predict held-out text. We describe the advantages of the approach, and detail remaining research challenges.","PeriodicalId":371465,"journal":{"name":"Conference on Automated Knowledge Base Construction","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129146789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}