{"title":"Mining rules to align knowledge bases","authors":"Luis Galárraga, N. Preda, Fabian M. Suchanek","doi":"10.1145/2509558.2509566","DOIUrl":"https://doi.org/10.1145/2509558.2509566","url":null,"abstract":"The Semantic Web has made huge progress in the last decade, and now comprises hundreds of knowledge bases (KBs). The Linked Open Data cloud connects the KBs in this Web of data. However, the links between the KBs are mostly concerned with the instances, not with the schema. Aligning the schemas is not easy, because the KBs may differ not just in their names for relations and classes, but also in their inherent structure. Therefore, we argue in this paper that advanced schema alignment is needed to tie the Semantic Web together. We put forward a particularly simple approach to illustrate how that might look.","PeriodicalId":371465,"journal":{"name":"Conference on Automated Knowledge Base Construction","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130156397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reporting bias and knowledge acquisition","authors":"Jonathan Gordon, Benjamin Van Durme","doi":"10.1145/2509558.2509563","DOIUrl":"https://doi.org/10.1145/2509558.2509563","url":null,"abstract":"Much work in knowledge extraction from text tacitly assumes that the frequency with which people write about actions, outcomes, or properties is a reflection of real-world frequencies or the degree to which a property is characteristic of a class of individuals. In this paper, we question this idea, examining the phenomenon of reporting bias and the challenge it poses for knowledge extraction. We conclude with discussion of approaches to learning commonsense knowledge from text despite this distortion.","PeriodicalId":371465,"journal":{"name":"Conference on Automated Knowledge Base Construction","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130600619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint inference of entities, relations, and coreference","authors":"Sameer Singh, Sebastian Riedel, Brian Martin, Jiaping Zheng, A. McCallum","doi":"10.1145/2509558.2509559","DOIUrl":"https://doi.org/10.1145/2509558.2509559","url":null,"abstract":"Although joint inference is an effective approach to avoid cascading of errors when inferring multiple natural language tasks, its application to information extraction has been limited to modeling only two tasks at a time, leading to modest improvements. In this paper, we focus on the three crucial tasks of automated extraction pipelines: entity tagging, relation extraction, and coreference. We propose a single, joint graphical model that represents the various dependencies between the tasks, allowing flow of uncertainty across task boundaries. Since the resulting model has a high tree-width and contains a large number of variables, we present a novel extension to belief propagation that sparsifies the domains of variables during inference. Experimental results show that our joint model consistently improves results on all three tasks as we represent more dependencies. In particular, our joint model obtains 12% error reduction on tagging over the isolated models.","PeriodicalId":371465,"journal":{"name":"Conference on Automated Knowledge Base Construction","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133269061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mining history with Le Monde","authors":"T. Huet, J. Biega, Fabian M. Suchanek","doi":"10.1145/2509558.2509567","DOIUrl":"https://doi.org/10.1145/2509558.2509567","url":null,"abstract":"The last decade has seen the rise of large knowledge bases, such as YAGO, DBpedia, Freebase, or NELL. In this paper, we show how this structured knowledge can help understand and mine trends in unstructured data. By combining YAGO with the archive of the French newspaper Le Monde, we can conduct analyses that would not be possible with word frequency statistics alone. We find indications about the increasing role that women play in politics, about the impact that the city of birth can have on a person's career, or about the average age of famous people in different professions.","PeriodicalId":371465,"journal":{"name":"Conference on Automated Knowledge Base Construction","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116472335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Assessing confidence of knowledge base content with an experimental study in entity resolution","authors":"Michael L. Wick, Sameer Singh, Ari Kobren, A. McCallum","doi":"10.1145/2509558.2509561","DOIUrl":"https://doi.org/10.1145/2509558.2509561","url":null,"abstract":"The purpose of this paper is to begin a conversation about the importance and role of confidence estimation in knowledge bases (KBs). KBs are never perfectly accurate, yet without confidence reporting their users are likely to treat them as if they were, possibly with serious real-world consequences. We define a notion of confidence based on the probability of a KB fact being true. For automatically constructed KBs we propose several algorithms for estimating this confidence from pre-existing probabilistic models of data integration and KB construction. In particular, this paper focuses on confidence estimation in entity resolution. A goal of our exposition here is to encourage creators and curators of KBs to include confidence estimates for entities and relations in their KBs.","PeriodicalId":371465,"journal":{"name":"Conference on Automated Knowledge Base Construction","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133512586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic affiliation extraction from calls-for-papers","authors":"Xinyu Li, Roya Rastan, J. Shepherd, Hye-young Paik","doi":"10.1145/2509558.2509575","DOIUrl":"https://doi.org/10.1145/2509558.2509575","url":null,"abstract":"In this paper, we describe a system to collect information about academic affiliation (organisations where researchers work) from Calls-for-Papers for academic conferences. The system uses a range of heuristic approaches and open-source tools in order to extract and identify entities, and to incorporate the information into a pre-defined database schema. This forms part of a larger project to automatically populate and maintain a range of data related to academic research. The proposed system is currently being tested and some promising preliminary results are available.","PeriodicalId":371465,"journal":{"name":"Conference on Automated Knowledge Base Construction","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127010900","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A study of the knowledge base requirements for passing an elementary science test","authors":"Peter Clark, P. Harrison, Niranjan Balasubramanian","doi":"10.1145/2509558.2509565","DOIUrl":"https://doi.org/10.1145/2509558.2509565","url":null,"abstract":"Our long-term interest is in machines that contain large amounts of general and scientific knowledge, stored in a \"computable\" form that supports reasoning and explanation. As a medium-term focus for this, our goal is to have the computer pass a fourth-grade science test, anticipating that much of the required knowledge will need to be acquired semi-automatically. This paper presents the first step towards this goal, namely a blueprint of the knowledge requirements for an early science exam, and a brief description of the resources, methods, and challenges involved in the semi-automatic acquisition of that knowledge. The result of our analysis suggests that as well as fact extraction from text and statistically driven rule extraction, three other styles of automatic knowledge base construction (AKBC) would be useful: acquiring definitional knowledge, direct 'reading' of rules from texts that state them, and, given a particular representational framework (e.g., qualitative reasoning), acquisition of specific instances of those models from text (e.g., specific qualitative models).","PeriodicalId":371465,"journal":{"name":"Conference on Automated Knowledge Base Construction","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121931906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Constructing query-specific knowledge bases","authors":"Jeffrey Dalton, Laura Dietz","doi":"10.1145/2509558.2509568","DOIUrl":"https://doi.org/10.1145/2509558.2509568","url":null,"abstract":"Large general purpose knowledge bases (KB) support a variety of complex tasks because of their structured relationships. However, these KBs lack coverage for specialized topics or use cases. In these scenarios, users often use keyword search over large unstructured collections, such as the web. Instead, we propose constructing a 'knowledge sketch' that leverages existing KB data elements and relevant text documents to construct query-specific KB data. A knowledge sketch is a distribution over entities, documents, and relationships between entities, all for a specific information need. In our experiments we construct knowledge sketches for queries from the TREC 2004 Robust track, which emphasizes complex queries that perform poorly with existing text retrieval approaches.","PeriodicalId":371465,"journal":{"name":"Conference on Automated Knowledge Base Construction","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132552458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unsupervised discovery and extraction of semi-structured regions in text via self-information","authors":"Eric Yeh, J. Niekrasz, Dayne Freitag","doi":"10.1145/2509558.2509576","DOIUrl":"https://doi.org/10.1145/2509558.2509576","url":null,"abstract":"We describe a general method for identifying and extracting information from semi-structured regions of text embedded within a natural language document. These regions encode information according to ad hoc schemas and visual cues, instead of using the grammatical and presentational conventions of normal sentential language. Examples include tables, key-value listings, or repeated enumerations of properties. Because of their generally non-sentential nature, these regions can present problems for standard information extraction algorithms. Unlike previous work in table extraction, which relies on a relatively noiseless two-dimensional layout, our aim is to accommodate a wide variety of structure types. Our approach for identifying semi-structured regions is an unsupervised one, based on scoring unusual regularity inside the document. As content in semi-structured regions is governed by a schema, the occurrence of features encompassing textual content and visual appearance would be unusual compared to those seen in sentential language. Regularity refers to repetition of these unusual features, as semi-structured regions commonly encode more than a single row or group of information. To score this, we present a measure based on expected self-information, derived from statistics over patterns of textual categories and visual layout. We describe the results of an initial study to assess the ability of these measures to detect semi-structured text in a corpus culled from the web, and show that this measure outperforms baseline methods on an average precision measure. We present initial work that uses these significant patterns to generate extraction rules, and conclude with a discussion of future directions.","PeriodicalId":371465,"journal":{"name":"Conference on Automated Knowledge Base Construction","volume":"94 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128587349","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extracting semantic knowledge from Wikipedia category names","authors":"P. Radhakrishnan, Vasudeva Varma","doi":"10.1145/2509558.2509577","DOIUrl":"https://doi.org/10.1145/2509558.2509577","url":null,"abstract":"Wikipedia, being a large, freely available, frequently updated, and community-maintained knowledge base, has been central to much recent research. However, quite often we find that the information extracted from it contains extraneous content. This paper proposes a method to extract useful information from Wikipedia using Semantic Features derived from Wikipedia categories. The proposed method performs well among Wikipedia category-based methods. Experimental results on benchmark datasets show that the proposed method achieves a correlation coefficient of 0.66 with human judgments. The Semantic Features derived by this method also correlated well with human rankings in a web search query completion application.","PeriodicalId":371465,"journal":{"name":"Conference on Automated Knowledge Base Construction","volume":"436 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115560209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}