Knowledge-driven joint posterior revision of named entity classification and linking
Authors: Marco Rospocher, Francesco Corcoglioniti
Journal of Web Semantics (Journal Article), published 2020-12-01; Impact Factor 2.1, JCR Q3, Computer Science, Artificial Intelligence
DOI: 10.1016/j.websem.2020.100617
URL: https://www.sciencedirect.com/science/article/pii/S1570826820300500
Citations: 0
Abstract
Knowledge-driven joint posterior revision of named entity classification and linking
In this work we address the problem of extracting high-quality entity knowledge from natural language text, an important task for the automatic construction of knowledge graphs from unstructured content.
More specifically, we investigate the benefit of performing a joint posterior revision, driven by ontological background knowledge, of the annotations resulting from natural language processing (NLP) entity analyses such as named entity recognition and classification (NERC) and entity linking (EL). The revision is performed via a probabilistic model, called jpark, that, given the candidate annotations independently identified by NERC and EL tools on the same textual entity mention, reconsiders the best annotation choice made by the tools in light of the coherence of the candidate annotations with the ontological knowledge. The model can be explicitly instructed to handle the information that an entity may be NIL (i.e., lack a corresponding referent in the target linking knowledge base), exploiting it to predict the best NERC and EL annotation combination.
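The core idea of knowledge-driven joint revision can be illustrated with a minimal sketch. This is not the authors' jpark implementation: the candidate lists, confidence scores, type-compatibility table, and the simple product scoring rule are all made-up assumptions, used only to show how joint scoring of (NERC type, EL entity) pairs against background knowledge, with a NIL option, could work.

```python
# Illustrative sketch (NOT the authors' jpark model) of knowledge-driven joint
# revision of NERC and EL candidate annotations. All scores and tables below
# are invented for the example.

# Hypothetical candidate NERC annotations for the mention "Paris": (type, confidence)
nerc_candidates = [("LOC", 0.6), ("PER", 0.4)]

# Hypothetical candidate EL annotations: (KB entity, or None for NIL, confidence)
el_candidates = [("dbpedia:Paris", 0.5), ("dbpedia:Paris_Hilton", 0.3), (None, 0.2)]

# Toy background knowledge: the entity type asserted by the linking KB.
kb_types = {"dbpedia:Paris": "LOC", "dbpedia:Paris_Hilton": "PER"}

def compatibility(nerc_type, entity):
    """Coherence of a NERC type with a candidate KB entity."""
    if entity is None:  # NIL: no referent in the KB, assumed compatible with any type
        return 1.0
    # Penalize (but do not forbid) type/entity combinations the KB contradicts.
    return 1.0 if kb_types.get(entity) == nerc_type else 0.1

def joint_revision(nerc_candidates, el_candidates):
    """Pick the (type, entity) pair maximizing joint confidence x KB coherence."""
    best_type, best_entity, _ = max(
        ((t, e, pt * pe * compatibility(t, e))
         for t, pt in nerc_candidates
         for e, pe in el_candidates),
        key=lambda triple: triple[2],
    )
    return best_type, best_entity

print(joint_revision(nerc_candidates, el_candidates))
# Here the coherent pair ("LOC", "dbpedia:Paris") wins (0.6 * 0.5 * 1.0 = 0.30),
# even though "PER"/"dbpedia:Paris_Hilton" is individually plausible.
```

The point of the sketch is that neither tool's independent top choice is trusted blindly: an incoherent combination (e.g., type PER linked to a location entity) is down-weighted by the background knowledge, and NIL competes as an explicit outcome rather than being ignored.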
We present a comprehensive evaluation of jpark along various dimensions, comparing its performance with and without exploiting NIL information, as well as with three different background knowledge resources (YAGO, DBpedia, and Wikidata) used to build the model. The evaluation, conducted using different tools (the popular Stanford NER and DBpedia Spotlight, as well as the more recent Flair NER and End-to-End Neural EL) on three reference datasets (AIDA, MEANTIME, and TAC-KBP), empirically confirms the capability of the model to improve the quality of the annotations of the given tools, and thus their performance on the tasks they are designed for.
About the journal:
The Journal of Web Semantics is an interdisciplinary journal based on research and applications of various subject areas that contribute to the development of a knowledge-intensive and intelligent service Web. These areas include knowledge technologies, ontologies, agents, databases, and the semantic grid; disciplines such as information retrieval, language technology, human-computer interaction, and knowledge discovery are of major relevance as well. All aspects of Semantic Web development are covered. The publication of large-scale experiments and their analysis is also encouraged, to clearly illustrate scenarios and methods that introduce semantics into existing Web interfaces, contents, and services. The journal emphasizes the publication of papers that combine theories, methods, and experiments from different subject areas in order to deliver innovative semantic methods and applications.