Qingxia Liu, Gong Cheng, Kalpa Gunaratna, Yuzhong Qu
Title: Entity summarization: State of the art and future challenges
Journal: Journal of Web Semantics
Publication date: 2021-05-01
Publication type: Journal Article
DOI: 10.1016/j.websem.2021.100647
URL: https://www.sciencedirect.com/science/article/pii/S1570826821000226
Citations: 28
Abstract
The increasing availability of semantic data has substantially enhanced Web applications. Semantic data such as RDF data is commonly represented as entity-property-value triples. The magnitude of semantic data, in particular the large number of triples describing an entity, could overload users with excessive amounts of information. This has motivated fruitful research on automated generation of summaries for entity descriptions to satisfy users’ information needs efficiently and effectively. We focus on this prominent topic of entity summarization, and our research objective is to present the first comprehensive survey of entity summarization research. Rather than separately reviewing each method, our contributions include (1) identifying and classifying technical features of existing methods to form a high-level overview, (2) identifying and classifying frameworks for combining multiple technical features adopted by existing methods, (3) collecting known benchmarks for intrinsic evaluation and efforts for extrinsic evaluation, and (4) suggesting research directions for future work. By investigating the literature, we synthesized two hierarchies of techniques. The first hierarchy categorizes generic technical features into several perspectives: frequency and centrality, informativeness, and diversity and coverage. In the second hierarchy we present domain-specific and task-specific technical features, including the use of domain knowledge, context awareness, and personalization. Our review demonstrated that existing methods are mainly unsupervised and that they combine multiple technical features using various frameworks: random surfer models, similarity-based grouping, MMR-like re-ranking, or combinatorial optimization. We also found a few deep-learning-based methods in recent research. Current evaluation results and our case study showed that the problem of entity summarization is still far from being solved. Based on the limitations of existing methods revealed in the review, we identified several future directions: the use of semantics, human factors, machine and deep learning, non-extractive methods, and interactive methods.
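To make one of the frameworks named in the abstract concrete, the following is a minimal sketch of MMR-like re-ranking over entity-property-value triples. It is an illustration under stated assumptions, not the method of any surveyed system: the names `mmr_summarize`, `jaccard`, and the `relevance` callback are hypothetical, the token-level Jaccard similarity is an arbitrary stand-in for whatever similarity a real summarizer would use, and the relevance scores in the usage example are made up.

```python
from typing import Callable, List, Tuple

# A triple is (entity, property, value), as described in the abstract.
Triple = Tuple[str, str, str]


def jaccard(a: Triple, b: Triple) -> float:
    """Token-level Jaccard similarity over property and value.

    Assumption: this similarity is chosen only for illustration.
    """
    tokens_a = set(a[1].lower().split()) | set(a[2].lower().split())
    tokens_b = set(b[1].lower().split()) | set(b[2].lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def mmr_summarize(
    triples: List[Triple],
    relevance: Callable[[Triple], float],
    k: int,
    lam: float = 0.7,
    sim: Callable[[Triple, Triple], float] = jaccard,
) -> List[Triple]:
    """Greedily select up to k triples, trading relevance against redundancy.

    Each step picks the candidate maximizing
        lam * relevance(t) - (1 - lam) * max similarity to selected triples.
    """
    selected: List[Triple] = []
    candidates = list(triples)
    while candidates and len(selected) < k:

        def score(t: Triple) -> float:
            # Redundancy is 0.0 while nothing has been selected yet.
            redundancy = max((sim(t, s) for s in selected), default=0.0)
            return lam * relevance(t) - (1.0 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    # Hypothetical description of one entity with made-up relevance scores.
    description = [
        ("Berlin", "type", "City"),
        ("Berlin", "type", "Capital City"),
        ("Berlin", "population", "3645000"),
    ]
    scores = dict(zip(description, [1.0, 0.9, 0.5]))
    for triple in mmr_summarize(description, scores.get, k=2, lam=0.5):
        print(triple)
```

With `lam=0.5` the second pick favors the dissimilar `population` triple over the near-duplicate `type` triple; with `lam=1.0` the selection reduces to plain relevance ranking.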
About the journal:
The Journal of Web Semantics is an interdisciplinary journal based on research and applications of various subject areas that contribute to the development of a knowledge-intensive and intelligent service Web. These areas include knowledge technologies, ontology, agents, databases, and the semantic grid; disciplines such as information retrieval, language technology, human-computer interaction, and knowledge discovery are of major relevance as well. All aspects of Semantic Web development are covered. The publication of large-scale experiments and their analysis is also encouraged to clearly illustrate scenarios and methods that introduce semantics into existing Web interfaces, contents, and services. The journal emphasizes the publication of papers that combine theories, methods, and experiments from different subject areas in order to deliver innovative semantic methods and applications.