Demystifying the Semantics of Relevant Objects in Scholarly Collections: A Probabilistic Approach

Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries. Publication date: 2015-06-21. DOI: 10.1145/2756406.2756923

J. M. Pinto, Wolf-Tilo Balke


Citations: 6

Abstract

Efforts to make highly specialized knowledge accessible through scientific digital libraries need to go beyond mere bibliographic metadata, since information search in this setting is mostly entity-centric. Previous work has recognized this trend and developed different methods to recognize and (to some degree even automatically) annotate several important types of entities: genes and proteins, chemical structures and molecules, or drug names, to name but a few. Moreover, such entities are often cross-referenced with entries in curated databases. However, several questions remain to be answered: Given a scientific discipline, what are the important entities? How can they be automatically identified? Are all of them really relevant, i.e., do all of them carry deeper semantics for assessing a publication? How can they be represented, described, and subsequently annotated? How can they be used for search tasks? In this work we focus on answering some of these questions. We claim that to bring the use of scientific digital libraries to the next level, we must treat topic-specific entities as first-class citizens and deeply integrate their semantics into the search process. To support this, we propose a novel probabilistic approach that not only provides a solution to the integration problem, but also demonstrates how to leverage the knowledge encoded in entities and provides insights into using our approach in different scenarios. Finally, we show how our results can benefit information providers.




Source venue

Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries
