Some Challenges in Integrating Information on Protein Interactions and a Partial Solution

19th International Conference on Scientific and Statistical Database Management (SSDBM 2007). Pub Date: 2007-07-09. DOI: 10.1109/SSDBM.2007.23

H. Jagadish

Citations: 0

Abstract

Summary form only given. Independently constructed sources of (scientific) data frequently have overlapping, and sometimes contradictory, information content. Current approaches fall into two categories: forcing the integration step onto the user, or merely collating the data, at most transforming it into a common format. The first places an undue burden on the user to fit all of the jigsaw puzzle pieces together. The second leads to redundancy and possible inconsistency. We propose a third: deep data integration. The idea is to provide a cohesive view of all information currently available for a protein, interaction, or other object of scientific interest. Doing so requires, first, that multiple pieces of data about the object, held in different sources, be identified as referring to the same object, using "third party" information if required; then, that a single "record" be created comprising the union of the information in the matched records, keeping track of differences where they occur; and finally, that the provenance of every value in the dataset be tracked, so that scientists can judge which items to use and how to resolve differences. In this talk, I will describe our experiences with this approach in MiMI (http://mimi.ncibi.org). I will also discuss barriers to the use of the system by domain scientists, and my thoughts on how to make systems truly "usable".
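The three steps described above (matching records that refer to the same object, possibly through third-party information; merging the matched records into a union that keeps differences; and recording the provenance of every value) can be illustrated with a small Python sketch. This is not MiMI's implementation, which the abstract does not describe. The record layout, the identifier "alias pairs" standing in for third-party information, the source names, and all function names are hypothetical, and matching is done here with a simple union-find over identifiers, one choice among many.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SourceRecord:
    source: str             # hypothetical name of the contributing database
    ids: set[str]           # identifiers this source uses for the object (non-empty)
    fields: dict[str, str]  # attribute name -> value as reported by this source


def build_clusters(records, alias_pairs):
    """Group records that refer to the same object.

    Records are linked if they share an identifier, or if a third-party
    alias pair (id_a, id_b) connects their identifiers. Matching is
    transitive; it is implemented with union-find over identifiers.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Identifiers listed on the same record denote the same object.
    for rec in records:
        first, *rest = sorted(rec.ids)
        for other in rest:
            union(first, other)
    # Third-party mappings link identifiers used by different sources.
    for a, b in alias_pairs:
        union(a, b)

    clusters = defaultdict(list)
    for rec in records:
        clusters[find(next(iter(rec.ids)))].append(rec)
    return list(clusters.values())


def merge_cluster(cluster):
    """Union of all attribute values; each value keeps the sources asserting it."""
    merged = defaultdict(lambda: defaultdict(set))
    for rec in cluster:
        for name, value in rec.fields.items():
            merged[name][value].add(rec.source)
    return {name: {v: sorted(srcs) for v, srcs in values.items()}
            for name, values in merged.items()}


def conflicts(merged):
    """Attributes on which the sources disagree (more than one distinct value)."""
    return sorted(name for name, values in merged.items() if len(values) > 1)


# Hypothetical example data, not taken from MiMI or any real database.
records = [
    SourceRecord("source_a", {"A:123"}, {"name": "kinase X", "organism": "human"}),
    SourceRecord("source_b", {"B:77"}, {"name": "kinase X", "organism": "Homo sapiens"}),
    SourceRecord("source_c", {"C:5"}, {"name": "unrelated protein"}),
]
alias_pairs = [("A:123", "B:77")]  # hypothetical third-party identifier mapping

clusters = build_clusters(records, alias_pairs)
merged = merge_cluster(clusters[0])
print(merged["organism"])  # {'human': ['source_a'], 'Homo sapiens': ['source_b']}
print(conflicts(merged))   # ['organism']
```

In this sketch "human" and "Homo sapiens" are kept as two values, each with its own source, rather than being silently reconciled. A real system would probably normalize such vocabulary first; the point illustrated is that remaining disagreements stay visible together with their provenance, leaving the scientist to decide how to resolve them.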
