Some Challenges in Integrating Information on Protein Interactions and a Partial Solution
H. Jagadish
19th International Conference on Scientific and Statistical Database Management (SSDBM 2007), 2007-07-09
DOI: https://doi.org/10.1109/SSDBM.2007.23
Summary form only given. Independently constructed sources of (scientific) data frequently have overlapping, and sometimes contradictory, information content. Current approaches fall into two categories: they either force the integration step onto the user or merely collate the data, at most transforming it into a common format. The first places an undue burden on the user to fit all of the jigsaw puzzle pieces together. The second leads to redundancy and possible inconsistency. We propose a third: deep data integration. The idea is to provide a cohesive view of all information currently available for a protein, interaction, or other object of scientific interest. Doing so requires, first, that multiple pieces of data about the object in different sources be identified as referring to the same object, if necessary through "third party" information; then, that a single "record" be created comprising the union of the information in the matched records, keeping track of differences where these occur; and finally, that the provenance of every value in the dataset be tracked so that scientists can judge which items to use and how to resolve differences. In this talk, I will describe our experiences with this approach in MiMI (http://mimi.ncibi.org). I will also discuss barriers that keep domain scientists from using the system, and my thoughts on how to make systems truly "usable".
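
The three steps described above (matching records that refer to the same object, taking the union of their values while keeping differences, and recording provenance for every value) can be sketched in a few lines of Python. This is an illustrative sketch only, not MiMI's implementation: the function names `integrate` and `conflicts`, the source names, the identifiers, and the `id_map` lookup standing in for "third party" information are all assumptions made for the example.

```python
from collections import defaultdict


def integrate(records, id_map):
    """Merge per-source records into one record per canonical object.

    records: iterable of (source_name, record) pairs; each record is a dict
             with an "id" key plus arbitrary attribute keys (hypothetical shape).
    id_map:  dict mapping a source-specific id to a canonical id; ids missing
             from the map are treated as already canonical.
    Returns {canonical_id: {attribute: {value: sorted list of sources}}}.
    """
    merged = defaultdict(lambda: defaultdict(lambda: defaultdict(set)))
    for source, record in records:
        # Step 1: decide which object this record is about.
        canonical = id_map.get(record["id"], record["id"])
        for attribute, value in record.items():
            if attribute == "id":
                continue
            # Steps 2 and 3: keep every distinct value, and remember
            # which sources reported it (provenance).
            merged[canonical][attribute][value].add(source)
    return {
        obj: {attr: {val: sorted(srcs) for val, srcs in values.items()}
              for attr, values in attrs.items()}
        for obj, attrs in merged.items()
    }


def conflicts(integrated):
    """List (object, attribute) pairs on which the sources disagree."""
    return [(obj, attr)
            for obj, attrs in integrated.items()
            for attr, values in attrs.items()
            if len(values) > 1]


# Made-up records from two hypothetical sources describing the same object.
records = [
    ("source_a", {"id": "a:001", "name": "protein X", "organism": "human"}),
    ("source_b", {"id": "b:9", "name": "protein X", "organism": "mouse"}),
]
id_map = {"a:001": "P1", "b:9": "P1"}

result = integrate(records, id_map)
print(result)
print(conflicts(result))
```

Here the two sources agree on `name`, so that value carries both sources as provenance, while `organism` keeps both reported values side by side. Retaining the disagreement rather than picking a winner leaves the resolution to the scientist, as the abstract proposes.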