Detecting data and schema changes in scientific documents

N. Adam, Igg Adiwijaya, T. Critchlow, R. Musick
Published in: Proceedings IEEE Advances in Digital Libraries 2000
DOI: 10.1109/ADL.2000.848379 (https://doi.org/10.1109/ADL.2000.848379)
Publication date (as listed): 1999-06-08
Citations: 10

Abstract

Data stored in a data warehouse must be kept consistent and up to date with respect to the underlying information sources. If changes in these sources can be identified, categorized, and detected, only the modified data needs to be transferred and entered into the warehouse; the alternative, periodically reloading from scratch, is inefficient. When the schema of an information source changes, all components that interact with that source, or that use data originating from it, must be updated to conform. The change detection problem is the problem of detecting data and schema changes by comparing two versions of the same semi-structured document. We present an approach to detecting data and schema changes in scientific documents. Scientific data is of particular interest because it is normally stored as semi-structured documents and undergoes frequent schema updates. This paper demonstrates the use of graphs to represent scientific documents in particular, and semi-structured documents in general, together with their schemas. It also demonstrates an approach that detects data and schema changes efficiently by merging change detection with document parsing.
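The change detection problem stated in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not give its details, and in particular the sketch compares two fully parsed versions rather than merging detection with parsing. It assumes a document is a tree of labeled nodes encoded as a nested Python dict, treats the set of root-to-leaf label paths as a stand-in for the schema, and reports a path that appears or disappears as a schema change and a changed leaf value as a data change. The names `flatten` and `detect_changes`, and the sample records, are hypothetical.

```python
from typing import Any, Dict, Set, Tuple

# A path is the sequence of labels from the root to a leaf (assumed encoding).
Path = Tuple[str, ...]


def flatten(doc: Any, prefix: Path = ()) -> Dict[Path, Any]:
    """Map each root-to-leaf label path of a nested dict to its leaf value."""
    if isinstance(doc, dict):
        leaves: Dict[Path, Any] = {}
        for label, child in doc.items():
            leaves.update(flatten(child, prefix + (label,)))
        return leaves
    return {prefix: doc}


def detect_changes(old: Any, new: Any) -> Dict[str, Set[Path]]:
    """Compare two versions of a document and classify the differences.

    Assumption for this sketch: a path present in only one version is a
    schema change; a shared path with a different value is a data change.
    """
    old_leaves, new_leaves = flatten(old), flatten(new)
    old_paths, new_paths = set(old_leaves), set(new_leaves)
    shared = old_paths & new_paths
    return {
        "schema_added": new_paths - old_paths,
        "schema_removed": old_paths - new_paths,
        "data_changed": {p for p in shared if old_leaves[p] != new_leaves[p]},
    }


# Hypothetical sample: two versions of one record from a scientific source.
old_version = {"entry": {"id": "P1", "sequence": "MKV", "organism": "E. coli"}}
new_version = {"entry": {"id": "P1", "sequence": "MKVL", "taxonomy": "E. coli"}}

for kind, paths in detect_changes(old_version, new_version).items():
    print(kind, sorted(paths))
```

For the sample above, the `sequence` value is reported as a data change, while `organism` disappearing and `taxonomy` appearing are reported as schema changes. A warehouse loader could then transfer only the changed values and flag the schema differences for the components that depend on that source.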