{"title":"Detecting data and schema changes in scientific documents","authors":"N. Adam, Igg Adiwijaya, T. Critchlow, R. Musick","doi":"10.1109/ADL.2000.848379","journal":{"name":"Proceedings IEEE Advances in Digital Libraries 2000"},"publicationDate":"1999-06-08","publicationTypes":"Journal Article","isOpenAccess":false,"citationCount":"10","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.1109/ADL.2000.848379"}
Detecting data and schema changes in scientific documents
Data stored in a data warehouse must be kept consistent and up to date with respect to the underlying information sources. If changes in these sources can be identified, categorized, and detected, only the modified data needs to be transferred and entered into the warehouse; the alternative, periodically reloading the warehouse from scratch, is inefficient. When the schema of an information source changes, all components that interact with the source, or that use data originating from it, must be updated to conform. The change detection problem is that of detecting data and schema changes by comparing two versions of the same semi-structured document. We present an approach to detecting data and schema changes in scientific documents. Scientific data is of particular interest because it is normally stored as semi-structured documents and undergoes frequent schema updates. This paper demonstrates the use of graphs to represent scientific documents in particular, and semi-structured documents in general, together with their schemas. It also demonstrates an approach that detects data and schema changes efficiently by merging detection with the parsing of the document.
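To make the distinction between data changes and schema changes concrete, the following is a minimal sketch and not the method of the paper. It assumes a semi-structured document has already been parsed into nested Python dictionaries, treats that nesting as a tree-shaped graph whose edges are field labels, and takes the set of label paths as the schema. The function names `flatten` and `detect_changes` are hypothetical. Unlike the approach described above, this sketch compares two fully parsed versions instead of merging detection with parsing.

```python
from typing import Any, Dict, List, Tuple

# A path is the sequence of edge labels from the root to a leaf.
Path = Tuple[str, ...]


def flatten(doc: Any, prefix: Path = ()) -> Dict[Path, Any]:
    """Map every leaf of a nested dict to the label path that reaches it.

    Assumption: non-dict values (including lists) and empty dicts
    are treated as leaves.
    """
    if isinstance(doc, dict) and doc:
        leaves: Dict[Path, Any] = {}
        for label, child in doc.items():
            leaves.update(flatten(child, prefix + (label,)))
        return leaves
    return {prefix: doc}


def detect_changes(old: Any, new: Any) -> Dict[str, List[Path]]:
    """Compare two versions of a document.

    A path present in only one version is reported as a schema change;
    a path present in both with different values is a data change.
    """
    old_leaves, new_leaves = flatten(old), flatten(new)
    common = old_leaves.keys() & new_leaves.keys()
    return {
        "schema_added": sorted(new_leaves.keys() - old_leaves.keys()),
        "schema_removed": sorted(old_leaves.keys() - new_leaves.keys()),
        "data_changed": sorted(
            p for p in common if old_leaves[p] != new_leaves[p]
        ),
    }


if __name__ == "__main__":
    # Hypothetical record, made up for illustration only.
    v1 = {"protein": {"name": "P1", "length": 120}}
    v2 = {"protein": {"name": "P1", "length": 125, "organism": "E. coli"}}
    print(detect_changes(v1, v2))
```

In this sketch, a warehouse loader would need to transfer only the values under `data_changed`, while any entry under `schema_added` or `schema_removed` signals that components depending on the source's schema must be updated.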