Log-based change data capture from schema-free document stores using MapReduce
Kun Ma, Bo Yang
2015 International Conference on Cloud Technologies and Applications (CloudTech), 2015-06-02
DOI: https://doi.org/10.1109/CLOUDTECH.2015.7336969
Change data capture (CDC) is an approach to data integration that identifies and tracks changed data so that action can be taken using the change data. However, the state of the art of CDC for document-oriented NoSQL databases is not mature, and a NoSQL CDC solution is urgently needed. Although some NoSQL database vendors have begun to research CDC for NoSQL, their approaches are tied to a specific product. In this paper, we propose a log-based CDC approach for abstract schema-free document stores using MapReduce. The process is divided into map and reduce procedures, taking advantage of the MapReduce framework, to generate cell state models (CSMs). So that any earlier revision can be looked up, however far back, the proposed CSM supports the copy-modify-merge model for managing the revisions of change data. Finally, experimental results show that this approach is product-independent and appropriate for document stores, with high performance and throughput capacity.
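The abstract does not specify the log format or the structure of a CSM, so the following is only a minimal sketch of the general idea of splitting log-based CDC into map and reduce steps. Everything in it is an assumption made for illustration: the `LOG` entry layout, the choice of a (document id, field) pair as a "cell", the `map_entry`, `shuffle`, `reduce_cell`, and `capture_changes` helpers, and the convention that a delete entry lists the removed fields with `None` values. It runs in a single process rather than on a MapReduce cluster.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical operation log: (timestamp, operation, document id, changed fields).
# Assumption: a delete entry lists the removed fields with None as their value.
LOG = [
    (1, "insert", "doc1", {"name": "Ann", "city": "Jinan"}),
    (2, "update", "doc1", {"city": "Beijing"}),
    (3, "insert", "doc2", {"name": "Bob"}),
    (4, "delete", "doc1", {"name": None, "city": None}),
]


def map_entry(entry):
    """Map step: emit one ((doc_id, field), (timestamp, op, value)) pair per changed cell."""
    ts, op, doc_id, fields = entry
    for field, value in fields.items():
        yield (doc_id, field), (ts, op, value)


def shuffle(pairs):
    """Group mapped values by cell key, standing in for the framework's shuffle phase."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_cell(key, values):
    """Reduce step: order a cell's changes by timestamp and keep the full revision history."""
    revisions = sorted(values, key=lambda v: v[0])
    return {"cell": key, "revisions": revisions, "current": revisions[-1]}


def capture_changes(log):
    """Run map, shuffle, and reduce over the log; return one state record per cell."""
    pairs = chain.from_iterable(map_entry(entry) for entry in log)
    return {key: reduce_cell(key, values) for key, values in shuffle(pairs).items()}


if __name__ == "__main__":
    for cell, state in sorted(capture_changes(LOG).items()):
        print(cell, state["revisions"])
```

In this sketch each cell keeps its whole revision list rather than only the latest value, which is what would allow a consumer to look back to an earlier revision. How the paper's CSM actually stores and merges revisions is not described in the abstract.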