{"title":"Data migration ecosystem for big data (invited paper)","authors":"Koong Wah Yan, N. Perumal, T. Dillon","doi":"10.1109/DEST.2013.6611352","journal":{"name":"2013 7th IEEE International Conference on Digital Ecosystems and Technologies (DEST)"},"publicationDate":"2013-07-24","citationCount":"6","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.1109/DEST.2013.6611352"}
Data Migration Ecosystem for Big Data (Invited Paper)
Data migration is the process of moving data from one or more systems to a new environment. It is often a sub-activity of a business application deployment. Big Data is defined here as data that is huge in volume, has heterogeneous data dictionaries and involves complex manipulation. Because migrating Big Data is a complex and resource-hungry process, special attention is required to have a proven methodology and ecosystem to govern it. The Data Migration Ecosystem for Big Data is the productive set of interacting processes, practices and environments used to collect data from one location, storage medium, or hardware/software system, and to cleanse, transform and transfer it to another. The processes and practices are governed by rules and disciplines, with the goal of ensuring that information is complete, highly accurate and consistent. This paper is based on our experience in migrating data for a Malaysian government agency, which involved approximately 1 billion rows of data from 31 heterogeneous sources and systems. Some of the migrated data was created in the 1970s, and the business logic behind it has since been enhanced or changed. The challenge is further complicated because some of the available data comes from proprietary databases that are not RDBMS-compliant, and some is maintained manually in Microsoft Excel spreadsheets.
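The collect, cleanse, transform and transfer steps named in the abstract can be illustrated with a small sketch. This is not the authors' methodology or tooling; it is a minimal Python illustration under stated assumptions. The field names (`legacy_id`, `name`), the `MigrationReport` structure, and the in-memory list standing in for the target system are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MigrationReport:
    """Counts used to check completeness after a run (hypothetical structure)."""
    read: int = 0
    loaded: int = 0
    rejected: list = field(default_factory=list)


def cleanse(row):
    """Trim whitespace and turn blank strings into None."""
    cleaned = {}
    for key, value in row.items():
        if isinstance(value, str):
            value = value.strip() or None
        cleaned[key] = value
    return cleaned


def transform(row):
    """Map a legacy row onto the target schema (field names are assumptions)."""
    return {"id": int(row["legacy_id"]), "name": row["name"].title()}


def migrate(source_rows, target):
    """Collect, cleanse, transform and transfer rows, recording rejects."""
    report = MigrationReport()
    for row in source_rows:
        report.read += 1
        try:
            target.append(transform(cleanse(row)))
            report.loaded += 1
        except (KeyError, TypeError, ValueError, AttributeError) as exc:
            # Keep the rejected row and the reason so it can be reviewed later.
            report.rejected.append((row, repr(exc)))
    return report


source = [
    {"legacy_id": " 101 ", "name": "  alpha  "},
    {"legacy_id": "", "name": "beta"},      # blank id: rejected
    {"legacy_id": "103", "name": "   "},    # blank name: rejected
]
target = []
report = migrate(source, target)

# Completeness check: every row read is either loaded or rejected.
assert report.read == report.loaded + len(report.rejected)
```

Counting rows read, loaded and rejected is one simple way to test the completeness goal: no row disappears silently, and rejected rows remain available for manual correction.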