{"title":"SQL到NoSQL转换的关联感知技术","authors":"Jen-Chun Hsu, Ching-Hsien Hsu, Shih-Chang Chen, Yeh-Ching Chung","doi":"10.1109/U-MEDIA.2014.27","DOIUrl":null,"url":null,"abstract":"For better efficiency of parallel and distributed computing, Apache Hadoop distributes the imported data randomly on data nodes. This mechanism provides some advantages for general data analysis. With the same concept Apache Sqoop separates each table into four parts and randomly distributes them on data nodes. However, there is still a database performance concern with this data placement mechanism. This paper proposes a Correlation Aware method on Sqoop (CA_Sqoop) to improve the data placement. By gathering related data as closer as it could be to reduce the data transformation cost on the network and improve the performance in terms of database usage. The CA_Sqoop also considers the table correlation and size for better data locality and query efficiency. Simulation results show that data locality of CA_Sqoop is two times better than that of original Apache Sqoop.","PeriodicalId":174849,"journal":{"name":"2014 7th International Conference on Ubi-Media Computing and Workshops","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"19","resultStr":"{\"title\":\"Correlation Aware Technique for SQL to NoSQL Transformation\",\"authors\":\"Jen-Chun Hsu, Ching-Hsien Hsu, Shih-Chang Chen, Yeh-Ching Chung\",\"doi\":\"10.1109/U-MEDIA.2014.27\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"For better efficiency of parallel and distributed computing, Apache Hadoop distributes the imported data randomly on data nodes. This mechanism provides some advantages for general data analysis. With the same concept Apache Sqoop separates each table into four parts and randomly distributes them on data nodes. However, there is still a database performance concern with this data placement mechanism. This paper proposes a Correlation Aware method on Sqoop (CA_Sqoop) to improve the data placement. By gathering related data as closer as it could be to reduce the data transformation cost on the network and improve the performance in terms of database usage. The CA_Sqoop also considers the table correlation and size for better data locality and query efficiency. Simulation results show that data locality of CA_Sqoop is two times better than that of original Apache Sqoop.\",\"PeriodicalId\":174849,\"journal\":{\"name\":\"2014 7th International Conference on Ubi-Media Computing and Workshops\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-07-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"19\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 7th International Conference on Ubi-Media Computing and Workshops\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/U-MEDIA.2014.27\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 7th International Conference on Ubi-Media Computing and Workshops","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/U-MEDIA.2014.27","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Correlation Aware Technique for SQL to NoSQL Transformation
For better efficiency of parallel and distributed computing, Apache Hadoop distributes the imported data randomly on data nodes. This mechanism provides some advantages for general data analysis. With the same concept Apache Sqoop separates each table into four parts and randomly distributes them on data nodes. However, there is still a database performance concern with this data placement mechanism. This paper proposes a Correlation Aware method on Sqoop (CA_Sqoop) to improve the data placement. By gathering related data as closer as it could be to reduce the data transformation cost on the network and improve the performance in terms of database usage. The CA_Sqoop also considers the table correlation and size for better data locality and query efficiency. Simulation results show that data locality of CA_Sqoop is two times better than that of original Apache Sqoop.