M. Tatineni, J. Greenberg, R. Wagner, Eva Hocks, Christopher Irving
Proceedings of the Conference on Extreme Science and Engineering Discovery Environment: Gateway to Discovery. Published 2013-07-22. DOI: 10.1145/2484762.2484831 (https://doi.org/10.1145/2484762.2484831)
Hadoop deployment and performance on Gordon data intensive supercomputer
The Hadoop framework is widely used for scalable distributed processing of large datasets. This extended abstract provides information on optimizing the Hadoop deployment on the Gordon data-intensive supercomputer, at the San Diego Supercomputer Center (SDSC) at the University of California San Diego, using the myHadoop software. It presents the system configuration, the storage and network options (1 Gig-E, IPoIB, and UDA), the tuning options considered, results from the TestDFSIO and TeraSort benchmarks, and bulk copy tests with distcp.