R. Addanki, S. Maji, M. Veeraraghavan, Chris Tracy
A measurement-based study of big-data movement
2015 European Conference on Networks and Communications (EuCNC), published 2015-08-13. DOI: https://doi.org/10.1109/EuCNC.2015.7194115
Parallel TCP connections are used for large scientific dataset transfers to increase throughput. Therefore, to accurately characterize big-data movement, it is important to reconstruct parallel flowsets from traffic measurements. In this work, we start with NetFlow records collected in an operational research-and-education network across which large scientific datasets are moved routinely, reconstruct individual elephant flows from the NetFlow records, and assemble parallel flowsets from the elephant flows. Our findings are as follows. The top 1% of flowsets by size ranged from hundreds of GB to a few TB, 95% of flowsets had rates below 2.5 Gbps, and 99% of flowsets had durations shorter than 4 hours. The median flowset rate increases, and the rate variance decreases, as the number of component flows per flowset increases. Such findings are useful for network planning, traffic engineering, and improving user performance, since large dataset transfers are among the most demanding of network applications.
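
The abstract does not state the rule used to assemble parallel flowsets, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes that elephant flows have already been reconstructed into records with a source, destination, start time, end time, and byte count, and that flows between the same host pair whose time intervals overlap belong to one flowset. The `Flow` type, the `gap` parameter, and both function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Flow:
    """One reconstructed elephant flow (hypothetical record layout)."""
    src: str
    dst: str
    start: float  # seconds
    end: float    # seconds
    nbytes: int


def assemble_flowsets(flows, gap=0.0):
    """Group flows per (src, dst) pair when their intervals overlap.

    Assumption: a flow joins the current flowset if it starts no later
    than `gap` seconds after the latest end time seen in that flowset.
    """
    by_pair = defaultdict(list)
    for f in flows:
        by_pair[(f.src, f.dst)].append(f)

    flowsets = []
    for pair_flows in by_pair.values():
        pair_flows.sort(key=lambda f: f.start)
        current = [pair_flows[0]]
        current_end = pair_flows[0].end
        for f in pair_flows[1:]:
            if f.start <= current_end + gap:
                current.append(f)
                current_end = max(current_end, f.end)
            else:
                flowsets.append(current)
                current = [f]
                current_end = f.end
        flowsets.append(current)
    return flowsets


def summarize(flowset):
    """Return component-flow count, total size, duration, and average rate."""
    size = sum(f.nbytes for f in flowset)
    duration = max(f.end for f in flowset) - min(f.start for f in flowset)
    rate_bps = 8 * size / duration if duration > 0 else 0.0
    return {
        "flows": len(flowset),
        "bytes": size,
        "duration_s": duration,
        "rate_bps": rate_bps,
    }


if __name__ == "__main__":
    demo = [
        Flow("a", "b", 0, 10, 1000),
        Flow("a", "b", 5, 20, 3000),   # overlaps the first flow
        Flow("a", "b", 100, 110, 500), # starts much later: new flowset
        Flow("c", "d", 0, 10, 100),
    ]
    for fs in assemble_flowsets(demo):
        print(summarize(fs))
```

Per-flowset summaries of this kind are the quantities the findings refer to: size, rate, duration, and number of component flows. A real analysis would also need a threshold for what counts as an elephant flow, which the abstract does not give.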