Distributed data access in the Sequential Access Model at the D0 experiment at Fermilab
I. Terekhov, V. White
Proceedings the Ninth International Symposium on High-Performance Distributed Computing, 2000-08-01
DOI: 10.1109/HPDC.2000.868672 (https://doi.org/10.1109/HPDC.2000.868672)
This paper presents the Sequential Access Model (SAM), the data-handling system for D0, one of the two primary high-energy physics experiments at Fermilab. Over the next several years, the D0 experiment will store about 1 PByte of data in total, including raw detector data and data processed at various levels. The design of SAM is not specific to the D0 experiment and makes few assumptions about the underlying mass storage level; its ideas apply to any sequential data access. By definition, in the sequential access mode a user application processes a stream of data by accessing each data unit exactly once, and the order of the data units in the stream is irrelevant. The units of data are laid out sequentially in files. The adopted model allows significant optimization of system performance, reduced user file latency, and increased overall throughput. In particular, caching is done with knowledge of all the files needed "in the near future", defined as all the files used by already-running or submitted jobs. The bulk of the data is stored in files on tape in the Enstore mass storage system. All data managed by SAM is cataloged in great detail in a relational database (Oracle).
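The caching idea in the abstract, evicting with knowledge of which files running or submitted jobs still need, can be illustrated with a toy sketch. This is not SAM's implementation: the class `LookaheadCache`, its methods, the capacity measured in files, and the tie-breaking rule are all assumptions made for illustration, and the fetch from tape is replaced by a simple list append.

```python
from collections import Counter


class LookaheadCache:
    """Toy disk cache: evict the file with the least remaining demand (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity   # maximum number of files held in the cache
        self.cached = []           # cached file names, oldest first
        self.pending = Counter()   # file name -> number of jobs that still need it

    def submit_job(self, files):
        """Register the files a running or submitted job still has to read."""
        self.pending.update(files)

    def consume(self, name):
        """Deliver one file to a job; return True on a cache hit."""
        hit = name in self.cached
        if not hit:
            if len(self.cached) >= self.capacity:
                self._evict()
            self.cached.append(name)   # stands in for a fetch from tape
        # Each job reads a file exactly once, so remaining demand drops by one.
        if self.pending[name] > 0:
            self.pending[name] -= 1
        return hit

    def _evict(self):
        # Least remaining demand goes first; ties are broken in favour of the oldest file.
        victim = min(self.cached, key=lambda f: self.pending[f])
        self.cached.remove(victim)


# Usage: two jobs share file "a"; the cache holds two files.
cache = LookaheadCache(capacity=2)
cache.submit_job(["a", "b", "c"])
cache.submit_job(["a"])
hits = [cache.consume(f) for f in ["a", "b", "c", "a"]]
print(hits)
```

In this run, when "c" arrives the cache is full and the sketch evicts "b", which no pending job needs any more, so the second read of "a" is a hit. A policy based only on past accesses (FIFO or LRU) would have evicted "a" at that point and gone back to tape for it.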