NILE: wide-area computing for high energy physics

EW 7 · Pub Date: 1996-09-09 · DOI: 10.1145/504450.504460
K. Marzullo, Michael Ogg, Aleta Ricciardi, A. Amoroso, F. A. Calkins, Eric Rothfus
Citations: 34

Abstract

The CLEO project [2], centered at Cornell University, is a large-scale high energy physics project. The goals of the project arise from an esoteric question---why is there apparently so little antimatter in the universe?---and the computational problems that arise in trying to answer this question are quite challenging.

To answer this question, the CESR storage ring at Cornell is used to generate a beam of electrons directed at an equally strong beam of positrons. These two beams meet inside a detector that is embedded in a magnetic field and is equipped with sensors. The collisions of electrons and positrons generate several secondary subatomic particles. Each collision is called an event and is sensed by detecting charged particles (via the ionization they produce in a drift chamber) and neutral particles (in the case of photons, via their deposition of energy in a crystal calorimeter), as well as by other specialized detector elements. Most events are ignored, but some are recorded in what is called raw data (typically 8 Kbytes per event). Offline, a second program called pass2 computes, for each event, the physical properties of the particles, such as their momenta, masses, and charges. This compute-bound program produces a new set of records describing the events (now typically 20 Kbytes per event). Finally, a third program reads these events, and produces a lossily-compressed version of only certain frequently-accessed fields, written in what is called roar format (typically 2 Kbytes per event).

The physicists analyze this data with programs that are, for the most part, embarrassingly parallel and I/O limited. Such programs typically compute a result based on a projection of a selection of a large number of events, where the result is insensitive to the order in which the events are processed. For example, a program may construct histograms, or compute statistics, or cull the raw data for physical inspection.
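The selection-and-projection pattern described above can be sketched as follows. The event fields (`mass`, `charge`), the cut, and the binning are hypothetical, and the per-event record layout is simplified; the point is the structure — an order-insensitive reduction over a stream of events, which is what makes such jobs embarrassingly parallel:

```python
from collections import Counter

def analyze(events):
    """Histogram a projected quantity over a selection of events.

    The result is a pure reduction: it does not depend on the order
    in which the events are processed, so partial histograms computed
    on different machines can simply be summed.
    """
    hist = Counter()
    for event in events:
        if event["charge"] != 0:            # selection: an ad-hoc cut
            bin_ = int(event["mass"] * 10)  # projection: one derived field
            hist[bin_] += 1
    return hist

# Hypothetical sample of already-reconstructed events.
events = [
    {"mass": 0.14, "charge": +1},
    {"mass": 0.14, "charge": -1},
    {"mass": 0.00, "charge": 0},   # e.g. a photon: fails the cut
]
print(analyze(events))  # both charged tracks land in bin 1
```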
The projection is either the complete pass2 record or (much more often) the smaller roar record, and the selection is done in an ad-hoc manner by the program itself.

Other programs are run as well. For example, a Monte Carlo simulation of the experiment is also run (called montecarlo) in order to correct the data for detector acceptance and inefficiencies, as well as to test aspects of the model used to interpret the data. This program is compute-bound. Another important example is called recompress. Roughly every two years, improvements in detector calibration and reconstruction algorithms make it worthwhile to recompute more accurate pass2 data (and hence, more accurate roar data) from all of the raw data. This program is compute-bound (it currently requires 24 200-MIP workstations running flat out for three months) and so must be carefully worked into the schedule so that it does not seriously impact the ongoing operations.

Making this more concrete, the current experiment generates approximately 1 terabyte of event data a year. Only recent roar data can be kept on disk; all other data must reside on tape. The data processing demands consume approximately 12,000 SPECint92 cycles a year. Improvements in the performance of CESR and the sensitivity of the detector will cause both of these values to go up by a factor of ten in the next few years, which will correspondingly increase the storage and computational needs by a factor of ten.

The CLEO project prides itself on being able to do big science on a tight budget, and so the programming environment that the CLEO project provides for researchers is innovative but somewhat primitive. Jobs that access the entire data set can take days to complete.
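A back-of-the-envelope check of the recompress figure quoted above (24 workstations at roughly 200 MIPS each, flat out for three months) gives a sense of the scale. The 30-day month and perfect utilization are simplifying assumptions:

```python
workstations = 24
mips = 200                      # millions of instructions/second per machine
seconds = 3 * 30 * 24 * 3600    # three 30-day months, running flat out

total_instructions = workstations * mips * 1e6 * seconds
print(f"{total_instructions:.1e} instructions")  # on the order of 3.7e16
```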
To circumvent limited access to tape, the network, or compute resources close to the central disk, physicists often do preliminary selections and projections (called skims) to create private disk data sets of events for further local analysis. Limited resources usually exact a high human price for resource and job management and, ironically, can sometimes lead to inefficiencies. Given the increase in data storage, data retrieval, and computational needs, it has become clear that the CLEO physicists require a better distributed environment in which to do their work.

Hence, an NSF-funded National Challenge project was started with participants from high energy physics, distributed computing, and data storage, in order to provide a better environment for the CLEO experiment. The goals of this project, called NILE [7], are:

- to build a scalable environment for storing and processing High Energy Physics data from the CLEO experiment. The environment must scale to allow 100 terabytes or more of data to be addressable, and to be able to use several hundreds of geographically dispersed processors.
- to radically decrease the processing time of computations through parallelism.
- to be practicable. NILE, albeit in a limited form, should be deployed very soon, and evolve to its full form by the end of the project in June 1999.

Finally, the CLEO necessity of building on a budget carries over to NILE. There are some more expensive resources, such as ATM switches and tape silos, that it will be necessary to use. However, as far as possible we are using commodity equipment, and free or inexpensive software whenever possible. For example, one of our principal development platforms is Pentium-based PCs, interconnected with 100 Mbps Ethernet, running Linux and the GNU suite of tools.
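The skims mentioned above amount to a one-time selection and projection run against the shared store, with the surviving records cached on private local disk so later analyses never touch tape. A minimal sketch of that workflow — the cut, the field names, and the JSON-lines output format are all illustrative; the real roar records are compact binary:

```python
import json

def skim(events, cut, fields, path):
    """Write the projected fields of every event passing `cut` to a
    private local file, so subsequent analyses read only this small set."""
    kept = 0
    with open(path, "w") as out:
        for event in events:
            if cut(event):
                out.write(json.dumps({f: event[f] for f in fields}) + "\n")
                kept += 1
    return kept

# Hypothetical events as they might come off the shared store.
events = [
    {"run": 1, "mass": 0.14, "charge": +1},
    {"run": 1, "mass": 0.00, "charge": 0},
    {"run": 2, "mass": 0.49, "charge": -1},
]
n = skim(events, cut=lambda e: e["charge"] != 0,
         fields=("run", "mass"), path="my_skim.jsonl")
print(n)  # 2 events survive the cut
```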