G. Cancelo, E. Gottschalk, V. Pavlicek, M. Wang, J. Wu
2003 IEEE Nuclear Science Symposium. Conference Record (IEEE Cat. No.03CH37515), published 2003-12-01. DOI: https://doi.org/10.1109/NSSMIC.2003.1351928
Failure Analysis in a Highly Parallel Processor for L1 Triggering
This paper studies how processor failures affect the dataflow of the Level 1 (L1) Trigger in the BTeV experiment, proposed to run at Fermilab's Tevatron. Failure analysis is crucial for a system with over 2500 processing nodes and a number of storage units and communication links of the same order of magnitude. The analysis is based on models of the L1 Trigger architecture and shows the dynamics of the architecture's dataflow. It provides insight into how system variables are affected by single-component failures and supplies key information for implementing error recovery strategies. It covers both short-term failures, from which the system can recover quickly, and long-term failures, which call for a more drastic error recovery strategy. The modeling results are supported by behavioral simulations of the L1 Trigger processing BTeV's Geant Monte Carlo data.
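The distinction between short-term and long-term failures can be illustrated with a toy fluid model. This is only a sketch under stated assumptions, not the model used in the paper: the function names, the even split of load across nodes, and all numeric rates are hypothetical. Only the node count of roughly 2500 comes from the abstract. In the short-term case a stalled node keeps receiving its share of events and must drain the backlog after recovery. In the long-term case the failed nodes are removed and their load is spread over the survivors.

```python
import math


def outage_backlog(arrival_rate, n_nodes, service_rate, outage_ticks):
    """Short-term failure (hypothetical model): one node stalls for
    outage_ticks while still receiving an even share of the input.

    Returns (backlog in events, ticks needed to drain it after recovery).
    """
    per_node_load = arrival_rate / n_nodes      # events per tick per node
    backlog = per_node_load * outage_ticks      # events queued during the stall
    spare = service_rate - per_node_load        # headroom available for draining
    drain_ticks = backlog / spare if spare > 0 else math.inf
    return backlog, drain_ticks


def utilization_after_removal(arrival_rate, n_nodes, service_rate, n_failed):
    """Long-term failure (hypothetical model): failed nodes are taken out
    and the full input is redistributed evenly over the remaining nodes.

    Returns the utilization of each surviving node (>= 1.0 means overload).
    """
    alive = n_nodes - n_failed
    if alive <= 0:
        return math.inf
    return arrival_rate / (alive * service_rate)


if __name__ == "__main__":
    # Assumed numbers: 2500 nodes, 2000 events per tick in total,
    # each node able to process 1 event per tick.
    backlog, drain = outage_backlog(2000.0, 2500, 1.0, outage_ticks=100)
    print(f"backlog after stall: {backlog:.1f} events, drain time: {drain:.1f} ticks")
    print(f"utilization with 1 node removed: "
          f"{utilization_after_removal(2000.0, 2500, 1.0, 1):.4f}")
```

In this simplified picture a brief stall costs buffer space and a bounded recovery time, while permanent removal raises the load on every surviving node, which is one reason the two cases might call for different recovery strategies.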