M. Gardner, Wu-chun Feng, J. Archuleta, Heshan Lin, Xiasong Ma
{"title":"平行基因组序列搜索在一个特设网格:经验,教训,和影响","authors":"M. Gardner, Wu-chun Feng, J. Archuleta, Heshan Lin, Xiasong Ma","doi":"10.1145/1188455.1188564","DOIUrl":null,"url":null,"abstract":"The Basic local alignment search tool (BLAST) allows bioinformaticists to characterize an unknown sequence by comparing it against a database of known sequences. The similarity between sequences enables biologists to detect evolutionary relationships and infer biological properties of the unknown sequence. mpiBLAST, our parallel BLAST, decreases the search time of a 300 KB query on the current NT database from over two full days to under 10 minutes on a 128-processor cluster and allows larger query files to be compared. Consequently, we propose to compare the largest query available, the entire NT database, against the largest database available, the entire NT database. The result of this comparison can provide critical information to the biology community, including insightful evolutionary, structural, and functional relationships between every sequence and family in the NT database. Preliminary projections indicated that to complete the task in a reasonable length of time required more processors than were available to us at a single site. Hence, we assembled GreenGene, an ad-hoc grid that was constructed \"on the fly\" from donated computational, network, and storage resources during last year's SC|05. GreenGene consisted of 3048 processors from machines that were distributed across the United States. This paper presents a case study of mpiBLAST on GreenGene - specifically, a pre-run characterization of the computation, the hardware and software architectural design, experimental results, and future directions","PeriodicalId":333909,"journal":{"name":"ACM/IEEE SC 2006 Conference (SC'06)","volume":"215 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2006-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"64","resultStr":"{\"title\":\"Parallel Genomic Sequence-Searching on an Ad-Hoc Grid: Experiences, Lessons Learned, and Implications\",\"authors\":\"M. Gardner, Wu-chun Feng, J. Archuleta, Heshan Lin, Xiasong Ma\",\"doi\":\"10.1145/1188455.1188564\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The Basic local alignment search tool (BLAST) allows bioinformaticists to characterize an unknown sequence by comparing it against a database of known sequences. The similarity between sequences enables biologists to detect evolutionary relationships and infer biological properties of the unknown sequence. mpiBLAST, our parallel BLAST, decreases the search time of a 300 KB query on the current NT database from over two full days to under 10 minutes on a 128-processor cluster and allows larger query files to be compared. Consequently, we propose to compare the largest query available, the entire NT database, against the largest database available, the entire NT database. The result of this comparison can provide critical information to the biology community, including insightful evolutionary, structural, and functional relationships between every sequence and family in the NT database. Preliminary projections indicated that to complete the task in a reasonable length of time required more processors than were available to us at a single site. Hence, we assembled GreenGene, an ad-hoc grid that was constructed \\\"on the fly\\\" from donated computational, network, and storage resources during last year's SC|05. GreenGene consisted of 3048 processors from machines that were distributed across the United States. This paper presents a case study of mpiBLAST on GreenGene - specifically, a pre-run characterization of the computation, the hardware and software architectural design, experimental results, and future directions\",\"PeriodicalId\":333909,\"journal\":{\"name\":\"ACM/IEEE SC 2006 Conference (SC'06)\",\"volume\":\"215 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2006-11-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"64\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM/IEEE SC 2006 Conference (SC'06)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/1188455.1188564\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM/IEEE SC 2006 Conference (SC'06)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1188455.1188564","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Parallel Genomic Sequence-Searching on an Ad-Hoc Grid: Experiences, Lessons Learned, and Implications
The Basic local alignment search tool (BLAST) allows bioinformaticists to characterize an unknown sequence by comparing it against a database of known sequences. The similarity between sequences enables biologists to detect evolutionary relationships and infer biological properties of the unknown sequence. mpiBLAST, our parallel BLAST, decreases the search time of a 300 KB query on the current NT database from over two full days to under 10 minutes on a 128-processor cluster and allows larger query files to be compared. Consequently, we propose to compare the largest query available, the entire NT database, against the largest database available, the entire NT database. The result of this comparison can provide critical information to the biology community, including insightful evolutionary, structural, and functional relationships between every sequence and family in the NT database. Preliminary projections indicated that to complete the task in a reasonable length of time required more processors than were available to us at a single site. Hence, we assembled GreenGene, an ad-hoc grid that was constructed "on the fly" from donated computational, network, and storage resources during last year's SC|05. GreenGene consisted of 3048 processors from machines that were distributed across the United States. This paper presents a case study of mpiBLAST on GreenGene - specifically, a pre-run characterization of the computation, the hardware and software architectural design, experimental results, and future directions