A Distributed-Processing System for Accelerating Biological Research Using Data-Staging

Ipsj Digital Courier, Pub Date: 2008-03-15, DOI: 10.2197/IPSJDC.4.250

Y. Kido, S. Seno, S. Date, Y. Takenaka, H. Matsuda


Citations: 0

Abstract



The number of biological databases has been increasing rapidly as a result of progress in biotechnology. As the amount and heterogeneity of biological data increase, it becomes more difficult to manage the data in a few centralized databases. Moreover, the number of sites storing these databases is getting larger, and the geographic distribution of these databases has become wider. In addition, biological research tends to require a large amount of computational resources, i.e., a large number of computing nodes. As such, the computational demand has been increasing with the rapid progress of biological research. Thus, the development of methods that enable computing nodes to use such widely-distributed database sites effectively is desired. In this paper, we propose a method for providing data from the database sites to computing nodes. Since it is difficult to decide which program runs on a node and which data are requested as their inputs in advance, we have introduced the notion of “data-staging” in the proposed method. Data-staging dynamically searches for the input data from the database sites and transfers the input data to the node where the program runs. We have developed a prototype system with data-staging using grid middleware. The effectiveness of the prototype system is demonstrated by measurement of the execution time of similarity search of several-hundred gene sequences against 527 prokaryotic genome data.
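The abstract does not give implementation details, so the following is only a minimal sketch of the data-staging idea it describes: find the requested input data among several database sites, transfer it to the node where the program runs, then run the program on the local copy. It assumes each database site can be stood in for by a local directory; the function names (`find_dataset`, `stage_inputs`, `run_with_staging`) are hypothetical and do not come from the paper or from any grid middleware.

```python
import shutil
import tempfile
from pathlib import Path


def find_dataset(name, sites):
    """Return the path of `name` at the first site that holds it, or None.

    Assumption: each "site" is a local directory standing in for a remote
    database site; a real system would query remote catalogs instead.
    """
    for site in sites:
        candidate = Path(site) / name
        if candidate.is_file():
            return candidate
    return None


def stage_inputs(names, sites, workdir):
    """Copy each requested dataset from whichever site has it into workdir."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    staged = {}
    for name in names:
        source = find_dataset(name, sites)
        if source is None:
            raise FileNotFoundError(f"no site provides {name!r}")
        # The copy stands in for a network transfer to the computing node.
        staged[name] = Path(shutil.copy(source, workdir / name))
    return staged


def run_with_staging(program, names, sites):
    """Stage the inputs into a scratch directory, then run program on them.

    `program` is any callable that takes a dict mapping dataset names to
    local paths; the scratch directory is removed once it returns.
    """
    with tempfile.TemporaryDirectory() as scratch:
        local = stage_inputs(names, sites, scratch)
        return program(local)
```

In this sketch the inputs are looked up only when `run_with_staging` is called, which mirrors the abstract's point that the program and its input data are hard to fix in advance. A similarity search over genome files could be passed in as `program`, which would read the staged files from the paths it receives.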
