Authors: Hitoshi Sato, S. Matsuoka
Published in: 2007 International Symposium on Applications and the Internet Workshops
Publication date: 2007-01-15
DOI: https://doi.org/10.1109/SAINT-W.2007.38
Data Management on Grid Filesystem for Data-Intensive Computing
In parallel computing environments such as HPC clusters and the grid, data-intensive applications incur large overheads when file accesses concentrate on common nodes. To avoid this problem in traditional distributed file systems, users have to distribute file accesses manually, which is difficult to do in a grid environment. We propose a data management mechanism for data-intensive computing on a grid filesystem. Our technique improves file access performance by automatically scheduling file accesses and data management on the filesystem. The filesystem is organized into dynamically configured node groups that correspond to the network topology. Using this configuration, it monitors file accesses to detect concentration, creates file replicas, and schedules their placement and access. We applied the proposed technique to Gfarm, a filesystem that scales to the grid. We emulated real application workloads using a job scheduler and confirmed a speedup of a factor of 3.7 compared with a filesystem without automatic file access distribution.
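
The monitor, replicate, and schedule loop described in the abstract can be illustrated with a toy model. This is only a sketch under stated assumptions, not the authors' implementation and not the Gfarm API: the class `ReplicaScheduler`, its methods, the per-node in-flight access counter, and the fixed `threshold` are all hypothetical names and policies chosen for illustration. The abstract does not say how concentration is measured or where replicas are placed, so the sketch assumes a simple rule: when the best available holder of a file is already serving `threshold` accesses, copy the file to the least-loaded node in the requesting client's node group.

```python
from collections import defaultdict


class ReplicaScheduler:
    """Toy model: count in-flight accesses per node and replicate hot files.

    All names and policies here are illustrative assumptions.
    """

    def __init__(self, node_groups, threshold):
        # node_groups: group name -> list of node names (assumed topology).
        self.node_groups = node_groups
        self.group_of = {n: g for g, nodes in node_groups.items() for n in nodes}
        self.threshold = threshold          # assumed concentration limit
        self.replicas = defaultdict(set)    # file name -> nodes holding a copy
        self.load = defaultdict(int)        # node -> in-flight accesses

    def add_file(self, name, node):
        self.replicas[name].add(node)

    def _least_loaded(self, nodes):
        # Sorting first makes ties resolve by node name, deterministically.
        return min(sorted(nodes), key=lambda n: self.load[n])

    def open(self, name, client):
        """Choose a node to serve `name` for `client`, replicating if needed."""
        holders = self.replicas[name]
        group = self.group_of[client]
        # Prefer copies inside the client's own node group.
        local = [n for n in holders if self.group_of[n] == group]
        target = self._least_loaded(local or holders)
        if self.load[target] >= self.threshold:
            # Concentration detected: place a new replica in the client's group.
            candidates = [n for n in self.node_groups[group] if n not in holders]
            if candidates:
                target = self._least_loaded(candidates)
                holders.add(target)
        self.load[target] += 1
        return target

    def close(self, node):
        # Called when an access finishes, releasing load on that node.
        self.load[node] -= 1
```

With two groups, `{"siteA": ["a1", "a2"], "siteB": ["b1", "b2"]}`, a threshold of 2, and a file stored only on `a1`, the first two opens from `a2` are served by `a1`; the third finds `a1` saturated and triggers a replica on `a2`. A real system would also need replica consistency, eviction, and transfer cost accounting, none of which this sketch attempts.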