GEM: A Framework for Developing Shared-Memory Parallel Genomic Applications on Memory Constrained Architectures
Mucahid Kutlu, G. Agrawal
2015 44th International Conference on Parallel Processing, 2015-09-01
DOI: 10.1109/ICPP.2015.92 (https://doi.org/10.1109/ICPP.2015.92)
The amount of available genomic data is increasing rapidly with recent developments in sequencing technologies. Analysis of such data can potentially lead to significant advancements in medical research and even practice. However, it is imperative to exploit parallelism and utilize computational resources effectively to handle large-scale genomic data. At the same time, the trend in computing technologies is towards architectures with a large number of cores and a smaller memory size per core (e.g., Intel Xeon Phi). Innovative solutions that meet the requirements of parallel genomic data processing within the constraints of these new computational architectures are urgently needed. In this work, we develop a novel middleware system, GEM, for developing shared-memory parallel genomic applications on memory-constrained architectures. We propose a load-map-reduce approach and a novel scheduling scheme to decrease I/O contention and prevent over-consumption of the limited memory. We also use domain-specific knowledge to decrease the memory requirements of the tasks. In our experiments, we show that GEM has high scalability on the Intel Xeon Phi architecture. We also compare GEM against two other frameworks for genomic data processing, GATK and PAGE, and show that our middleware outperforms both.
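The abstract does not describe GEM's interface, so the following is only a minimal sketch of what a load-map-reduce pipeline could look like, under stated assumptions. It is not GEM's API or its scheduling scheme. The names `load_map_reduce`, `count_bases`, `n_loaders`, `n_mappers`, and `max_buffered` are hypothetical. The sketch assumes that I/O contention is limited by running few loader threads and that memory use is limited by a bounded buffer between loaders and mappers; Python threads are used purely to illustrate the structure.

```python
import queue
import threading
from collections import Counter

_DONE = object()  # sentinel telling a mapper that no more chunks will arrive


def load_map_reduce(chunk_ids, load, map_fn, reduce_fn, initial,
                    n_loaders=1, n_mappers=4, max_buffered=2):
    """Hypothetical load-map-reduce driver (an illustration, not GEM's API)."""
    pending = queue.Queue()
    for chunk_id in chunk_ids:
        pending.put(chunk_id)

    # Bounded buffer: loaders block once max_buffered chunks await a mapper,
    # so loaded-but-unprocessed data cannot grow without limit.
    loaded = queue.Queue(maxsize=max_buffered)
    partials = []
    partials_lock = threading.Lock()

    def loader():
        # Few loaders means few concurrent reads (assumed I/O throttle).
        while True:
            try:
                chunk_id = pending.get_nowait()
            except queue.Empty:
                return
            loaded.put(load(chunk_id))

    def mapper():
        # Each mapper folds its map results into a private partial result.
        local = initial()
        while True:
            data = loaded.get()
            if data is _DONE:
                break
            local = reduce_fn(local, map_fn(data))
        with partials_lock:
            partials.append(local)

    loaders = [threading.Thread(target=loader) for _ in range(n_loaders)]
    mappers = [threading.Thread(target=mapper) for _ in range(n_mappers)]
    for t in loaders + mappers:
        t.start()
    for t in loaders:
        t.join()
    for _ in mappers:
        loaded.put(_DONE)  # one sentinel per mapper
    for t in mappers:
        t.join()

    # Final reduce: combine the per-mapper partial results.
    result = initial()
    for partial in partials:
        result = reduce_fn(result, partial)
    return result


def count_bases(chunks):
    """Toy usage: count nucleotides; `load` is the identity instead of a file read."""
    return load_map_reduce(chunks, load=lambda c: c, map_fn=Counter,
                           reduce_fn=lambda a, b: a + b, initial=Counter)
```

In this sketch the number of chunks resident in memory at any time is at most `max_buffered + n_loaders + n_mappers`, which is the kind of cap a memory-constrained, many-core target would need; the scheduling scheme proposed in the paper may work quite differently.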