Accelerating Sparse Canonical Correlation Analysis for Large Brain Imaging Genetics Data

Proceedings of XSEDE16 : Diversity, Big Data, and Science at Scale : July 17-21, 2016, Intercontinental Miami Hotel, Miami, Florida, USA. Conference on Extreme Science and Engineering Discovery Environment (5th : 2016 : Miami, Fla.)
Pub Date: 2014-07-13
Pages: 4:1-4:7
DOI: 10.1145/2616498.2616515

Jingwen Yan, Hui Zhang, Lei Du, E. Wernert, A. Saykin, Li Shen


Citations: 4

Abstract


Recent advances in acquiring high-throughput neuroimaging and genomics data provide exciting new opportunities to study the influence of genetic variation on brain structure and function. Research in this emerging field, known as imaging genetics, aims to identify associations between genetic variations such as single nucleotide polymorphisms (SNPs) and neuroimaging quantitative traits (QTs). Sparse canonical correlation analysis (SCCA) is a bi-multivariate analysis method that has the potential to reveal complex multi-SNP-multi-QT associations. However, the scale and complexity of imaging genetics data have created critical computational bottlenecks that require new concepts and enabling tools. In this paper, we present our initial efforts to develop a set of massively parallel strategies to accelerate a widely used SCCA implementation provided by the Penalized Multivariate Analysis (PMA) software package. In particular, we exploit parallel packages of R, optimized mathematical libraries, and the automatic offload model for the Intel Many Integrated Core (MIC) architecture to accelerate SCCA. We create several simulated imaging genetics data sets of different sizes and use these synthetic data to perform a comparative study. Our performance evaluation demonstrates that the proposed acceleration can achieve a 2-fold speedup. The preliminary results show that by combining a data-parallel strategy with the offload model for MIC, we can significantly reduce knowledge discovery timelines when applying SCCA to large brain imaging genetics data.
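To make the SCCA step in the abstract concrete, the following is a minimal sketch of a rank-one sparse CCA on simulated SNP and QT matrices. It is not the PMA code and not the authors' accelerated version: it is written in Python rather than R, the function names (`soft_threshold`, `sparse_unit_vector`, `scca`) and the simulated data sizes are hypothetical, and it uses a fixed fractional soft-threshold as a simplification, whereas PMA, as far as I understand it, searches for the threshold that satisfies an L1-norm bound. The dominant cost is the products with the SNP-by-QT cross-product matrix, which is the kind of dense linear algebra an optimized math library can speed up.

```python
import numpy as np

def soft_threshold(a, lam):
    """Shrink each entry of `a` toward zero by `lam`."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def sparse_unit_vector(a, frac):
    """Soft-threshold `a` at frac * max|a|, then scale to unit L2 norm.

    Assumption: a fixed fractional threshold stands in for the L1-bound
    search used by penalized matrix decomposition implementations.
    """
    s = soft_threshold(a, frac * np.max(np.abs(a)))
    norm = np.linalg.norm(s)
    return s / norm if norm > 0 else s

def scca(X, Y, frac_u=0.5, frac_v=0.5, n_iter=100, tol=1e-8):
    """Rank-one sparse CCA by alternating soft-thresholded updates."""
    # Standardize columns so X.T @ Y is proportional to cross-correlation.
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    Y = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    K = X.T @ Y  # p x q cross-product matrix (the expensive dense product)
    # Start from the leading right singular vector of K.
    v = np.linalg.svd(K, full_matrices=False)[2][0]
    u = np.zeros(X.shape[1])
    for _ in range(n_iter):
        u_new = sparse_unit_vector(K @ v, frac_u)
        v_new = sparse_unit_vector(K.T @ u_new, frac_v)
        change = np.linalg.norm(u_new - u) + np.linalg.norm(v_new - v)
        u, v = u_new, v_new
        if change < tol:
            break
    return u, v

# Simulated data (hypothetical sizes): 200 subjects, 50 SNPs, 30 QTs.
# A shared latent factor drives the first 5 SNPs and the first 4 QTs.
rng = np.random.default_rng(0)
n, p, q = 200, 50, 30
z = rng.standard_normal(n)
X = rng.standard_normal((n, p))
Y = rng.standard_normal((n, q))
X[:, :5] += 2.0 * z[:, None]
Y[:, :4] += 2.0 * z[:, None]

u, v = scca(X, Y)
r = np.corrcoef(X @ u, Y @ v)[0, 1]
print("selected SNP indices:", np.flatnonzero(u))
print("selected QT indices:", np.flatnonzero(v))
print("canonical correlation:", round(float(r), 3))
```

In practice the penalty levels are tuned over a grid or by permutation, and those repeated, independent fits are one natural place for a data-parallel strategy; whether that matches the parallelization used in the paper is not stated in the abstract.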

