SpEpistasis: A sparse approach for three-way epistasis detection
Authors: Diogo Marques, Leonel Sousa, Aleksandar Ilic
Journal: Journal of Parallel and Distributed Computing (journal article, published 2024-09-23)
DOI: 10.1016/j.jpdc.2024.104989
URL: https://www.sciencedirect.com/science/article/pii/S0743731524001539
Citations: 0
Abstract
Epistasis detection is a fundamental application in bioinformatics and biomedicine, providing important insights into the relationship between the human genome and the occurrence of certain diseases. Exhaustive epistasis detection approaches yield an accurate and deterministic solution, at the cost of high computational complexity, especially when targeting high-order epistasis. While recent works employ vectorization and cache-blocking techniques to alleviate this burden, these solutions are now limited by the peak performance of the functional units of computing systems. Thus, further improving the performance of epistasis detection requires reducing its number of memory transfers and computations. To tackle this issue, this work proposes SpEpistasis, which performs three-way epistasis detection by exploiting sparsity: storing only the non-zero elements of the dataset reduces the number of operations needed for epistasis detection. To this end, a new hybrid format for representing the input dataset is proposed, which stores a subset of the data in the compressed sparse row (CSR) format. Moreover, new sparse-aware algorithmic approaches are proposed to leverage both the hybrid format and the vector capabilities of current CPUs from Intel, AMD, and ARM. The experimental results show that SpEpistasis provides speedups of up to 3.7× and average speedups of around 1.8× and 1.33× when compared with other state-of-the-art works.
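To make the sparse idea in the abstract concrete, the following is a minimal sketch, not the authors' hybrid format or algorithm. It assumes a hypothetical 0/1 indicator matrix (rows are SNPs, columns are samples, 1 marks a sample carrying some rare genotype), stores only the non-zero positions in a CSR-style layout (`row_ptr`, `col_idx`), and counts the samples shared by three SNPs by intersecting their index lists. All names and the toy data are illustrative assumptions.

```python
from typing import List, Tuple


def to_csr(dense: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Convert a 0/1 matrix to CSR-style (row_ptr, col_idx).

    Values are implicit ones, so only column indices are stored.
    """
    row_ptr = [0]
    col_idx: List[int] = []
    for row in dense:
        col_idx.extend(j for j, v in enumerate(row) if v != 0)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx


def row_indices(row_ptr: List[int], col_idx: List[int], r: int) -> List[int]:
    """Return the non-zero column indices of row r."""
    return col_idx[row_ptr[r]:row_ptr[r + 1]]


def triple_count_sparse(row_ptr: List[int], col_idx: List[int],
                        a: int, b: int, c: int) -> int:
    """Count samples where rows a, b and c are all non-zero.

    Work is proportional to the number of non-zeros, not the number of samples.
    """
    set_a = set(row_indices(row_ptr, col_idx, a))
    set_b = set(row_indices(row_ptr, col_idx, b))
    return sum(1 for j in row_indices(row_ptr, col_idx, c)
               if j in set_a and j in set_b)


def triple_count_dense(dense: List[List[int]], a: int, b: int, c: int) -> int:
    """Dense reference: touches every sample regardless of sparsity."""
    return sum(x & y & z for x, y, z in zip(dense[a], dense[b], dense[c]))


# Hypothetical toy data: 4 SNPs x 8 samples (illustrative only).
GENO = [
    [0, 1, 0, 0, 1, 0, 0, 1],
    [0, 1, 0, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 1, 0],
]
ROW_PTR, COL_IDX = to_csr(GENO)

if __name__ == "__main__":
    print(ROW_PTR, COL_IDX)
    print(triple_count_sparse(ROW_PTR, COL_IDX, 0, 1, 2))
```

In a full three-way detector, one such count would feed each cell of the genotype-by-phenotype contingency table for every SNP triple. The sketch only illustrates why skipping zero entries can reduce work when the stored subset is sparse.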
About the journal:
This international journal is directed to researchers, engineers, educators, managers, programmers, and users of computers who have particular interests in parallel processing and/or distributed computing.
The Journal of Parallel and Distributed Computing publishes original research papers and timely review articles on the theory, design, evaluation, and use of parallel and/or distributed computing systems. The journal also features special issues on these topics; again covering the full range from the design to the use of our targeted systems.