A parallel software pipeline to select relevant genes for pathway enrichment
Giuseppe Agapito, M. Cannataro
2022 30th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP), published 2022-03-01
DOI: 10.1109/pdp55904.2022.00041 (https://doi.org/10.1109/pdp55904.2022.00041)
Citations: 1
Abstract
Continuous advances in experimental omics technologies, such as microarrays, make it possible to perform large-scale genomics studies. After the initial enthusiasm, it became clear that even the results provided by microarrays, in the form of lists of differentially expressed genes (DEGs), were largely as enigmatic as the first sequence of the genome, because these lists are disconnected from the biological mechanisms they affect. Pathway enrichment analysis (PEA) helps researchers find the clues needed to link DEGs to the affected biological pathways and, in turn, to the underlying biological mechanisms and processes. Preparing DEG data sets in a format suitable for PEA can be a tedious, error-prone, and laborious process, even for bioinformaticians, who must do it manually before the analysis can start. To fill this gap, we present a parallel software pipeline that takes a list of DEGs as input and automatically returns the enriched pathways. The pipeline is implemented in Python and automates the following actions: i) parallel splitting of DEGs into groups; ii) parallel construction of the similarity matrices for the DEG groups; iii) parallel mapping of the similarity matrices into networks; iv) parallel pathway enrichment analysis for each group of identified DEGs. Preliminary results show that the pipeline can help analyze DEGs and generate, in a few minutes, a list of pathway enrichment results that would otherwise require many hours of manual work and several different scripts. The pipeline provides a twofold benefit. First, it speeds up the computation of pathway enrichment by automating several steps that are currently performed manually. Second, it provides a more specific list of DEGs for computing pathway enrichment, improving the relevance and significance of the enriched pathways.
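The abstract names the four parallel stages but does not say how each one is implemented. The following is a minimal sketch of how such a pipeline could be wired together with Python's `concurrent.futures`. Every criterion in it is an assumption for illustration and is not the authors' method:

- DEGs are split by the sign of their log2 fold change.
- Similarity is Pearson co-expression across samples.
- A gene is kept if it has at least one network edge with |r| >= a hypothetical `threshold`.
- Enrichment is a one-sided hypergeometric (over-representation) test.

All function names and parameters (`run_pipeline`, `threshold`, `universe`, `n_chunks`, `executor_cls`) are hypothetical.

```python
# Minimal sketch of a four-stage parallel DEG -> pathway pipeline.
# ASSUMPTIONS (not the authors' method): split by fold-change sign,
# Pearson co-expression similarity, |r| >= threshold network edges,
# one-sided hypergeometric enrichment. All names are hypothetical.
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
from math import comb, sqrt


def split_chunk(chunk):
    """Stage i: group the (gene, log2FC) pairs of one chunk by sign."""
    groups = {"up": [], "down": []}
    for gene, log2fc in chunk:
        # DEGs are assumed to have a non-zero log2FC; <= 0 is labelled "down".
        groups["up" if log2fc > 0 else "down"].append(gene)
    return groups


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def similarity_matrix(genes, expression):
    """Stage ii: pairwise similarity (upper triangle) within one group."""
    return {(g, h): pearson(expression[g], expression[h])
            for g, h in combinations(genes, 2)}


def relevant_genes(sim, threshold):
    """Stage iii: read the matrix as a network (edge when |r| >= threshold)
    and keep only the genes that have at least one edge."""
    return sorted({g for pair, r in sim.items() if abs(r) >= threshold
                   for g in pair})


def hypergeom_sf(k, universe, pathway_size, drawn):
    """P(X >= k) when `drawn` genes are sampled from `universe` genes,
    `pathway_size` of which belong to the pathway."""
    hits = range(k, min(pathway_size, drawn) + 1)
    tail = sum(comb(pathway_size, i) * comb(universe - pathway_size, drawn - i)
               for i in hits)
    return tail / comb(universe, drawn)


def enrich(genes, pathways, universe):
    """Stage iv: over-representation p-value for each overlapping pathway."""
    selected = set(genes)
    results = {}
    for name, members in pathways.items():
        k = len(selected & members)
        if k:  # skip pathways that contain none of the selected genes
            results[name] = hypergeom_sf(k, universe, len(members), len(selected))
    return results


def run_pipeline(degs, expression, pathways, universe,
                 threshold=0.8, n_chunks=2, executor_cls=ThreadPoolExecutor):
    items = list(degs.items())
    chunks = [items[i::n_chunks] for i in range(n_chunks)]
    with executor_cls() as pool:
        # Stage i: split chunks in parallel, then merge the partial groups.
        groups = {"up": [], "down": []}
        for part in pool.map(split_chunk, chunks):
            for label, genes in part.items():
                groups[label].extend(genes)
        labels = sorted(groups)
        gene_lists = [sorted(groups[lab]) for lab in labels]
        n = len(labels)
        # Stages ii-iv: one task per group at each stage.
        sims = list(pool.map(similarity_matrix, gene_lists, [expression] * n))
        kept = list(pool.map(relevant_genes, sims, [threshold] * n))
        enriched = list(pool.map(enrich, kept, [pathways] * n, [universe] * n))
    return {lab: {"genes": g, "pathways": p}
            for lab, g, p in zip(labels, kept, enriched)}


if __name__ == "__main__":
    degs = {"G1": 2.1, "G2": 1.8, "G3": 1.5, "G4": -1.2, "G5": -2.0, "G6": -1.7}
    expression = {  # toy profiles over four samples
        "G1": [1, 2, 3, 4], "G2": [2, 4, 6, 8], "G3": [4, 1, 3, 2],
        "G4": [4, 3, 2, 1], "G5": [8, 6, 4, 2], "G6": [1, 4, 2, 3],
    }
    pathways = {"P_up": {"G1", "G2", "G7"}, "P_down": {"G4", "G5", "G8", "G9"},
                "P_mixed": {"G1", "G4"}}
    print(run_pipeline(degs, expression, pathways, universe=20))
```

The sketch uses threads by default only so that it runs anywhere. In standard CPython builds, the GIL serializes pure-Python work across threads. A CPU-bound run would therefore more plausibly pass `executor_cls=ProcessPoolExecutor` from under an `if __name__ == "__main__":` guard. All stage functions are defined at module level, so they can be pickled. The p-values are also uncorrected; enrichment tools commonly apply a multiple-testing correction such as Benjamini-Hochberg.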