A parallel software pipeline to select relevant genes for pathway enrichment

2022 30th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP). Pub Date: 2022-03-01. DOI: 10.1109/pdp55904.2022.00041

Giuseppe Agapito, M. Cannataro


Citations: 1

Abstract

The continuous development of experimental omics technologies such as microarrays makes it possible to perform large-scale genomics studies. After the initial enthusiasm, it became clear that even the results provided by microarrays, in the form of lists of differentially expressed genes (DEGs), were largely as enigmatic as the first sequence of the genome, because these lists are detached from the biological mechanisms they affect. Pathway enrichment analysis (PEA) gives researchers the clues needed to link DEGs to the affected biological pathways and, consequently, to the underlying biological mechanisms and processes. Putting DEG data sets into a format suitable for PEA can be a tedious, laborious, and error-prone process even for bioinformaticians, who must do it manually before PEA can be run. To fill this gap, we present a parallel software pipeline that takes a list of DEGs as input and automatically returns the enriched pathways. The pipeline is implemented in Python and automates the following actions: i) parallel splitting of DEGs into groups; ii) parallel building of the similarity matrices for the DEG groups; iii) parallel mapping of the similarity matrices into networks; iv) parallel pathway enrichment analysis for each group of identified DEGs. Preliminary results show that the pipeline can help analyze DEGs and generate, in a few minutes, a list of pathway enrichment results that would otherwise require many hours of manual work and several different scripts. The pipeline provides a two-fold benefit: first, it speeds up the computation of pathway enrichment by automating several steps currently performed manually; second, it provides a more specific list of DEGs for calculating pathway enrichment, which helps improve the relevance and significance of the enriched pathways.
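The abstract names four automated steps but does not say how each is carried out. The following Python sketch only illustrates the overall shape of such a pipeline and is not the authors' implementation. Every concrete choice in it is an assumption: the round-robin split, the absolute Pearson correlation as similarity measure, the edge threshold, the hypergeometric over-representation test, the thread pool as parallel backend, and all function names.

```python
from concurrent.futures import ThreadPoolExecutor
from math import comb, sqrt


def split_into_groups(genes, n_groups):
    """Step i (assumed): deal the DEG list into n_groups round-robin chunks."""
    return [genes[i::n_groups] for i in range(n_groups)]


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx and vy else 0.0


def similarity_matrix(group, profiles):
    """Step ii (assumed): pairwise |correlation| between the genes of one group."""
    return {(g, h): abs(pearson(profiles[g], profiles[h]))
            for i, g in enumerate(group) for h in group[i + 1:]}


def to_network(matrix, threshold):
    """Step iii (assumed): keep pairs at or above the threshold as network edges."""
    return [pair for pair, sim in matrix.items() if sim >= threshold]


def hypergeom_pvalue(k, n, K, N):
    """P(overlap >= k) when n genes are drawn from N, K of which are in the pathway."""
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


def enrich(selected, pathways, universe_size):
    """Step iv (assumed): over-representation p-value of each pathway."""
    return {name: hypergeom_pvalue(len(selected & members), len(selected),
                                   len(members), universe_size)
            for name, members in pathways.items()}


def process_group(group, profiles, pathways, universe_size, threshold):
    """Run steps ii-iv for a single group of DEGs."""
    edges = to_network(similarity_matrix(group, profiles), threshold)
    selected = {g for edge in edges for g in edge}  # genes kept as network nodes
    return enrich(selected, pathways, universe_size)


def run_pipeline(genes, profiles, pathways, universe_size, n_groups=2, threshold=0.8):
    """Split the DEGs, then process every group concurrently; one result dict per group."""
    groups = split_into_groups(genes, n_groups)
    with ThreadPoolExecutor(max_workers=n_groups) as pool:
        return list(pool.map(
            lambda g: process_group(g, profiles, pathways, universe_size, threshold),
            groups))
```

Here `profiles` maps each gene to a numeric expression profile, `pathways` maps each pathway name to a set of member genes, and `universe_size` is the number of background genes (it must be at least as large as every pathway). Because the groups are processed independently, a process pool or a cluster scheduler could replace the thread pool for CPU-bound workloads.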

