Comparative Analysis of R and Mathematica Package for Differential Gene Expression Analysis Using Microarray Dataset on Pancreatic Cancer
Authors: Kinza Qazi, Tehreem Anwar
Journal: Pakistan BioMedical Journal
Published: 2023-04-30
DOI: https://doi.org/10.54393/pbmj.v6i04.863
Citations: 0
Abstract
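Microarrays produce enormous amounts of information, requiring a series of repeated analyses to condense the data. Several computational software packages are used to analyze these data. Objective: To compare the R and Mathematica packages for differential gene expression (DGE) analysis using a microarray dataset. Methods: Microarray data were collected from the online database GEO (Gene Expression Omnibus). Mathematica and R were used for the comparative analysis. In R, Robust Multi-Array Average (RMA) was used for data normalization, and the limma package was used for DGE analysis. In Mathematica, AffyDGED was used for both normalization and DGE analysis of the dataset. Results: R separated 3,426 non-differentially expressed genes from 14,936 differentially expressed genes. Using the RMA method on this dataset, the thresholds for identifying "up" and "down" gene expression were estimated to be 0.98 and -0.19, respectively. AffyDGED detected 1,832 genes as differentially expressed; of these, 1,591 overlap with the real (reference) list of 1,944 differentially expressed genes, giving a true positive rate of 1591/1944 = 0.818. This indicates that about 18% of the genuine list of differentially expressed genes could not be reliably identified by AffyDGED. Conclusions: R is one of the most popular and recommended tools for microarray analysis, and together with Bioconductor it offers one of the best toolsets for DGE analysis. AffyDGED, on the other hand, brings a contemporary, practically useful algorithm to Mathematica users.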
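The true positive rate quoted in the Results can be checked with a few lines of arithmetic. The sketch below uses only the three counts stated in the abstract; the variable and function names are illustrative, and the precision figure is derived here from those counts rather than reported by the authors.

```python
# Counts taken from the abstract's Results section.
detected = 1832   # genes AffyDGED called differentially expressed
reference = 1944  # genes in the real (reference) differentially expressed list
overlap = 1591    # genes present in both lists


def true_positive_rate(overlap_count: int, reference_count: int) -> float:
    """Fraction of the reference list that was recovered."""
    return overlap_count / reference_count


tpr = true_positive_rate(overlap, reference)
missed = reference - overlap      # reference genes AffyDGED did not detect
missed_fraction = missed / reference

# Not reported in the abstract: share of AffyDGED calls found in the reference list.
precision = overlap / detected

print(f"True positive rate: {tpr:.3f}")
print(f"Missed reference genes: {missed} ({missed_fraction:.1%})")
print(f"Precision (derived, not reported): {precision:.3f}")
```

The missed fraction is the complement of the true positive rate, which is where the abstract's "18%" comes from.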