A New Distribution Family for Microarray Data †

Microarrays Pub Date : 2017-02-10 DOI:10.3390/microarrays6010005

D. Kelmansky, L. Ricci


Citations: 3

Abstract

The traditional approach with microarray data has been to apply transformations that approximately normalize them, with the drawback of losing the original scale. The alternative standpoint taken here is to search for models that fit the data, characterized by the presence of negative values, while preserving their scale; one advantage of this strategy is that it facilitates a direct interpretation of the results. A new family of distributions named gpower-normal, indexed by p∈R, is introduced, and it is proven that these variables become normal or truncated normal when a suitable gpower transformation is applied. Expressions are given for moments and quantiles in terms of the truncated normal density. This new family can be used to model asymmetric data that include non-positive values, as required for microarray analysis. Moreover, it is proven that the gpower-normal family is a special case of pseudo-dispersion models, inheriting all the good properties of these models, such as asymptotic normality for small variances. A combined maximum likelihood method is proposed to estimate the model parameters, and it is applied to microarray and contamination data. R code is available from the authors upon request.
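The abstract does not state the gpower transformation itself. As a rough sketch only, the Python snippet below assumes the form g_p(x) = ((x + sqrt(x^2 + 1))^p - 1)/p for p ≠ 0, with the limit log(x + sqrt(x^2 + 1)) at p = 0; this formula is an assumption that should be checked against the paper. The function names `gpower` and `gpower_inverse` are hypothetical and are not taken from the authors' R code. Under this assumed form the transform accepts negative inputs, and its range is the whole real line only at p = 0, which is in line with the abstract's statement that transformed variables are normal or truncated normal.

```python
import math


def gpower(x, p):
    """Assumed gpower transform (not confirmed by the abstract).

    g_p(x) = (u**p - 1) / p with u = x + sqrt(x**2 + 1) = exp(asinh(x)),
    and g_0(x) = log(u) = asinh(x) as the p -> 0 limit.
    """
    if p == 0:
        return math.asinh(x)
    u = math.exp(math.asinh(x))  # strictly positive for every real x
    return (u ** p - 1.0) / p


def gpower_inverse(y, p):
    """Inverse of the assumed transform; y must lie in the range of g_p."""
    if p == 0:
        return math.sinh(y)
    base = 1.0 + p * y  # equals u**p, so it must be positive
    if base <= 0:
        raise ValueError("y is outside the range of the transform for this p")
    u = base ** (1.0 / p)
    return (u - 1.0 / u) / 2.0  # sinh(log(u))


if __name__ == "__main__":
    # Negative and positive values are both handled on the original scale.
    for x in (-3.0, 0.0, 2.5):
        y = gpower(x, 0.3)
        print(x, y, gpower_inverse(y, 0.3))
```

In this sketch, for p > 0 the transformed values are bounded below by -1/p, and for p < 0 they are bounded above by -1/p, so a normal model on the transformed scale would have to be truncated at that bound.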


Source journal: Microarrays

Self-citation rate: 0.00%

Review time: 11 weeks

Journal description: High-Throughput (formerly Microarrays, ISSN 2076-3905) is a multidisciplinary peer-reviewed scientific journal that provides an advanced forum for the publication of studies reporting high-dimensional approaches and developments in Life Sciences, Chemistry and related fields. Our aim is to encourage scientists to publish their experimental and theoretical results based on high-throughput techniques as well as computational and statistical tools for data analysis and interpretation. The full experimental or methodological details must be provided so that the results can be reproduced. There is no restriction on the length of the papers. High-Throughput invites submissions covering several topics, including, but not limited to: Microarrays, DNA Sequencing, RNA Sequencing, Protein Identification and Quantification, Cell-based Approaches, Omics Technologies, Imaging, Bioinformatics, Computational Biology/Chemistry, Statistics, Integrative Omics, Drug Discovery and Development, Microfluidics, Lab-on-a-chip, Data Mining, Databases, Multiplex Assays.