Statistical Applications in Genetics and Molecular Biology: Latest Articles

When is the allele-sharing dissimilarity between two populations exceeded by the allele-sharing dissimilarity of a population with itself?

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology Pub Date : 2023-12-11 DOI: 10.1515/sagmb-2023-0004

Xiran Liu, Zarif Ahsan, Tarun K. Martheswaran, Noah A. Rosenberg

Abstract: Allele-sharing statistics for a genetic locus measure the dissimilarity between two populations as a mean of the dissimilarity between random pairs of individuals, one from each population. Owing to within-population variation in genotype, allele-sharing dissimilarities can have the property that they have a nonzero value when computed between a population and itself. We consider the mathematical properties of allele-sharing dissimilarities in a pair of populations, treating the allele frequencies in the two populations parametrically. Examining two formulations of allele-sharing dissimilarity, we obtain the distributions of within-population and between-population dissimilarities for pairs of individuals. We then mathematically explore the scenarios in which, for certain allele-frequency distributions, the within-population dissimilarity (the mean dissimilarity between randomly chosen members of a population) can exceed the dissimilarity between two populations. Such scenarios assist in explaining observations in population-genetic data that members of a population can be empirically more genetically dissimilar from each other on average than they are from members of another population. For a population pair, however, the mathematical analysis finds that at least one of the two populations always possesses smaller within-population dissimilarity than the value of the between-population dissimilarity. We illustrate the mathematical results with an application to human population-genetic data.

Journal Article. Volume 18, issue 1.
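The within- versus between-population comparison described in this abstract can be illustrated with a small sketch. It assumes a deliberately simplified dissimilarity, not necessarily either of the two formulations the paper examines: at a biallelic locus, draw one allele at random from each of two individuals and score 1 if the alleles differ. The function names and the allele frequencies `p` and `q` are hypothetical choices for illustration.

```python
def within_dissimilarity(p: float) -> float:
    """Mean dissimilarity of a population with itself (simplified model).

    p is the frequency of one allele at a biallelic locus; two alleles drawn
    at random from the same population differ with probability 2p(1 - p).
    """
    return 2.0 * p * (1.0 - p)


def between_dissimilarity(p: float, q: float) -> float:
    """Mean dissimilarity between two populations with allele frequencies p and q."""
    return p * (1.0 - q) + q * (1.0 - p)


def smaller_within_never_exceeds_between(steps: int = 100, tol: float = 1e-12) -> bool:
    """Check on a frequency grid that min(within_1, within_2) <= between."""
    for i in range(steps + 1):
        for j in range(steps + 1):
            p, q = i / steps, j / steps
            smaller_within = min(within_dissimilarity(p), within_dissimilarity(q))
            if smaller_within > between_dissimilarity(p, q) + tol:
                return False
    return True


# Hypothetical frequencies: population 1 is more variable than population 2.
p, q = 0.6, 0.9
d11 = within_dissimilarity(p)       # population 1 with itself
d22 = within_dissimilarity(q)       # population 2 with itself
d12 = between_dissimilarity(p, q)   # population 1 versus population 2
print(d11, d12, d22)
```

Under this simplified model, population 1 is more dissimilar from itself than from population 2 at these frequencies, while population 2 still has a within-population value below the between-population value, which matches the qualitative statement in the abstract.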

Citations: 0

Sparse latent factor regression models for genome-wide and epigenome-wide association studies

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology Pub Date : 2022-01-01 DOI: 10.1515/sagmb-2021-0035

Basile Jumentier, Kevin Caye, Barbara Heude, Johanna Lepeule, Olivier François

Abstract: Association of phenotypes or exposures with genomic and epigenomic data faces important statistical challenges. One of these challenges is to account for variation due to unobserved confounding factors, such as individual ancestry or cell-type composition in tissues. This issue can be addressed with penalized latent factor regression models, where penalties are introduced to cope with high dimension in the data. If a relatively small proportion of genomic or epigenomic markers correlate with the variable of interest, sparsity penalties may help to capture the relevant associations, but the improvement over non-sparse approaches has not been fully evaluated yet. Here, we present least-squares algorithms that jointly estimate effect sizes and confounding factors in sparse latent factor regression models. In simulated data, sparse latent factor regression models generally achieved higher statistical performance than other sparse methods, including the least absolute shrinkage and selection operator and a Bayesian sparse linear mixed model. In generative model simulations, statistical performance was slightly lower than (while being comparable to) that of non-sparse methods, but in simulations based on empirical data, sparse latent factor regression models were more robust to departure from the model than the non-sparse approaches. We applied sparse latent factor regression models to a genome-wide association study of a flowering trait for the plant Arabidopsis thaliana and to an epigenome-wide association study of smoking status in pregnant women. For both applications, sparse latent factor regression models facilitated the estimation of non-null effect sizes while overcoming multiple testing issues. The results were not only consistent with previous discoveries, but they also pinpointed new genes with functional annotations relevant to each application.

Journal Article. Volume 4, issue 1.

Citations: 0

Low variability in the underlying cellular landscape adversely affects the performance of interaction-based approaches for conducting cell-specific analyses of DNA methylation in bulk samples.

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology Pub Date : 2021-08-10 DOI: 10.1515/sagmb-2021-0004

Richard Meier, Emily Nissen, Devin C Koestler

Abstract: Statistical methods that allow for cell type specific DNA methylation (DNAm) analyses based on bulk-tissue methylation data have great potential to improve our understanding of human disease and have created unprecedented opportunities for new insights using the wealth of publicly available bulk-tissue methylation data. These methodologies involve incorporating interaction terms formed between the phenotypes/exposures of interest and proportions of the cell types underlying the bulk-tissue sample used for DNAm profiling. Despite growing interest in such "interaction-based" methods, there has been no comprehensive assessment of how variability in the cellular landscape across study samples affects their performance. To answer this question, we used numerous publicly available whole-blood DNAm data sets along with extensive simulation studies and evaluated the performance of interaction-based approaches in detecting cell-specific methylation effects. Our results show that low cell proportion variability results in large estimation error and low statistical power for detecting cell-specific effects of DNAm. Further, we identified that many studies targeting methylation profiling in whole blood may be at risk of being underpowered due to low variability in the cellular landscape across study samples. Finally, we discuss guidelines for researchers seeking to conduct studies utilizing interaction-based approaches to help ensure that their studies are adequately powered.

Journal Article. Volume 20, issue 3, pp. 73-84. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9125800/pdf/sagmb-20-3-sagmb-2021-0004.pdf

Citations: 2

AdaReg: data adaptive robust estimation in linear regression with application in GTEx gene expressions.

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology Pub Date : 2021-07-13 DOI: 10.1515/sagmb-2020-0042

Meng Wang, Lihua Jiang, Michael P Snyder

Abstract: The Genotype-Tissue Expression (GTEx) project provides a valuable resource of large-scale gene expressions across multiple tissue types. Under various technical noise and unknown or unmeasured factors, how to robustly estimate the major tissue effect becomes challenging. Moreover, different genes exhibit heterogeneous expressions across different tissue types. Therefore, we need a robust method which adapts to the heterogeneities of gene expressions to improve the estimation for the tissue effect. We followed the approach of the robust estimation based on γ-density-power-weight in the works of Fujisawa, H. and Eguchi, S. (2008). Robust parameter estimation with a small bias against heavy contamination. J. Multivariate Anal. 99: 2053-2081 and Windham, M.P. (1995). Robustifying model fitting. J. Roy. Stat. Soc. B: 599-609, where γ is the exponent of density weight which controls the balance between bias and variance. As far as we know, our work is the first to propose a procedure to tune the parameter γ to balance the bias-variance trade-off under the mixture models. We constructed a robust likelihood criterion based on weighted densities in the mixture model of Gaussian population distribution mixed with unknown outlier distribution, and developed a data-adaptive γ-selection procedure embedded into the robust estimation. We provided a heuristic analysis on the selection criterion and found that our practical selection trend under various γ's in average performance has similar capability to capture minimizer γ as the inestimable mean squared error (MSE) trend from our simulation studies under a series of settings. Our data-adaptive robustifying procedure in the linear regression problem (AdaReg) showed a significant advantage in both simulation studies and real data application in estimating tissue effect of heart samples from the GTEx project, compared to the fixed γ procedure and other robust methods. At the end, the paper discussed some limitations on this method and future work.

Journal Article. Volume 20, issue 2, pp. 51-71.

Citations: 0

Collocation based training of neural ordinary differential equations.

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology Pub Date : 2021-07-09 DOI: 10.1515/sagmb-2020-0025

Elisabeth Roesch, Christopher Rackauckas, Michael P H Stumpf

Abstract: The predictive power of machine learning models often exceeds that of mechanistic modeling approaches. However, the interpretability of purely data-driven models, without any mechanistic basis is often complicated, and predictive power by itself can be a poor metric by which we might want to judge different methods. In this work, we focus on the relatively new modeling techniques of neural ordinary differential equations. We discuss how they relate to machine learning and mechanistic models, with the potential to narrow the gulf between these two frameworks: they constitute a class of hybrid model that integrates ideas from data-driven and dynamical systems approaches. Training neural ODEs as representations of dynamical systems data has its own specific demands, and we here propose a collocation scheme as a fast and efficient training strategy. This alleviates the need for costly ODE solvers. We illustrate the advantages that collocation approaches offer, as well as their robustness to qualitative features of a dynamical system, and the quantity and quality of observational data. We focus on systems that exemplify some of the hallmarks of complex dynamical systems encountered in systems biology, and we map out how these methods can be used in the analysis of mathematical models of cellular and physiological processes.

Journal Article. Volume 20, issue 2, pp. 37-49.
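The idea behind collocation training, fitting the right-hand side of an ODE to derivative estimates taken directly from the data so that no ODE solver is needed, can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' scheme: a one-parameter linear model du/dt = -k·u stands in for the neural network, the data are noise-free, and central finite differences stand in for the smoothing step one would normally use on noisy observations. The function name and the value of `true_k` are hypothetical.

```python
import math


def collocation_fit_decay_rate(times, values):
    """Estimate k in du/dt = -k * u without solving the ODE.

    Step 1: estimate du/dt at interior time points by central differences.
    Step 2: choose k by least squares so that -k * u matches those estimates.
    """
    derivatives = [
        (values[i + 1] - values[i - 1]) / (times[i + 1] - times[i - 1])
        for i in range(1, len(times) - 1)
    ]
    states = values[1:-1]
    numerator = sum(d * u for d, u in zip(derivatives, states))
    denominator = sum(u * u for u in states)
    return -numerator / denominator


# Synthetic observations of exponential decay with a hypothetical true rate.
true_k = 1.5
times = [0.01 * i for i in range(201)]
values = [math.exp(-true_k * t) for t in times]

k_hat = collocation_fit_decay_rate(times, values)
print(f"estimated k = {k_hat:.4f}")
```

With a neural network in place of the linear model, step 2 would become a gradient-based regression of the network output onto the derivative estimates; the structure of the two steps would stay the same.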

Citations: 17

An Empirical Bayes approach for the identification of long-range chromosomal interaction from Hi-C data.

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology Pub Date : 2021-01-25 DOI: 10.1515/sagmb-2020-0026

Qi Zhang, Zheng Xu, Yutong Lai

Abstract: Hi-C experiments have become very popular for studying the 3D genome structure in recent years. Identification of long-range chromosomal interaction, i.e., peak detection, is crucial for Hi-C data analysis. But it remains a challenging task due to the inherent high dimensionality, sparsity and the over-dispersion of the Hi-C count data matrix. We propose EBHiC, an empirical Bayes approach for peak detection from Hi-C data. The proposed framework provides flexible over-dispersion modeling by explicitly including the "true" interaction intensities as latent variables. To implement the proposed peak identification method (via the empirical Bayes test), we estimate the overall distributions of the observed counts semiparametrically using a Smoothed Expectation Maximization algorithm, and the empirical null based on the zero assumption. We conducted extensive simulations to validate and evaluate the performance of our proposed approach and applied it to real datasets. Our results suggest that EBHiC can identify better peaks in terms of accuracy, biological interpretability, and the consistency across biological replicates. The source code is available on Github (https://github.com/QiZhangStat/EBHiC).

Journal Article. Volume 20, issue 1, pp. 1-15.

Citations: 0

Combining dependent p-values by gamma distributions.

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology Pub Date : 2020-11-06 DOI: 10.1515/sagmb-2019-0057

Li-Chu Chien

Abstract: Combining correlated p-values from multiple hypothesis testing is one of the most frequently used methods for integrating information in genetic and genomic data analysis. However, most existing methods for combining independent p-values from individual component problems into a single unified p-value are unsuitable for the correlational structure among p-values from multiple hypothesis testing. Although some existing p-value combination methods have been modified to overcome the potential limitations, there is no uniformly most powerful method for combining correlated p-values in genetic data analysis. Therefore, a p-value combination method that can robustly control type I errors while keeping good power is necessary. In this paper, we propose an empirical method based on the gamma distribution (EMGD) for combining dependent p-values from multiple hypothesis testing. The proposed test, EMGD, allows for flexibly accommodating the highly correlated p-values from the multiple hypothesis testing into a unified p-value for examining the combined hypothesis that we are interested in. The EMGD retains the robustness of the empirical Brown's method (EBM) for pooling the dependent p-values from multiple hypothesis testing. Moreover, the EMGD keeps the character of the method based on the gamma distribution that simultaneously retains the advantages of the z-transform test and the gamma-transform test for combining dependent p-values from multiple statistical tests. Together, these two properties allow the EMGD to keep robust power when combining dependent p-values from multiple hypothesis testing. The performance of the proposed method EMGD is illustrated with simulations and real data applications by comparing with the existing methods, such as Kost and McDermott's method, the EBM and the harmonic mean p-value method.

Journal Article.
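For context on what "combining p-values" means, the sketch below implements Fisher's classical combination method. It is not the EMGD proposed in this abstract: Fisher's method assumes the p-values are independent, which is exactly the assumption the abstract describes as unsuitable for correlated tests. The function name is a hypothetical choice. The statistic X = -2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom under the null, and for even degrees of freedom its upper tail has the closed form exp(-X/2) Σ_{i<k} (X/2)^i / i!, so only the standard library is needed.

```python
import math


def fisher_combined_p_value(p_values):
    """Combine independent p-values with Fisher's method.

    Returns the upper-tail probability of a chi-square distribution with
    2k degrees of freedom evaluated at X = -2 * sum(log(p_i)).
    """
    k = len(p_values)
    half_statistic = -sum(math.log(p) for p in p_values)  # equals X / 2
    term = 1.0   # current term (X/2)^i / i!, starting at i = 0
    total = 1.0  # running sum of the terms for i = 0 .. k-1
    for i in range(1, k):
        term *= half_statistic / i
        total += term
    return math.exp(-half_statistic) * total


# Two hypothetical independent tests, each borderline on its own.
combined = fisher_combined_p_value([0.05, 0.05])
print(f"combined p-value = {combined:.6f}")
```

When the input p-values are positively correlated, this calculation tends to overstate the evidence, which is the motivation for dependence-aware methods such as those compared in the abstract.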

Citations: 0

Bayesian reconstruction of transmission trees from genetic sequences and uncertain infection times.

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology Pub Date : 2020-10-21 DOI: 10.1515/sagmb-2019-0026

Hesam Montazeri, Susan Little, Mozhgan Mozaffarilegha, Niko Beerenwinkel, Victor DeGruttola

Abstract: Genetic sequence data of pathogens are increasingly used to investigate transmission dynamics in both endemic diseases and disease outbreaks. Such research can aid in the development of appropriate interventions and in the design of studies to evaluate them. Several computational methods have been proposed to infer transmission chains from sequence data; however, existing methods do not generally reliably reconstruct transmission trees because genetic sequence data or inferred phylogenetic trees from such data contain insufficient information for accurate estimation of transmission chains. Here, we show by simulation studies that incorporating infection times, even when they are uncertain, can greatly improve the accuracy of reconstruction of transmission trees. To achieve this improvement, we propose a Bayesian inference method using Markov chain Monte Carlo that directly draws samples from the space of transmission trees under the assumption of complete sampling of the outbreak. The likelihood of each transmission tree is computed by a phylogenetic model by treating its internal nodes as transmission events. By a simulation study, we demonstrate that accuracy of the reconstructed transmission trees depends mainly on the amount of information available on times of infection; we show superiority of the proposed method to two alternative approaches when infection times are known up to specified degrees of certainty. In addition, we illustrate the use of a multiple imputation framework to study features of epidemic dynamics, such as the relationship between characteristics of nodes and average number of outbound edges or inbound edges, signifying possible transmission events from and to nodes. We apply the proposed method to a transmission cluster in San Diego and to a dataset from the 2014 Sierra Leone Ebola virus outbreak and investigate the impact of biological, behavioral, and demographic factors.

Journal Article. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8212962/pdf/nihms-1709644.pdf

Citations: 0

Spectral dynamic causal modelling of resting-state fMRI: an exploratory study relating effective brain connectivity in the default mode network to genetics.

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology Pub Date : 2020-08-31 DOI: 10.1515/sagmb-2019-0058

Yunlong Nie, Eugene Opoku, Laila Yasmin, Yin Song, Jie Wang, Sidi Wu, Vanessa Scarapicchia, Jodie Gawryluk, Liangliang Wang, Jiguo Cao, Farouk S Nathoo

Abstract: We conduct an imaging genetics study to explore how effective brain connectivity in the default mode network (DMN) may be related to genetics within the context of Alzheimer's disease and mild cognitive impairment. We develop an analysis of longitudinal resting-state functional magnetic resonance imaging (rs-fMRI) and genetic data obtained from a sample of 111 subjects with a total of 319 rs-fMRI scans from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. A Dynamic Causal Model (DCM) is fit to the rs-fMRI scans to estimate effective brain connectivity within the DMN and related to a set of single nucleotide polymorphisms (SNPs) contained in an empirical disease-constrained set which is obtained out-of-sample from 663 ADNI subjects having only genome-wide data. We relate longitudinal effective brain connectivity estimated using spectral DCM to SNPs using both linear mixed effect (LME) models as well as function-on-scalar regression (FSR). In both cases we implement a parametric bootstrap for testing SNP coefficients and make comparisons with p-values obtained from asymptotic null distributions. In both networks at an initial q-value threshold of 0.1 no effects are found. We report on exploratory patterns of associations with relatively high ranks that exhibit stability to the differing assumptions made by both FSR and LME.

Journal Article. Volume 19, issue 3.

Citations: 3

Assessing genome-wide significance for the detection of differentially methylated regions.

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology Pub Date : 2018-09-19 DOI: 10.1515/sagmb-2017-0050

Christian M Page, Linda Vos, Trine B Rounge, Hanne F Harbo, Bettina K Andreassen

Abstract: DNA methylation plays an important role in human health and disease, and methods for the identification of differently methylated regions are of increasing interest. There is currently a lack of statistical methods which properly address multiple testing, i.e. control genome-wide significance for differentially methylated regions. We introduce a scan statistic (DMRScan), which overcomes these limitations. We benchmark DMRScan against two well established methods (bumphunter, DMRcate), using a simulation study based on real methylation data. An implementation of DMRScan is available from Bioconductor. Our method has higher power than alternative methods across different simulation scenarios, particularly for small effect sizes. DMRScan exhibits greater flexibility in statistical modeling and can be used with more complex designs than current methods. DMRScan is the first dynamic approach which properly addresses the multiple-testing challenges for the identification of differently methylated regions. DMRScan outperformed alternative methods in terms of power, while keeping the false discovery rate controlled.

Journal Article. Volume 17, issue 5.

Citations: 4