FlexLMM: a Nextflow linear mixed model framework for GWAS
Saul Pierotti, Tomas Fitzgerald, Ewan Birney
Bioinformatics (Oxford, England), published 2024-12-26
DOI: 10.1093/bioinformatics/btaf021 (https://doi.org/10.1093/bioinformatics/btaf021)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11783306/pdf/
Abstract
Summary: Linear mixed models (LMMs) are commonly used in genome-wide association studies when population structure is present. However, naively permuting the phenotype to empirically estimate the null distribution of a test statistic is not appropriate in the presence of population structure or covariates. The samples are not exchangeable with each other under the null hypothesis, and permuting the phenotypes breaks their relationship with any covariates. For this reason, we developed FlexLMM, a Nextflow pipeline that performs appropriate permutations in LMMs while allowing flexibility in the definition of the exact statistical model. FlexLMM sets a significance threshold via permutations using a two-step process: the population structure is first regressed out, and only then are the permutations performed on the uncorrelated residuals. We envision that this pipeline will be particularly useful for researchers working on multi-parental crosses among inbred lines of model organisms or farm animals and plants.
Availability and implementation: The source code and documentation for FlexLMM are available at https://github.com/birneylab/flexlmm.
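Illustrative sketch: the two-step process described in the Summary can be approximated in a few lines of NumPy. This is not FlexLMM's code. It assumes the variance components (`sigma_g2`, `sigma_e2`) have already been estimated, uses a Cholesky factor of the phenotype covariance to decorrelate the samples, and uses a squared cosine similarity between residuals and genotypes as a stand-in test statistic. All function and parameter names are hypothetical.

```python
import numpy as np


def whiten(K, sigma_g2, sigma_e2, *arrays):
    """Decorrelate arrays using the Cholesky factor of V = sigma_g2*K + sigma_e2*I."""
    n = K.shape[0]
    V = sigma_g2 * K + sigma_e2 * np.eye(n)
    L = np.linalg.cholesky(V)  # lower-triangular, V = L @ L.T
    # Multiplying by L^{-1} gives the transformed phenotype identity covariance.
    return [np.linalg.solve(L, a) for a in arrays]


def residualize(y, X):
    """Residuals of y (vector or matrix of columns) after least squares on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta


def permutation_threshold(y, X, G, K, sigma_g2, sigma_e2,
                          n_perm=200, alpha=0.05, seed=0):
    """Genome-wide threshold on the statistic from permuted, decorrelated residuals.

    y: (n,) phenotype; X: (n, p) covariates incl. intercept;
    G: (n, m) genotypes; K: (n, n) relatedness matrix.
    """
    rng = np.random.default_rng(seed)
    # Step 1: remove population structure, then covariate effects.
    y_w, X_w, G_w = whiten(K, sigma_g2, sigma_e2, y, X, G)
    r = residualize(y_w, X_w)
    G_r = residualize(G_w, X_w)
    G_r = G_r / np.linalg.norm(G_r, axis=0)  # unit-length genotype columns
    # Step 2: permute the (approximately) uncorrelated residuals.
    max_stats = np.empty(n_perm)
    for i in range(n_perm):
        r_perm = rng.permutation(r)
        cos = G_r.T @ r_perm / np.linalg.norm(r_perm)
        max_stats[i] = np.max(cos ** 2)  # strongest signal across variants
    # The (1 - alpha) quantile of the maxima controls the family-wise error.
    return np.quantile(max_stats, 1 - alpha)
```

A variant would then be called significant if its statistic on the unpermuted residuals exceeds the returned threshold. The residuals are only approximately exchangeable after the covariates are projected out, so this should be read as a conceptual outline under the stated assumptions.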