{"title":"Monte Carlo approximation of the logarithm of the determinant of large matrices with applications for linear mixed models in quantitative genetics","authors":"Matias Bermann, Alejandra Alvarez-Munera, Andres Legarra, Ignacio Aguilar, Ignacy Misztal, Daniela Lourenco","doi":"10.1186/s12711-025-00991-1","DOIUrl":null,"url":null,"abstract":"Likelihood-based inferences such as variance components estimation and hypothesis testing need logarithms of the determinant (log-determinant) of high dimensional matrices. Calculating the log-determinant is memory and time-consuming, making it impossible to perform likelihood-based inferences for large datasets. We presented a method for approximating the log-determinant of positive semi-definite matrices based on repeated matrix–vector products and complex calculus. We tested the approximation of the log-determinant in beef and dairy cattle, chicken, and pig datasets including single and multiple-trait models. Average absolute relative differences between the approximated and exact log-determinant were around 10–3. The approximation was between 2 and 500 times faster than the exact calculation for medium and large matrices. We compared the restricted likelihood with (approximated) and without (exact) the approximation of the log-determinant for different values of heritability for a single-trait model. We also compared estimated variance components using exact expectation–maximization (EM) and average information (AI) REML algorithms, against two derivative-free approaches using the restricted likelihood calculated with the log-determinant approximation. The approximated and exact restricted likelihood showed maxima at the same heritability value. Derivative-free estimation of variance components with the approximated log-determinant converged to the same values as EM and AI-REML. The proposed approach is feasible to apply to any data size. The method presented in this study allows to approximate the log-determinant of positive semi-definite matrices and, therefore, the likelihood for datasets of any size. This opens the possibility of performing likelihood-based inferences for large datasets in animal and plant breeding.","PeriodicalId":55120,"journal":{"name":"Genetics Selection Evolution","volume":"64 1","pages":""},"PeriodicalIF":3.1000,"publicationDate":"2025-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Genetics Selection Evolution","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1186/s12711-025-00991-1","RegionNum":1,"RegionCategory":"农林科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AGRICULTURE, DAIRY & ANIMAL SCIENCE","Score":null,"Total":0}
引用次数: 0
Abstract
Likelihood-based inferences such as variance components estimation and hypothesis testing need logarithms of the determinant (log-determinant) of high dimensional matrices. Calculating the log-determinant is memory and time-consuming, making it impossible to perform likelihood-based inferences for large datasets. We presented a method for approximating the log-determinant of positive semi-definite matrices based on repeated matrix–vector products and complex calculus. We tested the approximation of the log-determinant in beef and dairy cattle, chicken, and pig datasets including single and multiple-trait models. Average absolute relative differences between the approximated and exact log-determinant were around 10–3. The approximation was between 2 and 500 times faster than the exact calculation for medium and large matrices. We compared the restricted likelihood with (approximated) and without (exact) the approximation of the log-determinant for different values of heritability for a single-trait model. We also compared estimated variance components using exact expectation–maximization (EM) and average information (AI) REML algorithms, against two derivative-free approaches using the restricted likelihood calculated with the log-determinant approximation. The approximated and exact restricted likelihood showed maxima at the same heritability value. Derivative-free estimation of variance components with the approximated log-determinant converged to the same values as EM and AI-REML. The proposed approach is feasible to apply to any data size. The method presented in this study allows to approximate the log-determinant of positive semi-definite matrices and, therefore, the likelihood for datasets of any size. This opens the possibility of performing likelihood-based inferences for large datasets in animal and plant breeding.
期刊介绍:
Genetics Selection Evolution invites basic, applied and methodological content that will aid the current understanding and the utilization of genetic variability in domestic animal species. Although the focus is on domestic animal species, research on other species is invited if it contributes to the understanding of the use of genetic variability in domestic animals. Genetics Selection Evolution publishes results from all levels of study, from the gene to the quantitative trait, from the individual to the population, the breed or the species. Contributions concerning both the biological approach, from molecular genetics to quantitative genetics, as well as the mathematical approach, from population genetics to statistics, are welcome. Specific areas of interest include but are not limited to: gene and QTL identification, mapping and characterization, analysis of new phenotypes, high-throughput SNP data analysis, functional genomics, cytogenetics, genetic diversity of populations and breeds, genetic evaluation, applied and experimental selection, genomic selection, selection efficiency, and statistical methodology for the genetic analysis of phenotypes with quantitative and mixed inheritance.