Monte Carlo approximation of the logarithm of the determinant of large matrices with applications for linear mixed models in quantitative genetics

IF 3.1 | Zone 1 (Agricultural and Forestry Sciences) | Q1 AGRICULTURE, DAIRY & ANIMAL SCIENCE

Genetics Selection Evolution | Pub Date: 2025-08-06 | DOI: 10.1186/s12711-025-00991-1

Matias Bermann, Alejandra Alvarez-Munera, Andres Legarra, Ignacio Aguilar, Ignacy Misztal, Daniela Lourenco

Citations: 0

Abstract

Likelihood-based inferences such as variance component estimation and hypothesis testing require the logarithm of the determinant (log-determinant) of high-dimensional matrices. Calculating the log-determinant is memory- and time-consuming, making it impossible to perform likelihood-based inferences for large datasets. We present a method for approximating the log-determinant of positive semi-definite matrices based on repeated matrix–vector products and complex calculus. We tested the approximation of the log-determinant in beef and dairy cattle, chicken, and pig datasets, including single- and multiple-trait models. Average absolute relative differences between the approximated and exact log-determinants were around 10⁻³. The approximation was between 2 and 500 times faster than the exact calculation for medium and large matrices. We compared the restricted likelihood computed with the approximated log-determinant against the one computed with the exact log-determinant, for different values of heritability in a single-trait model. We also compared variance components estimated with the exact expectation–maximization (EM) and average information (AI) REML algorithms against two derivative-free approaches that use the restricted likelihood calculated with the log-determinant approximation. The approximated and exact restricted likelihoods showed maxima at the same heritability value. Derivative-free estimation of variance components with the approximated log-determinant converged to the same values as EM and AI-REML. The proposed approach can be applied to data of any size. The method presented in this study makes it possible to approximate the log-determinant of positive semi-definite matrices and, therefore, the likelihood for datasets of any size. This opens the possibility of performing likelihood-based inferences for large datasets in animal and plant breeding.
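
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of estimating a log-determinant from matrix-vector products alone. It relies on the identity log det(A) = tr(log A) and on a Monte Carlo (Hutchinson) trace estimate with random ±1 probe vectors. The quadratic forms z' log(A) z are evaluated here with Lanczos quadrature, which is a swapped-in, well-known technique and not the complex-calculus approach the authors describe. The function names, the number of probes, the number of Lanczos steps, and the test matrix are all assumptions made for illustration, and the sketch assumes a strictly positive definite matrix.

```python
import numpy as np


def lanczos_log_quadform(matvec, v, num_steps):
    """Approximate v' log(A) v using only products A @ x (A symmetric positive definite)."""
    n = v.shape[0]
    norm_v = np.linalg.norm(v)
    Q = np.zeros((n, num_steps))       # Lanczos basis vectors
    alpha = np.zeros(num_steps)        # diagonal of the tridiagonal matrix T
    beta = np.zeros(max(num_steps - 1, 0))  # off-diagonal of T
    q = v / norm_v
    k = 0
    for j in range(num_steps):
        Q[:, j] = q
        w = matvec(q)
        alpha[j] = q @ w
        # Full reorthogonalization against all basis vectors built so far
        w = w - Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)
        k = j + 1
        if j == num_steps - 1:
            break
        b = np.linalg.norm(w)
        if b < 1e-12:                  # Krylov space exhausted; T is already exact
            break
        beta[j] = b
        q = w / b
    T = np.diag(alpha[:k]) + np.diag(beta[: k - 1], 1) + np.diag(beta[: k - 1], -1)
    theta, U = np.linalg.eigh(T)       # Ritz values and vectors
    # Gauss quadrature rule: nodes are Ritz values, weights are squared first components
    return norm_v**2 * np.sum(U[0, :] ** 2 * np.log(theta))


def logdet_monte_carlo(matvec, n, num_probes=30, lanczos_steps=30, seed=0):
    """Hutchinson estimate of log det(A) = tr(log A) with Rademacher probe vectors."""
    rng = np.random.default_rng(seed)
    steps = min(lanczos_steps, n)
    estimates = []
    for _ in range(num_probes):
        z = rng.choice([-1.0, 1.0], size=n)
        estimates.append(lanczos_log_quadform(matvec, z, steps))
    return float(np.mean(estimates))


# Illustrative check on a small, well-conditioned matrix (hypothetical test data)
rng = np.random.default_rng(1)
n = 300
B = rng.standard_normal((n, n))
A = B @ B.T / n + np.eye(n)

exact = np.linalg.slogdet(A)[1]
approx = logdet_monte_carlo(lambda x: A @ x, n)
rel_diff = abs(approx - exact) / abs(exact)
print(f"exact={exact:.3f}  approx={approx:.3f}  relative difference={rel_diff:.2e}")
```

Because the estimator only calls `matvec`, the matrix never has to be formed or factorized explicitly. That is the property that lets approaches of this kind scale to matrices for which an exact factorization is too costly. Accuracy depends on the number of probes and on the conditioning of the matrix, so these settings would need tuning for real mixed-model coefficient matrices.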

Source journal

Genetics Selection Evolution (Biology: Dairy and Animal Science)

CiteScore: 6.50

Self-citation rate: 9.80%

Review time: 1 month

About the journal: Genetics Selection Evolution invites basic, applied and methodological content that will aid the current understanding and utilization of genetic variability in domestic animal species. Although the focus is on domestic animal species, research on other species is invited if it contributes to the understanding of the use of genetic variability in domestic animals. Genetics Selection Evolution publishes results from all levels of study, from the gene to the quantitative trait, and from the individual to the population, the breed or the species. Contributions concerning both the biological approach, from molecular genetics to quantitative genetics, and the mathematical approach, from population genetics to statistics, are welcome. Specific areas of interest include but are not limited to: gene and QTL identification, mapping and characterization, analysis of new phenotypes, high-throughput SNP data analysis, functional genomics, cytogenetics, genetic diversity of populations and breeds, genetic evaluation, applied and experimental selection, genomic selection, selection efficiency, and statistical methodology for the genetic analysis of phenotypes with quantitative and mixed inheritance.