Identification of the associations between genes and quantitative traits using entropy-based kernel density estimation.

Q2 Agricultural and Biological Sciences
Genomics and Informatics Pub Date : 2022-06-01 Epub Date: 2022-06-30 DOI:10.5808/gi.22033
Jaeyong Yee, Taesung Park, Mira Park
Genomics and Informatics, volume 20, issue 2, article e17. Publication type: Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9299569/pdf/

Abstract

Genetic associations have been quantified using a number of statistical measures. Entropy-based mutual information may be one of the more direct ways of estimating the association, in the sense that it does not depend on the parametrization. For this purpose, both the entropy and conditional entropy of the phenotype distribution should be obtained. Quantitative traits, however, do not usually allow an exact evaluation of entropy. The estimation of entropy needs a probability density function, which can be approximated by kernel density estimation. We have investigated the proper sequence of procedures for combining the kernel density estimation and entropy estimation with a probability density function in order to calculate mutual information. Genotypes and their interactions were constructed to set the conditions for conditional entropy. Extensive simulation data created using three types of generating functions were analyzed using two different kernels as well as two types of multifactor dimensionality reduction and another probability density approximation method called m-spacing. The statistical power in terms of correct detection rates was compared. Using kernels was found to be most useful when the trait distributions were more complex than simple normal or gamma distributions. A full-scale genomic dataset was explored to identify associations using the 2-h oral glucose tolerance test results and γ-glutamyl transpeptidase levels as phenotypes. Clearly distinguishable single-nucleotide polymorphisms (SNPs) and interacting SNP pairs associated with these phenotypes were found and listed with empirical p-values.
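The estimation chain described in the abstract can be sketched in a few lines: estimate the trait density with a kernel, turn the density into an entropy estimate, take the difference between the entropy and the genotype-conditional entropy as the mutual information, and obtain an empirical p-value by permutation. The abstract does not state which kernels, bandwidth rule, entropy estimator, or permutation scheme the authors used, so the Gaussian kernel, the rule-of-thumb bandwidth, the resubstitution estimator, and every function name below are assumptions made for illustration only, not the authors' procedure.

```python
import math
import random
import statistics


def kde_entropy(values):
    """Resubstitution entropy estimate, -mean(log f_hat(y_i)), in nats.

    f_hat is a Gaussian kernel density estimate. The bandwidth uses a
    rule of thumb (1.06 * sd * n^(-1/5)); this choice is an assumption.
    Requires at least two values that are not all identical.
    """
    n = len(values)
    sd = statistics.stdev(values)
    h = 1.06 * sd * n ** (-0.2)
    norm = n * h * math.sqrt(2.0 * math.pi)
    total = 0.0
    for y in values:
        # Density at y, summing the kernel contribution of every sample.
        density = sum(math.exp(-0.5 * ((y - x) / h) ** 2) for x in values) / norm
        total += math.log(density)
    return -total / n


def mutual_information(genotypes, trait):
    """I(G; Y) = H(Y) - sum_g p(g) * H(Y | G = g).

    Genotype labels (or label tuples for an interacting SNP pair) define
    the groups over which the conditional entropy is averaged.
    """
    groups = {}
    for g, y in zip(genotypes, trait):
        groups.setdefault(g, []).append(y)
    n = len(trait)
    conditional = sum(len(ys) / n * kde_entropy(ys) for ys in groups.values())
    return kde_entropy(trait) - conditional


def empirical_p_value(genotypes, trait, n_perm=200, seed=0):
    """Permutation p-value: shuffle genotype labels, recompute the statistic."""
    rng = random.Random(seed)
    observed = mutual_information(genotypes, trait)
    shuffled = list(genotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mutual_information(shuffled, trait) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)
```

For an interacting SNP pair, the same functions could be called with tuples such as `(g1, g2)` as the genotype labels, provided every combination has enough samples for a density estimate. The double loop in `kde_entropy` is quadratic in the sample size, so a genome-wide scan would need a faster density evaluation than this sketch provides.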

Source journal: Genomics and Informatics (Agricultural and Biological Sciences: Ecology, Evolution, Behavior and Systematics)
CiteScore: 1.90
Self-citation rate: 0.00%
Articles published: 0
Review time: 12 weeks