Identification of significant SNPs and the quantification of correlation using genomic informational field theory (GIFT)
Authors: Scott Gadsby, Cyril Rauch, Jonathan A D Wattis
Journal: Mathematical Biosciences, volume 393, Article 109606
DOI: 10.1016/j.mbs.2025.109606
Publication date: 2026-03-01 (Epub 2026-01-10)
Impact factor: 1.8 (JCR Q2, Biology)
URL: https://www.sciencedirect.com/science/article/pii/S0025556425002329
Citation count: 0
Abstract
Given data on genotypes and phenotypes from a sample population, we show how ordering the data by phenotype and analysing the information contained in the corresponding list of genotypes can identify those SNPs which have a significant correlation with phenotype. We derive formulae for p-values to quantify the significance of each SNP, and show how to analyse the correlations between different SNPs. As well as using classical covariance and correlations, we introduce an information-theoretic measure of correlation which is based on Shannon’s informational entropy. This variational formulation also gives rise to other ways of determining the strength of a SNP’s influence on phenotype in a biallelic population using ‘field’ functions which account for the relationship between phenotype and genotype. By computing this field for each SNP, we are able to quantify the correlations between SNPs. The results are shown to depend on the number of each genostate (aa, Aa and AA) in the population in a predictable manner. The methods are illustrated using data on horse height.
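The abstract does not state the paper's formulae, so the following is only a minimal sketch of the general idea under stated assumptions. Genotypes are assumed to be coded 0, 1, 2 for aa, Aa, AA. The function names (`genotype_path`, `permutation_p_value`, `mutual_information`) are hypothetical. A permutation test stands in for the analytic p-values the authors derive, and plain Shannon mutual information stands in for their entropy-based correlation measure, which may be defined differently.

```python
import numpy as np


def genotype_path(phenotype, genotype):
    """Order genotypes by phenotype and return the cumulative deviation
    from the path expected if genotype were unrelated to phenotype."""
    order = np.argsort(phenotype, kind="stable")
    g = np.asarray(genotype, dtype=float)[order]
    # With no association, each position contributes the mean genotype
    # on average, so the cumulative deviation stays close to zero.
    return np.cumsum(g - g.mean())


def permutation_p_value(phenotype, genotype, n_perm=2000, seed=0):
    """Permutation p-value for the largest excursion of the path.
    Assumption: this replaces the paper's analytic p-value formulae."""
    rng = np.random.default_rng(seed)
    genotype = np.asarray(genotype)
    observed = np.max(np.abs(genotype_path(phenotype, genotype)))
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(genotype)
        if np.max(np.abs(genotype_path(phenotype, shuffled))) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


def mutual_information(g1, g2):
    """Shannon mutual information (in bits) between two SNPs coded 0, 1, 2.
    Assumption: a generic entropy-based measure, not the paper's own."""
    g1 = np.asarray(g1, dtype=int)
    g2 = np.asarray(g2, dtype=int)
    joint = np.zeros((3, 3))
    np.add.at(joint, (g1, g2), 1)  # count each (g1, g2) genostate pair
    joint /= joint.sum()
    p1 = joint.sum(axis=1, keepdims=True)
    p2 = joint.sum(axis=0, keepdims=True)
    mask = joint > 0  # skip empty cells to avoid log(0)
    return float(np.sum(joint[mask] * np.log2(joint[mask] / (p1 @ p2)[mask])))
```

In use, one would call `permutation_p_value(height, snp)` once per SNP and `mutual_information(snp_a, snp_b)` for each pair of SNPs of interest. The maximum excursion is only one possible summary of the ordered genotype list; other statistics could be substituted without changing the structure of the sketch.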
About the journal:
Mathematical Biosciences publishes work providing new concepts or new understanding of biological systems using mathematical models, or methodological articles likely to find application to multiple biological systems. Papers are expected to present a major research finding of broad significance for the biological sciences, or mathematical biology. Mathematical Biosciences welcomes original research articles, letters, reviews and perspectives.