N Long, D Gianola, G J M Rosa, K A Weigel, S Avendano
Machine learning classification procedure for selecting SNPs in genomic selection: application to early mortality in broilers
Developments in biologicals, vol. 132, pp. 373-376. Published 2008-01-01. DOI: 10.1159/000317279 (https://doi.org/10.1159/000317279)
Citations: 2
Abstract
In genome-wide association studies using single nucleotide polymorphisms (SNPs), typically thousands of SNPs are genotyped, whereas the number of phenotypes for which there is genomic information may be smaller. A two-step SNP (feature) selection method was developed, consisting of filtering (using information gain) and wrapping (using naïve Bayesian classification). The method was based on discretization of the continuous phenotypic values. It was applied to chick early mortality rates (0-14 days of age) on progeny from 201 sires in a commercial broiler line, with the goal of identifying SNPs (over 5000) related to progeny mortality. Sires were clustered into two groups, low and high, according to two arbitrarily chosen mortality rate thresholds. By varying these thresholds, 11 different "case-control" samples were formed, and the SNP selection procedure was applied to each sample. To compare the 11 sets of chosen SNPs, the predicted residual sum of squares (PRESS) from a linear model was used. Naïve Bayesian classification accuracy was improved over the case without feature selection (from 50% to 90%). Seventeen SNPs in the best case-control group (with smallest PRESS) accounted for 31% of the variance among sire family mortality rates.
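The filtering step and the PRESS comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`entropy`, `information_gain`, `filter_snps`, `press`), the 0/1/2 genotype coding, and the `top_k` cutoff are all assumptions made for illustration, and the naïve Bayesian wrapper step is omitted. PRESS is computed here with the standard leave-one-out identity for ordinary least squares, which may differ from the linear model used in the paper.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a discrete label vector."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(genotypes, labels):
    """Reduction in label entropy after splitting samples by genotype class."""
    gain = entropy(labels)
    for g in np.unique(genotypes):
        mask = genotypes == g
        # Weight each genotype class by the fraction of samples it contains.
        gain -= mask.mean() * entropy(labels[mask])
    return gain

def filter_snps(X, y, top_k):
    """Indices of the top_k SNP columns of X, ranked by information gain.

    X: (n_samples, n_snps) array of genotype codes (assumed 0/1/2).
    y: (n_samples,) array of discretized class labels (e.g. low/high).
    """
    gains = np.array([information_gain(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(gains)[::-1][:top_k]

def press(X, y):
    """Leave-one-out PRESS for an OLS fit of y on X with an intercept.

    Uses the identity e_loo_i = e_i / (1 - h_ii), where h_ii is the
    i-th diagonal element of the hat matrix.
    """
    Z = np.column_stack([np.ones(len(y)), X])
    H = Z @ np.linalg.pinv(Z.T @ Z) @ Z.T
    resid = y - H @ y
    return float(np.sum((resid / (1.0 - np.diag(H))) ** 2))
```

In a workflow of this kind, `filter_snps` would be applied to each case-control sample to shortlist SNPs, and `press` would then be evaluated on the continuous mortality rates using the shortlisted columns, with the smallest value indicating the preferred SNP set.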