Maximum Likelihood Quantization of Genomic Features Using Dynamic Programming
Mingzhou Song, R. Haralick, S. Boissinot
Sixth International Conference on Machine Learning and Applications (ICMLA 2007), published 2007-12-13
DOI: 10.1109/ICMLA.2007.36
Citations: 1
Abstract
Dynamic programming is introduced to quantize a continuous random variable into a discrete random variable. Quantization is often a useful step before statistical analysis or before reconstructing large network models among multiple random variables. Using dynamic programming, the quantization finds the optimal discrete representation of a random variable's original probability density function by maximizing the likelihood of the observed data. The algorithm is readily applicable to studying genomic features such as the recombination rate across chromosomes and the statistical properties of non-coding elements such as LINE1. In particular, the quantized recombination rate is studied for LINE1 elements, which are themselves grouped by length using quantization. This exact, density-preserving quantization approach is a superior alternative to the inexact, distance-based k-means clustering algorithm for discretizing a single variable.
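The abstract does not spell out the likelihood being maximized, so the following Python sketch assumes one purely for illustration: a histogram density over the sorted data, with bin edges midway between neighboring points and outer edges at the minimum and maximum. A bin holding m of the n points over width w then has density m/(n·w) and contributes m·log(m/(n·w)) to the log-likelihood. Because bins are contiguous runs of sorted points and each bin's score depends only on its own endpoints, the best partition into k bins can be found exactly by dynamic programming in O(k·n^2) time. The function name `ml_quantize` and this particular density model are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def ml_quantize(values: Sequence[float], k: int) -> Tuple[List[int], float]:
    """Quantize 1-D data into k contiguous bins by dynamic programming.

    Assumed model (illustrative, not taken from the paper): a histogram density
    whose bin edges sit midway between neighboring sorted values, with the outer
    edges at min(values) and max(values). A bin holding m of the n points over a
    width w has density m / (n * w), so it adds m * log(m / (n * w)) to the
    log-likelihood. Returns (bin label per input value, best log-likelihood).
    """
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= len(values)")
    order = sorted(range(n), key=lambda idx: values[idx])
    x = [float(values[idx]) for idx in order]

    def edge_before(i: int) -> float:
        # Lower edge of a bin whose first sorted point is x[i].
        return x[0] if i == 0 else (x[i - 1] + x[i]) / 2.0

    def edge_after(j: int) -> float:
        # Upper edge of a bin whose last sorted point is x[j].
        return x[-1] if j == n - 1 else (x[j] + x[j + 1]) / 2.0

    def bin_loglik(i: int, j: int) -> float:
        # Log-likelihood contribution of the bin holding x[i..j].
        width = edge_after(j) - edge_before(i)
        if width <= 0.0:
            return -math.inf  # zero-width bin: density undefined, disallow it
        m = j - i + 1
        return m * math.log(m / (n * width))

    neg = -math.inf
    # best[q][j]: max log-likelihood of x[0..j] split into q bins.
    # start[q][j]: sorted index of the first point in the q-th (last) bin.
    best = [[neg] * n for _ in range(k + 1)]
    start = [[-1] * n for _ in range(k + 1)]
    for j in range(n):
        best[1][j] = bin_loglik(0, j)
        start[1][j] = 0
    for q in range(2, k + 1):
        for j in range(q - 1, n):
            for i in range(q - 1, j + 1):  # last bin is x[i..j]
                prev = best[q - 1][i - 1]
                if prev == neg:
                    continue
                cand = prev + bin_loglik(i, j)
                if cand > best[q][j]:
                    best[q][j] = cand
                    start[q][j] = i
    if best[k][n - 1] == neg:
        raise ValueError("no valid partition (too many tied values for k bins)")

    # Backtrack the optimal bin boundaries and label the sorted points.
    sorted_labels = [0] * n
    j = n - 1
    for q in range(k, 0, -1):
        i = start[q][j]
        for t in range(i, j + 1):
            sorted_labels[t] = q - 1
        j = i - 1
    # Map labels back to the original input order.
    labels = [0] * n
    for pos, idx in enumerate(order):
        labels[idx] = sorted_labels[pos]
    return labels, best[k][n - 1]
```

A call such as `labels, loglik = ml_quantize(lengths, k=4)` would assign each value a bin index from 0 to 3, where `lengths` stands for any list of floats (for example, element lengths). Under the assumed objective, a narrow, densely populated bin scores higher than a wide bin holding the same number of points, so boundaries tend to follow the density of the data rather than distances to cluster centers. The dynamic program also returns the global optimum of this objective, whereas a k-means run can stop at a local optimum.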