Mario Banuelos, Lasith Adhikari, R. Almanza, Andrew Fujikawa, Jonathan Sahagun, Katharine Sanderson, M. Spence, Suzanne S. Sindi, Roummel F. Marcia
2017 IEEE International Symposium on Medical Measurements and Applications (MeMeA), May 2017. DOI: 10.1109/MeMeA.2017.7985889
Nonconvex regularization for sparse genomic variant signal detection
Recent research suggests that an overwhelming proportion of humans carry genomic structural variants (SVs): rearrangements of regions of the genome such as inversions, insertions, deletions, and duplications. The standard approach to detecting SVs in an unknown genome is to sequence paired-end reads from that genome, map them to a reference genome, and analyze the resulting configuration of mapped fragments for evidence of rearrangements. Because SVs occur relatively infrequently in the human genome, and because erroneous read mappings may falsely suggest the presence of an SV, SV detection methods typically suffer from high false-positive rates. Our approach aims to distinguish true SVs from false ones more accurately in two ways. First, we solve a constrained optimization problem whose objective is a negative Poisson log-likelihood plus an additive penalty term that promotes sparsity. Second, we analyze multiple related individuals simultaneously and enforce familial constraints; that is, we require any SV predicted in a child to also be present in at least one of the parents. This formulation decreases the false-positive rate despite substantial error from both DNA sequencing and read mapping. By incorporating information from related individuals, we improve our model and increase the accuracy of SV prediction.