Nonconvex regularization for sparse genomic variant signal detection

2017 IEEE International Symposium on Medical Measurements and Applications (MeMeA), Pub Date: 2017-05-01, DOI: 10.1109/MeMeA.2017.7985889

Mario Banuelos, Lasith Adhikari, R. Almanza, Andrew Fujikawa, Jonathan Sahagun, Katharine Sanderson, M. Spence, Suzanne S. Sindi, Roummel F. Marcia


Citations: 7

Nonconvex regularization for sparse genomic variant signal detection

Recent research suggests that an overwhelming proportion of humans carry genomic structural variants (SVs): rearrangements of regions of the genome such as inversions, insertions, deletions, and duplications. The standard approach to detecting SVs in an unknown genome involves sequencing paired reads from that genome, mapping them to a reference genome, and analyzing the resulting configuration of fragments for evidence of rearrangements. Because SVs occur relatively infrequently in the human genome and erroneous read mappings can mimic the signature of an SV, SV detection methods typically suffer from high false-positive rates. Our approach aims to distinguish true from false SVs more accurately in two ways. First, we solve a constrained optimization problem whose objective is a negative Poisson log-likelihood function plus an additive penalty term that promotes sparsity. Second, we analyze multiple related individuals simultaneously and enforce familial constraints; that is, we require any SV predicted in a child to be present in at least one of its parents. This problem formulation decreases the false-positive rate despite a large amount of error from both DNA sequencing and mapping. By incorporating this additional information, we improve the model formulation and increase the accuracy of SV prediction.
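The formulation described in the abstract can be illustrated with a small numerical sketch. Nothing below is taken from the paper itself: the l_p penalty with p < 1 is only an assumed reading of "nonconvex" in the title, the single-parent, single-child setup simplifies "one of their parents", the coverage matrix A is a hypothetical scaled identity, and the projected-gradient solver with backtracking is a generic stand-in rather than the authors' algorithm. All function and parameter names are illustrative.

```python
import numpy as np

def objective(f, A, y, tau, p=0.5, eps=1e-3, delta=1e-6):
    """Negative Poisson log-likelihood (up to a constant) plus a smoothed l_p penalty."""
    mean = A @ f + eps                      # expected fragment counts, kept positive by eps
    return np.sum(mean - y * np.log(mean)) + tau * np.sum((f + delta) ** p)

def gradient(f, A, y, tau, p=0.5, eps=1e-3, delta=1e-6):
    """Gradient of the objective above with respect to f (valid for f >= 0)."""
    mean = A @ f + eps
    return A.T @ (1.0 - y / mean) + tau * p * (f + delta) ** (p - 1.0)

def project(f, n):
    """Euclidean projection onto 0 <= child <= parent <= 1, entry by entry."""
    parent, child = f[:n], f[n:]
    box_p, box_c = np.clip(parent, 0, 1), np.clip(child, 0, 1)
    bad = box_c > box_p                     # familial constraint violated after clipping
    mid = np.clip((parent + child) / 2, 0, 1)
    return np.concatenate([np.where(bad, mid, box_p), np.where(bad, mid, box_c)])

def detect(A, y, n, tau=1.0, iters=200, step=1e-2):
    """Projected gradient descent; a step is accepted only if the objective does not increase."""
    f = project(np.full(2 * n, 0.5), n)
    for _ in range(iters):
        g = gradient(f, A, y, tau)
        t = step
        while t > 1e-12:
            cand = project(f - t * g, n)
            if objective(cand, A, y, tau) <= objective(f, A, y, tau):
                f = cand
                break
            t /= 2                          # backtrack: halve the step and retry
    return f

# Synthetic example (hypothetical data): the child inherits two of the parent's three SVs.
rng = np.random.default_rng(0)
n = 50
f_parent = np.zeros(n); f_parent[[3, 17, 40]] = 1.0
f_child = np.zeros(n); f_child[[3, 40]] = 1.0
f_true = np.concatenate([f_parent, f_child])
A = 4.0 * np.eye(2 * n)                     # assumed uniform sequencing coverage
y = rng.poisson(A @ f_true + 1e-3)          # noisy fragment counts
f_hat = detect(A, y, n, tau=1.0)
print(np.flatnonzero(f_hat[:n] > 0.5), np.flatnonzero(f_hat[n:] > 0.5))
```

The projection step is what enforces the familial constraint in this sketch: whenever a child entry would exceed the corresponding parent entry, both are moved to their midpoint, so no SV can be reported in the child without support in the parent. Because the penalty is nonconvex, a local method like this one can only be expected to reach a stationary point, not a global minimizer.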
