Copy number variation detection based on constraint least squares
Authors: Xiaopu Wang, Xueqin Wang, Aijun Zhang, Canhong Wen
Journal: Statistics and Its Interface (impact factor 0.3; JCR Q4, Mathematical & Computational Biology)
Published: 2023-11-27
DOI: 10.4310/23-sii814 (https://doi.org/10.4310/23-sii814)
Citation count: 0
Abstract
Copy number variations (CNVs) are a form of structural variation in a DNA sequence, including amplification and deletion of a particular DNA segment on a chromosome. Because every DNA sequence contains a huge amount of data, there is a great need for a computationally fast algorithm that accurately identifies CNVs. In this paper, we formulate the detection of CNVs as a constraint least squares problem and show that circular binary segmentation is a greedy approach to solving it. To solve this problem with high accuracy and efficiency, we first derive a necessary optimality condition for its solution based on the alternating minimization technique and then develop a computationally efficient algorithm named AMIAS. The performance of our method was tested on simulated data and on two real-world applications using genomic data from diagnosed primal glioblastoma and the HapMap project. Our proposed method has competitive performance in identifying CNVs with high-throughput genotypic data.
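To make the "greedy approach to a constrained least squares problem" idea concrete, here is a minimal sketch. It is not the authors' AMIAS algorithm, and it is not circular binary segmentation itself. It is plain binary segmentation on a piecewise-constant mean model, assuming the constraint is an upper bound on the number of change points (the abstract does not state the exact form of the constraint). The function names `segment_sse` and `greedy_segmentation` and the parameter `max_changes` are hypothetical.

```python
import numpy as np


def segment_sse(prefix, prefix_sq, i, j):
    """Sum of squared errors of y[i:j] around its own mean, from prefix sums."""
    n = j - i
    s = prefix[j] - prefix[i]
    return (prefix_sq[j] - prefix_sq[i]) - s * s / n


def greedy_segmentation(y, max_changes):
    """Greedy least squares fit of a piecewise-constant signal.

    Illustrative assumption: the constraint is "at most max_changes
    change points". Each step adds the single split that most reduces
    the residual sum of squares, so the result is greedy, not
    guaranteed to be globally optimal.
    """
    y = np.asarray(y, dtype=float)
    prefix = np.concatenate(([0.0], np.cumsum(y)))
    prefix_sq = np.concatenate(([0.0], np.cumsum(y * y)))
    boundaries = [0, len(y)]

    for _ in range(max_changes):
        best_gain, best_split = 1e-12, None  # small tolerance against round-off
        for a, b in zip(boundaries[:-1], boundaries[1:]):
            base = segment_sse(prefix, prefix_sq, a, b)
            for t in range(a + 1, b):
                gain = (base
                        - segment_sse(prefix, prefix_sq, a, t)
                        - segment_sse(prefix, prefix_sq, t, b))
                if gain > best_gain:
                    best_gain, best_split = gain, t
        if best_split is None:  # no split improves the fit any further
            break
        boundaries = sorted(boundaries + [best_split])

    # Fitted signal: each segment is replaced by its mean.
    fitted = np.empty_like(y)
    for a, b in zip(boundaries[:-1], boundaries[1:]):
        fitted[a:b] = y[a:b].mean()
    return boundaries[1:-1], fitted


# Toy "amplification" in the middle of an otherwise flat profile.
signal = np.array([0.0] * 5 + [2.0] * 5 + [0.0] * 5)
change_points, fitted = greedy_segmentation(signal, max_changes=2)
```

On this toy profile the two detected change points bracket the amplified segment. A greedy search commits to each split permanently, which is one reason an optimality-condition-based method such as the one the abstract describes may be preferable on harder data.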
About the journal:
Exploring the interface between the field of statistics and other disciplines, including but not limited to: biomedical sciences, geosciences, computer sciences, engineering, and social and behavioral sciences. Publishes high-quality articles in broad areas of statistical science, emphasizing substantive problems, sound statistical models and methods, clear and efficient computational algorithms, and insightful discussions of the motivating problems.