Ning Hao, Yue Selena Niu, Feifei Xiao, Heping Zhang
Title: A super scalable algorithm for short segment detection
Journal: Statistics in Biosciences, 13(1), pp. 18-33 (published 2021-04-01; Epub 2020-04-18)
DOI: https://doi.org/10.1007/s12561-020-09278-z
Citations: 1
Abstract
In many applications, such as copy number variant (CNV) detection, the goal is to identify short segments on which the observations have different means or medians from the background. These segments are usually short and hidden in a long sequence, which makes them very challenging to find. We study a super scalable short segment (4S) detection algorithm in this paper. This nonparametric method detects segments by clustering the locations where the observations exceed a threshold. It is computationally efficient and does not rely on a Gaussian noise assumption. Moreover, we develop a framework to assign significance levels to detected segments. We demonstrate the advantages of the proposed method through theoretical analysis, simulations, and real data studies.
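The threshold-and-cluster idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `detect_short_segments`, the parameters `max_gap` and `min_points`, the use of absolute values for two-sided detection, and the toy data are all assumptions made here for illustration. The framework for assigning significance levels is not covered.

```python
from typing import List, Sequence, Tuple


def detect_short_segments(
    y: Sequence[float],
    threshold: float,
    max_gap: int = 2,      # assumed: largest index gap allowed inside one cluster
    min_points: int = 3,   # assumed: fewest exceedances needed to report a cluster
) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index pairs of candidate segments."""
    # Step 1: locations where the observation exceeds the threshold.
    # Absolute values are used so that both upward and downward shifts count
    # (an assumption of this sketch).
    hits = [i for i, v in enumerate(y) if abs(v) > threshold]

    # Step 2: cluster nearby locations in a single left-to-right pass;
    # a new cluster starts whenever the gap to the previous hit is too large.
    clusters: List[List[int]] = []
    for i in hits:
        if clusters and i - clusters[-1][-1] <= max_gap:
            clusters[-1].append(i)
        else:
            clusters.append([i])

    # Step 3: discard clusters with too few exceedances (likely isolated noise).
    return [(c[0], c[-1]) for c in clusters if len(c) >= min_points]


# Toy data: a flat background with a shifted segment at indices 8..12,
# one dip inside the segment, and one isolated spike.
signal = [0.1, -0.3, 0.2, 0.0, -0.1] * 4
for i in range(8, 13):
    signal[i] += 2.0
signal[10] = 0.2   # a point inside the segment that falls below the threshold
signal[17] = 2.5   # an isolated spike that should not be reported

segments = detect_short_segments(signal, threshold=1.0)
print(segments)
```

After the exceedance locations are found, the clustering step is a single pass over them, which is in the spirit of the computational efficiency the abstract claims. No distributional assumption on the noise enters the sketch beyond the choice of threshold.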
About the journal:
Statistics in Biosciences (SIBS) is published three times a year in print and electronic form. It aims at the development and application of statistical methods and their interface with other quantitative methods, such as computational and mathematical methods, in biological and life science, health science, and biopharmaceutical and biotechnological science.
SIBS publishes scientific papers and review articles in four sections, with the first two sections as the primary sections. Original Articles publish novel statistical and quantitative methods in biosciences. The Bioscience Case Studies and Practice Articles publish papers that advance statistical practice in biosciences, such as case studies, innovative applications of existing methods that further understanding of subject-matter science, and evaluations of existing methods and data sources. Review Articles publish papers that review an area of statistical and quantitative methodology, software, and data sources in biosciences. Commentaries provide perspectives on research topics or policy issues that are of current quantitative interest in biosciences, reactions to an article published in the journal, and scholarly essays. For an article to be acceptable, substantive science must motivate and demonstrate the methodological development and its use. Articles published in SIBS share the goal of promoting evidence-based real-world practice and policy making through effective and timely interaction and communication between statisticians and quantitative researchers and subject-matter scientists in biosciences.