A linear-time algorithm that avoids inverses and computes Jackknife (leave-one-out) products like convolutions or other operators in commutative semigroups.

IF 1.5 · Region 4 (Biology) · JCR Q4 · BIOCHEMICAL RESEARCH METHODS

Algorithms for Molecular Biology 15:17 · Pub Date: 2020-09-19 · eCollection Date: 2020-01-01 · DOI: 10.1186/s13015-020-00178-x

John L Spouge, Joseph M Ziegelbauer, Mileidy Gonzalez



Abstract

Background: Data about herpesvirus microRNA motifs on human circular RNAs suggested the following statistical question. Consider independent random counts, not necessarily identically distributed. Conditioned on the sum, decide whether one of the counts is unusually large. Exact computation of the p-value leads to a specific algorithmic problem. Given $n$ elements $g_0, g_1, \dots, g_{n-1}$ in a set $G$ with the closure and associative properties and a commutative product without inverses, compute the jackknife (leave-one-out) products $\bar{g}_j = g_0 g_1 \cdots g_{j-1} g_{j+1} \cdots g_{n-1}$ ($0 \leq j < n$).
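To make the problem statement concrete, the following is a minimal brute-force sketch of the leave-one-out products. The function name `naive_jackknife` and the choice of `max` as an example operator are illustrative assumptions, not taken from the article; any associative, commutative binary operation could be passed as `op`.

```python
from functools import reduce
from typing import Callable, List, TypeVar

T = TypeVar("T")


def naive_jackknife(elems: List[T], op: Callable[[T, T], T]) -> List[T]:
    """Brute-force leave-one-out products (hypothetical helper).

    `op` is assumed to be associative and commutative. No identity
    element or inverse is assumed, so at least two elements are needed.
    """
    if len(elems) < 2:
        raise ValueError("need at least two elements")
    out = []
    for j in range(len(elems)):
        rest = elems[:j] + elems[j + 1:]
        # Combining the n - 1 remaining elements costs n - 2 products.
        out.append(reduce(op, rest))
    return out


# `max` has no inverse: the total maximum cannot be "divided" by one element.
print(naive_jackknife([3, 1, 4, 1, 5], max))  # [5, 5, 5, 5, 4]
```

With an operator such as `max`, or a convolution of probability distributions, there is no cheap way to "divide out" $g_j$ from the full product, which is why the leave-one-out products have to be built from the other elements directly.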

Results: This article gives a linear-time Jackknife Product algorithm. Its upward phase constructs a standard segment tree for computing segment products like $g_{(i,j)} = g_i g_{i+1} \cdots g_{j-1}$; its novel downward phase mirrors the upward phase while exploiting the symmetry of $g_j$ and its complement $\bar{g}_j$. The algorithm requires storage for $2n$ elements of $G$ and only about $3n$ products. In contrast, the standard segment tree algorithms require about $n$ products for construction and $\log_2 n$ products for calculating each $\bar{g}_j$, i.e., about $n \log_2 n$ products in total; and a naïve quadratic algorithm using $n - 2$ element-by-element products to compute each $\bar{g}_j$ requires $n(n - 2)$ products.
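The two phases can be sketched as follows. This is one possible reading of the abstract, not the authors' implementation: the heap-style array layout (leaves at indices $n$ to $2n - 1$, children of node $i$ at $2i$ and $2i + 1$), the in-place overwrite, and the name `jackknife_products` are all assumptions made for illustration. The downward step relies on the identity that the complement of a child equals the complement of its parent times the child's sibling.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def jackknife_products(elems: List[T], op: Callable[[T, T], T]) -> List[T]:
    """Leave-one-out products in O(n) applications of `op` (a sketch).

    `op` is assumed associative and commutative; no identity or inverse
    is used. Storage is one array of 2n slots (slots 0 and 1 stay unused).
    """
    n = len(elems)
    if n < 2:
        raise ValueError("need at least two elements")

    # Leaves live at indices n .. 2n-1.
    tree: List = [None] * n + list(elems)

    # Upward phase: node i holds the product of the leaves below it.
    # The root (index 1) is never needed, so it is skipped: n - 2 products.
    for i in range(n - 1, 1, -1):
        tree[i] = op(tree[2 * i], tree[2 * i + 1])

    # Downward phase: overwrite each node with the product of all leaves
    # NOT below it. The root's children are each other's complements.
    tree[2], tree[3] = tree[3], tree[2]
    for i in range(2, n):
        left, right = tree[2 * i], tree[2 * i + 1]
        # tree[i] already holds the complement of node i here.
        tree[2 * i] = op(tree[i], right)      # complement of left child
        tree[2 * i + 1] = op(tree[i], left)   # complement of right child

    return tree[n:]


print(jackknife_products([2, 3, 5, 7], lambda a, b: a * b))  # [105, 70, 42, 30]
print(jackknife_products([3, 1, 4, 1, 5], max))              # [5, 5, 5, 5, 4]
```

In this layout the upward phase uses $n - 2$ products and the downward phase $2(n - 2)$, consistent with the "about $3n$" figure in the abstract. When $n$ is not a power of two the leaves under a node need not be contiguous, which is harmless here only because the product is commutative.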

Conclusions: In the herpesvirus application, the Jackknife Product algorithm required 15 min; standard segment tree algorithms would have taken an estimated 3 h; and the quadratic algorithm, an estimated 1 month. The Jackknife Product algorithm has many possible uses in bioinformatics and statistics.



About the journal: Algorithms for Molecular Biology publishes articles on novel algorithms for biological sequence and structure analysis, phylogeny reconstruction, and combinatorial algorithms and machine learning. Areas of interest include but are not limited to: algorithms for RNA and protein structure analysis, gene prediction and genome analysis, comparative sequence analysis and alignment, phylogeny, gene expression, machine learning, and combinatorial algorithms. Where appropriate, manuscripts should describe applications to real-world data. However, pure algorithm papers are also welcome if future applications to biological data are to be expected, or if they address complexity or approximation issues of novel computational problems in molecular biology. Articles about novel software tools will be considered for publication if they contain some algorithmically interesting aspects.