更快的计算左有界最短唯一子串。

IF 1.7 4区生物学 Q4 BIOCHEMICAL RESEARCH METHODS

Algorithms for Molecular Biology Pub Date : 2025-06-20 DOI:10.1186/s13015-025-00287-5

Larissa L M Aguiar, Felipe A Louza

{"title":"更快的计算左有界最短唯一子串。","authors":"Larissa L M Aguiar, Felipe A Louza","doi":"10.1186/s13015-025-00287-5","DOIUrl":null,"url":null,"abstract":"Finding shortest unique substrings (SUS) is a fundamental problem in string processing with applications in bioinformatics. In this paper, we present an algorithm for solving a variant of the SUS problem, the left-bounded shortest unique substrings (LSUS). This variant is particularly important in applications such as PCR primer design. Our algorithm runs in O(n) time using 2n memory words plus n bytes for an input string of length n. Experimental results with real and artificial datasets show that our algorithm is the fastest alternative in practice, being two times faster (on the average) than related works, while using a similar peak memory footprint.","PeriodicalId":50823,"journal":{"name":"Algorithms for Molecular Biology","volume":"20 1","pages":"11"},"PeriodicalIF":1.7000,"publicationDate":"2025-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12181909/pdf/","citationCount":"0","resultStr":"{\"title\":\"Faster computation of left-bounded shortest unique substrings.\",\"authors\":\"Larissa L M Aguiar, Felipe A Louza\",\"doi\":\"10.1186/s13015-025-00287-5\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Finding shortest unique substrings (SUS) is a fundamental problem in string processing with applications in bioinformatics. In this paper, we present an algorithm for solving a variant of the SUS problem, the left-bounded shortest unique substrings (LSUS). This variant is particularly important in applications such as PCR primer design. Our algorithm runs in O(n) time using 2n memory words plus n bytes for an input string of length n. Experimental results with real and artificial datasets show that our algorithm is the fastest alternative in practice, being two times faster (on the average) than related works, while using a similar peak memory footprint.\",\"PeriodicalId\":50823,\"journal\":{\"name\":\"Algorithms for Molecular Biology\",\"volume\":\"20 1\",\"pages\":\"11\"},\"PeriodicalIF\":1.7000,\"publicationDate\":\"2025-06-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12181909/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Algorithms for Molecular Biology\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1186/s13015-025-00287-5\",\"RegionNum\":4,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"BIOCHEMICAL RESEARCH METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Algorithms for Molecular Biology","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1186/s13015-025-00287-5","RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}

引用次数: 0

摘要

寻找最短唯一子串（SUS）是生物信息学中字符串处理的一个基本问题。本文提出了一种求解SUS问题的变体——左有界最短唯一子串（LSUS）的算法。这种变体在PCR引物设计等应用中尤为重要。对于长度为n的输入字符串，我们的算法使用2n个存储字加上n个字节，在O(n)时间内运行。真实和人工数据集的实验结果表明，我们的算法在实践中是最快的替代方案，在使用相似的峰值内存占用时，（平均）比相关工作快两倍。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

Faster computation of left-bounded shortest unique substrings.

查看原文本刊更多论文

Faster computation of left-bounded shortest unique substrings.

Finding shortest unique substrings (SUS) is a fundamental problem in string processing with applications in bioinformatics. In this paper, we present an algorithm for solving a variant of the SUS problem, the left-bounded shortest unique substrings (LSUS). This variant is particularly important in applications such as PCR primer design. Our algorithm runs in O(n) time using 2n memory words plus n bytes for an input string of length n. Experimental results with real and artificial datasets show that our algorithm is the fastest alternative in practice, being two times faster (on the average) than related works, while using a similar peak memory footprint.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Algorithms for Molecular Biology 生物-生化研究方法

CiteScore

2.40

自引率

10.00%

发文量

审稿时长

>12 weeks

期刊介绍： Algorithms for Molecular Biology publishes articles on novel algorithms for biological sequence and structure analysis, phylogeny reconstruction, and combinatorial algorithms and machine learning. Areas of interest include but are not limited to: algorithms for RNA and protein structure analysis, gene prediction and genome analysis, comparative sequence analysis and alignment, phylogeny, gene expression, machine learning, and combinatorial algorithms. Where appropriate, manuscripts should describe applications to real-world data. However, pure algorithm papers are also welcome if future applications to biological data are to be expected, or if they address complexity or approximation issues of novel computational problems in molecular biology. Articles about novel software tools will be considered for publication if they contain some algorithmically interesting aspects.