A novel pairwise sequence alignment algorithm for similarity search in massive datasets.
Yosef Masoudi-Sobhanzadeh, Yadollah Omidi
Briefings in Bioinformatics, volume 26, issue 5 (published 2025-08-31). DOI: 10.1093/bib/bbaf512
Advances in sequencing technologies have produced a huge volume of data. Because pairwise sequence alignment plays an essential role in comparing sequencing data, various algorithms have been developed for it. Among them, the basic local alignment search tool (BLAST) is currently employed in a wide range of biological applications, largely owing to its low time and memory complexity. However, BLAST, as well as other improved sequence alignment algorithms, may fail to produce accurate results; therefore, more efficient algorithms would be highly advantageous. In the present study, we introduce a novel algorithm for sequence alignment (NASA) consisting of a preprocessing step and an aligning step. In the preprocessing step, the positions of residues within a given nucleotide or peptide sequence are determined, so that only informative regions need to be searched. In the aligning step, the similarity score between two sequences is calculated from a constant number of comparisons, in linear time and memory. To evaluate NASA, a large volume of sequencing data was analyzed and the outcomes were compared with those of other algorithms. The results showed that NASA outperforms other basic algorithms in terms of elapsed time, required memory, system resource utilization, and alignment score precision. Collectively, NASA might be a promising method for retrieving similar sequences from large datasets.
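The abstract does not describe NASA's actual procedure, so the Python sketch below only illustrates, under stated assumptions, the two generic ideas it names. `build_position_index` records every position at which each residue occurs in a single pass, as a stand-in for a preprocessing step. `kmer_similarity` computes a shared k-mer ratio, a swapped-in scoring scheme and not the NASA score, whose cost grows linearly with sequence length for a fixed k. Both function names and the k-mer score are hypothetical.

```python
from collections import Counter, defaultdict


def build_position_index(seq: str) -> dict[str, list[int]]:
    """Hypothetical preprocessing sketch: map each residue to its positions.

    A single pass over the sequence, so time and memory are O(n).
    """
    index: dict[str, list[int]] = defaultdict(list)
    for pos, residue in enumerate(seq.upper()):
        index[residue].append(pos)
    return dict(index)


def _kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer (length-k substring) in seq."""
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        counts[seq[i:i + k]] += 1
    return counts


def kmer_similarity(a: str, b: str, k: int = 3) -> float:
    """Stand-in similarity score (NOT the NASA score): shared k-mer fraction.

    For a fixed k, building both count tables takes time and memory
    proportional to len(a) + len(b). The score is the multiset overlap of
    the two k-mer tables divided by the smaller table's size, in [0, 1].
    """
    if k <= 0:
        raise ValueError("k must be positive")
    ca = _kmer_counts(a.upper(), k)
    cb = _kmer_counts(b.upper(), k)
    # Counter & Counter keeps the minimum count of each shared k-mer.
    shared = sum((ca & cb).values())
    total = min(sum(ca.values()), sum(cb.values()))
    return shared / total if total else 0.0
```

For example, `build_position_index("ACGTA")` returns `{'A': [0, 4], 'C': [1], 'G': [2], 'T': [3]}`, and two identical sequences score 1.0. Hash-based k-mer counting was chosen for this illustration because it avoids the quadratic dynamic-programming table of exact local alignment, at the cost of ignoring gaps and residue order beyond k.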
About the journal:
Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data.
The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.