Benchmarking the Base Randomization Algorithm as a Possible Tool for the Initial Step of Generating a Virtual RNA Aptamers Library.

IF 3.1 Q3 BIOTECHNOLOGY & APPLIED MICROBIOLOGY
BioTech Pub Date : 2025-09-12 DOI:10.3390/biotech14030072
Kabelo P Mokgopa, Shina D Oloniiju, Kevin A Lobb, Tendamudzimu Tshiwawa
BioTech, volume 14, issue 3. Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12452754/pdf/
Citations: 0

Abstract

While databases are emerging across various domains, from small molecules to genomics and proteins, aptamer databases remain scarce, if not entirely absent. Such databases could serve as a comprehensive resource for advancing research, innovation, and the applications of aptamer technology across multiple fields, which would likely lead to improvements in healthcare, environmental monitoring, and biotechnology. Establishing aptamer databases would also facilitate molecular modelling and machine learning, opening the door to further advances in understanding and using aptamers. Against this backdrop, we present and benchmark the Base Randomization Algorithm (BRA) as a potential solution to the scarcity of aptamer databases. Through statistical analysis, we examine key factors such as minimum free energy (MFE), base composition, and base arrangement. Notably, sequences generated using the BRA exhibit a Gaussian distribution pattern. We also examine in detail how each base within a sequence is chosen using mathematical principles, ensuring that the sequences are valid and statistically optimized. Additionally, we explore how the length of the randomly generated sequences affects the folding of their structures at both the secondary and tertiary levels. Based on composition analysis, we propose that the base mean of the dataset can be approximated as $\bar{x}_B \approx P_x \times N$ for a dataset of sequences with the same length, and as $\bar{x}_B \approx P_x \times M$ for a dataset with randomized lengths that follow a Gaussian distribution, where M is the median and N the mean.
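
The base-mean approximation stated in the abstract can be checked numerically with a minimal sketch. The abstract does not specify how the BRA selects each base, so the code below assumes, purely for illustration, that every position is drawn independently with a fixed probability P_x per base; the function names, the probabilities, the length of 40, and the Gaussian length parameters are all hypothetical and are not taken from the paper.

```python
import random
from statistics import mean, median

BASES = "ACGU"

def random_rna(length, probs, rng):
    """Draw one RNA sequence, choosing each base independently.

    Assumption: independent draws with fixed per-base probabilities.
    This is an illustrative stand-in, not the published BRA.
    """
    weights = [probs[b] for b in BASES]
    return "".join(rng.choices(BASES, weights=weights, k=length))

def base_mean(seqs, base):
    """Mean number of occurrences of `base` per sequence in the dataset."""
    return mean(s.count(base) for s in seqs)

rng = random.Random(0)                      # fixed seed for repeatability
P = {b: 0.25 for b in BASES}                # hypothetical uniform P_x

# Case 1: every sequence has the same length N.
N = 40
fixed = [random_rna(N, P, rng) for _ in range(5000)]

# Case 2: lengths drawn from a Gaussian (hypothetical mean 40, sd 5).
lengths = [max(1, round(rng.gauss(40, 5))) for _ in range(5000)]
varied = [random_rna(n, P, rng) for n in lengths]
M = median(lengths)

for b in BASES:
    print(b, "fixed:", base_mean(fixed, b), "vs P_x*N =", P[b] * N)
    print(b, "varied:", base_mean(varied, b), "vs P_x*M =", P[b] * M)
```

Under these assumptions the count of a base in a sequence of length N is binomial with expectation P_x × N, and for a symmetric length distribution the median and mean length coincide, which is why P_x × M serves as the estimate in the variable-length case.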

Source journal: BioTech (Immunology and Microbiology: Applied Microbiology and Biotechnology). CiteScore: 3.70. Self-citation rate: 0.00%. Articles published: 51. Review time: 11 weeks.