An Evolutionary Statistics Toolkit for Simplified Sequence Analysis on Web with Client-Side Processing

Alper Karagöl, Taner Karagöl
bioRxiv - Bioinformatics, published 2024-09-17. DOI: 10.1101/2024.08.01.606148 (https://doi.org/10.1101/2024.08.01.606148)

Abstract

We present the Evolutionary Statistics Toolkit, a user-friendly web-based platform for specialized analysis of genetic sequences that integrates multiple evolutionary statistics. The toolkit offers a selection of specialized tools: a Tajima's D calculator with Site Frequency Spectrum (SFS), a Shannon's Entropy (H) calculator, alignment re-formatting, HGSV to FASTA conversion, pair-wise frequency analysis, FASTA to SEQRES conversion, RNA 2D structure alignment, a Kyte-Doolittle hydrophilicity plot tool, and a kurtosis coefficient calculator. Tajima's D is calculated using the reference formula $D = (\pi - \theta_W)/\sqrt{V_D}$, where $\pi$ is the average number of pairwise differences, $\theta_W$ is Watterson's estimator of $\theta$, and $V_D$ is the variance of $\pi - \theta_W$. Shannon's Entropy is defined as $H = -\sum_i p_i \log_2(p_i)$, where $p_i$ is the probability of occurrence of each unique character (nucleotide or amino acid) in the sequence. The toolkit streamlines workflows for early-career researchers in evolutionary biology, genomics, and related fields. Based on a comparison with existing code, we propose that it can also serve as an interactive educational website for beginners in evolutionary statistics. The source code for each tool is available through GitHub links provided on the website. This open-source approach allows users to inspect the code, suggest improvements, or adapt the tools to their own usage and research needs. This article describes the functionality and validation of each tool within the platform, and compares the tools with accessible existing statistical utilities. The toolkit is freely accessible at: https://www.alperkaragol.com/toolkit
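
The two formulas quoted in the abstract can be illustrated with a minimal Python sketch. This is not the toolkit's own code (which runs client-side in the browser and may handle gaps, ambiguous characters, and the SFS differently). The function names `shannon_entropy` and `tajimas_d` are hypothetical, and the variance term is assumed to follow the standard constants from Tajima's 1989 formulation, which the abstract does not spell out.

```python
from collections import Counter
from itertools import combinations
from math import log2, sqrt
from typing import Sequence


def shannon_entropy(sequence: str) -> float:
    """H = -sum(p_i * log2(p_i)) over the unique characters of one sequence."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    total = len(sequence)
    counts = Counter(sequence)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def tajimas_d(sequences: Sequence[str]) -> float:
    """Tajima's D for equal-length aligned sequences (assumed gap-free)."""
    n = len(sequences)
    if n < 4:
        # For n < 4 the variance constants collapse to zero.
        raise ValueError("need at least 4 sequences")
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be aligned to the same length")

    # S: number of segregating (polymorphic) alignment columns.
    seg_sites = sum(1 for column in zip(*sequences) if len(set(column)) > 1)
    if seg_sites == 0:
        raise ValueError("no segregating sites; D is undefined")

    # pi: average number of pairwise differences.
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Watterson's estimator theta_W = S / a1.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    theta_w = seg_sites / a1

    # Variance of (pi - theta_W), using the standard constants (assumption).
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)

    return (pi - theta_w) / sqrt(variance)


if __name__ == "__main__":
    # Four equally frequent characters give the maximum of 2 bits.
    print(shannon_entropy("ACGT"))  # 2.0
    # Every variant below is a singleton, so D is expected to be negative.
    print(tajimas_d(["TAAA", "ATAA", "AATA", "AAAT"]))
```

In this sketch an excess of rare variants makes $\pi$ smaller than $\theta_W$ and pushes D below zero, while an excess of intermediate-frequency variants pushes it above zero.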