An Evolutionary Statistics Toolkit for Simplified Sequence Analysis on Web with Client-Side Processing
Alper Karagöl, Taner Karagöl
bioRxiv - Bioinformatics, 2024-09-17. DOI: 10.1101/2024.08.01.606148 (https://doi.org/10.1101/2024.08.01.606148)
We present the Evolutionary Statistics Toolkit, a user-friendly web-based platform for specialized analysis of genetic sequences that integrates multiple evolutionary statistics. The toolkit focuses on a selection of specialized tools: a Tajima's D calculator with Site Frequency Spectrum (SFS), Shannon's Entropy (H), alignment re-formatting, HGVS to FASTA conversion, pairwise frequency analysis, FASTA to SEQRES conversion, RNA 2D structure alignment, a Kyte-Doolittle hydrophilicity plot tool, and a kurtosis coefficient calculator. Tajima's D is calculated using the reference formula $D = (\pi - \theta_W)/\sqrt{V_D}$, where $\pi$ is the average number of pairwise differences, $\theta_W$ is Watterson's estimator of $\theta$, and $V_D$ is the variance of $\pi - \theta_W$. Shannon's Entropy is defined as $H = -\sum_i p_i \log_2(p_i)$, where $p_i$ is the probability of occurrence of each unique character (nucleotide or amino acid) in the sequence. The toolkit streamlines workflows for early-career researchers in evolutionary biology, genomics, and related fields. Based on a comparison with existing code, we propose that it can also serve as an interactive educational website for beginners in evolutionary statistics. The source code for each tool is available through GitHub links provided on the website. This open-source approach allows users to inspect the code, suggest improvements, or adapt the tools to their own usage and research needs. This article describes the functionality and validation of each tool within the platform, along with a comparison with accessible existing statistical utilities. The toolkit is freely accessible at https://www.alperkaragol.com/toolkit
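The two formulas quoted in the abstract can be sketched in Python as follows. This is a minimal illustration, not the toolkit's own source code (which is linked from the website). The function names `shannon_entropy` and `tajimas_d` are hypothetical, the variance term is assumed to use the standard constants from Tajima (1989), and the sketch assumes equal-length sequences with no special handling of gaps or ambiguous characters.

```python
from collections import Counter
from itertools import combinations
from math import log2, sqrt


def shannon_entropy(sequence: str) -> float:
    """Shannon entropy in bits: H = -sum(p_i * log2(p_i)) over unique characters."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    total = len(sequence)
    return -sum((c / total) * log2(c / total) for c in Counter(sequence).values())


def tajimas_d(alignment: list[str]) -> float:
    """Tajima's D for equal-length aligned sequences (assumption: no gap handling)."""
    n = len(alignment)
    # For n = 2 or 3 the variance constants below are zero, so D is undefined.
    if n < 4:
        raise ValueError("need at least 4 sequences")
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("sequences must be aligned to the same length")

    # S: number of segregating sites (columns with more than one character).
    s = sum(1 for column in zip(*alignment) if len(set(column)) > 1)
    if s == 0:
        raise ValueError("no segregating sites; D is undefined")

    # pi: average number of differences over all pairs of sequences.
    pairs = list(combinations(alignment, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Watterson's estimator: theta_W = S / a1.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    theta_w = s / a1

    # Variance of (pi - theta_W), using the constants from Tajima (1989).
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    v_d = e1 * s + e2 * s * (s - 1)

    return (pi - theta_w) / sqrt(v_d)


if __name__ == "__main__":
    print(shannon_entropy("ACGT"))  # four equally frequent characters
    print(tajimas_d(["AAAA", "TAAA", "ATAA", "AATA"]))  # singleton-only variation
```

In this sketch, an alignment whose variable sites are all singletons gives a negative D, because $\pi$ falls below $\theta_W$. An excess of intermediate-frequency variants pushes D positive.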