STRAS: a snakemake pipeline for genome-wide short tandem repeats annotation and score.

IF 3.8 | Zone 2, Biology | Q2, GENETICS & HEREDITY
Human Genetics Pub Date : 2024-06-01 Epub Date: 2024-03-20 DOI:10.1007/s00439-024-02662-5
Mengna Zhang
Citations: 0

Abstract

High-throughput whole genome sequencing (WGS) is used clinically to find single nucleotide variants and small indels. Several bioinformatics tools, such as ExpansionHunter Denovo, GangSTR and HipSTR, have been developed to call short tandem repeat (STR) copy numbers from WGS data. However, expansion disorders are rare, and it is hard to find candidate expansions among the ~800,000 STR calls in a single patient's sequencing data. In this paper I describe a snakemake pipeline for genome-wide STRs Annotation and Score (STRAS), which uses a Random Forest (RF) model to predict pathogenicity. The predictor was validated on benchmark data from ClinVar and PubMed. The true positive rate was 93.8% and the true negative rate was 98.0%. Precision was 98.6% and recall was 93.8%, for an F1-score of 0.961. Sensitivity was 93.8% and specificity was 99.6%. These results show that STRAS could be a useful tool for clinical researchers to find STR loci of interest and filter out neutral STRs. The STRAS code is freely available on GitHub (https://github.com/fancheyu5/STRAS).
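
The F1-score quoted in the abstract is the harmonic mean of precision and recall, so it can be checked directly from the two reported percentages. The sketch below is illustrative only and is not taken from the STRAS code base: the function names `f1_score` and `confusion_rates` are hypothetical, and the confusion-matrix counts in the usage lines are made-up values, not the paper's benchmark data.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def confusion_rates(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Derive the standard rates from raw confusion-matrix counts.

    Sensitivity, recall and true positive rate are the same quantity;
    specificity is the true negative rate.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }


# Precision and recall as reported in the abstract (98.6% and 93.8%).
reported_f1 = f1_score(0.986, 0.938)
print(f"F1 from reported precision and recall: {reported_f1:.3f}")

# Hypothetical counts, chosen only to show how the rates are computed.
example = confusion_rates(tp=90, fn=10, tn=95, fp=5)
print(example)
```

Rounded to three decimals, the first value agrees with the F1-score of 0.961 stated in the abstract.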

Source journal
Human Genetics (Biology: Genetics)
CiteScore: 10.80
Self-citation rate: 3.80%
Articles published: 94
Review time: 1 month
About the journal: Human Genetics is a monthly journal publishing original and timely articles on all aspects of human genetics. The Journal particularly welcomes articles in the areas of Behavioral genetics, Bioinformatics, Cancer genetics and genomics, Cytogenetics, Developmental genetics, Disease association studies, Dysmorphology, ELSI (ethical, legal and social issues), Evolutionary genetics, Gene expression, Gene structure and organization, Genetics of complex diseases and epistatic interactions, Genetic epidemiology, Genome biology, Genome structure and organization, Genotype-phenotype relationships, Human Genomics, Immunogenetics and genomics, Linkage analysis and genetic mapping, Methods in Statistical Genetics, Molecular diagnostics, Mutation detection and analysis, Neurogenetics, Physical mapping and Population Genetics. Articles reporting animal models relevant to human biology or disease are also welcome. Preference will be given to those articles which address clinically relevant questions or which provide new insights into human biology. Unless reporting entirely novel and unusual aspects of a topic, clinical case reports, cytogenetic case reports, papers on descriptive population genetics, articles dealing with the frequency of polymorphisms or additional mutations within genes in which numerous lesions have already been described, and papers that report meta-analyses of previously published datasets will normally not be accepted. The Journal typically will not consider for publication manuscripts that report merely the isolation, map position, structure, and tissue expression profile of a gene of unknown function unless the gene is of particular interest or is a candidate gene involved in a human trait or disorder.