GAAP: A GUI-based Genome Assembly and Annotation Package.

IF 1.8; Zone 4, Biology; JCR Q4, Biochemistry & Molecular Biology
Deepak Singla, Inderjit Singh Yadav
Current Genomics, published 2022-06-10. DOI: 10.2174/1389202923666220128155537 (https://doi.org/10.2174/1389202923666220128155537)
PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/e7/4f/CG-23-77.PMC9878834.pdf
Citations: 2

Abstract

Background: Next-generation sequencing (NGS) technologies are routinely used to generate high-throughput sequencing data, which creates a need for easy-to-use, GUI-based data analysis software. Such software could run in parallel with sequencing to analyze data automatically. At present, very few such tools are available and most of them are commercial, leaving a gap between data generation and data analysis. Methods: GAAP is built on the NodeJS platform, with an HTML and JavaScript front end for interacting with users. FastQC and Trimmomatic are used for quality checking and quality control, respectively. Velvet and Prodigal are integrated for genome assembly and gene prediction, respectively. Annotation is performed with remote NCBI Blast and IPR-Scan. The back end uses PERL and JavaScript for data processing. To evaluate the performance of GAAP, we assembled a viral (SRR11621811), a bacterial (SRR17153353), and a human (SRR16845439) genome. Results: Using GAAP on a desktop computer, we assembled and annotated a COVID-19 genome, obtaining a single contig of 27,994 bp with 99.57% reference genome coverage. Eleven genes were predicted from this assembly, of which 10 were annotated using the annotation module of GAAP. The bacterial and human genomes were assembled into 138 and 194,281 contigs with N50 values of 100,399 and 610, respectively. Conclusion: In this study, we developed GAAP, a freely available, platform-independent genome assembly and annotation software package (www.deepaklab.com/gaap). The software acts as a complete data analysis package with modules for quality check, quality control, de novo genome assembly, gene prediction, and annotation (Blast, PFAM, GO-Term, pathway and enzyme mapping).
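
The N50 values reported in the abstract summarize assembly contiguity. As a minimal sketch, assuming the standard definition (the length of the contig at which the running total of contig lengths, sorted longest first, reaches at least half of the total assembly length), N50 could be computed as below. The function name `n50` and the toy lengths are illustrative assumptions; this is not code from GAAP, and the abstract does not describe how GAAP computes the statistic.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted longest first; N50 is the length of the contig
    at which the cumulative length first reaches half of the total.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    # Not reachable for non-empty input: the full sum always exceeds half.
    raise AssertionError("unreachable")


# Toy contig lengths (hypothetical, not taken from the GAAP assemblies).
example = [100, 80, 50, 30, 20, 10]
print(n50(example))  # total is 290, half is 145; 100 + 80 = 180 >= 145, so prints 80
```

For an assembly consisting of a single contig, such as the viral genome described above, N50 is simply the length of that contig. A low N50 alongside a very large contig count, as in the human assembly, indicates a highly fragmented assembly.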

Source journal: Current Genomics (Biology: Biochemistry & Molecular Biology)
CiteScore: 5.20
Self-citation rate: 0.00%
Articles published: 29
Review time: >0 weeks
Journal description: Current Genomics is a peer-reviewed journal that provides essential reading about the latest and most important developments in genome science and related fields of research. Systems biology, systems modeling, machine learning, network inference, bioinformatics, computational biology, epigenetics, single cell genomics, extracellular vesicles, quantitative biology, and synthetic biology for the study of evolution, development, maintenance, and aging, as well as human health, human diseases, clinical genomics, and precision medicine, are topics of particular interest. The journal covers plant genomics but will not consider articles dealing with breeding and livestock. Current Genomics publishes three types of articles: i) Research papers from internationally recognized experts reporting on new and original data generated at the genome scale level. Position papers dealing with new or challenging methodological approaches, whether experimental or mathematical, are greatly welcome in this section. ii) Authoritative and comprehensive full-length or mini reviews from widely recognized experts, covering the latest developments in genome science and related fields of research such as systems biology, statistics and machine learning, quantitative biology, and precision medicine. Proposals for mini-hot topics (2-3 review papers) and full hot topics (6-8 review papers) guest edited by internationally recognized experts are welcome in this section. Hot topic proposals should not contain original data and should contain articles originating from at least 2 different countries. iii) Opinion papers from internationally recognized experts addressing contemporary questions and issues in the field of genome science and systems biology and basic and clinical research practices.