{"title":"A dedicated database system for handling multi-level data in systems biology.","authors":"Natapol Pornputtapong, Kwanjeera Wanichthanarak, Avlant Nilsson, Intawat Nookaew, Jens Nielsen","doi":"10.1186/1751-0473-9-17","DOIUrl":"https://doi.org/10.1186/1751-0473-9-17","url":null,"abstract":"<p><strong>Background: </strong>Advances in high-throughput technologies have enabled extensive generation of multi-level omics data. These data are crucial for systems biology research, though they are complex, heterogeneous, highly dynamic, incomplete and distributed among public databases. This leads to difficulties in data accessibility and often results in errors when data are merged and integrated from varied resources. Therefore, integration and management of systems biological data remain very challenging.</p><p><strong>Methods: </strong>To overcome this, we designed and developed a dedicated database system that can address the vital issues in data management and thereby facilitate data integration, modeling and analysis in systems biology within a sole database. In addition, a yeast data repository was implemented as an integrated database environment which is operated by the database system. Two applications were implemented to demonstrate extensibility and utilization of the system. Both illustrate how the user can access the database via the web query function and implemented scripts. These scripts are specific to two sample cases: 1) Detecting the pheromone pathway in protein interaction networks; and 2) Finding metabolic reactions regulated by Snf1 kinase.</p><p><strong>Results and conclusion: </strong>In this study, we present the design of a database system which offers an extensible environment to efficiently capture the majority of biological entities and relations encountered in systems biology. Critical functions and control processes were designed and implemented to ensure consistent, efficient, secure and reliable transactions. 
The two sample cases on the yeast integrated data clearly demonstrate the value of a sole database environment for systems biology research.</p>","PeriodicalId":35052,"journal":{"name":"Source Code for Biology and Medicine","volume":"9 ","pages":"17"},"PeriodicalIF":0.0,"publicationDate":"2014-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/1751-0473-9-17","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"32529048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Daniel J. Park, T. Nguyen-Dumont, Sori Kang, Karin M. Verspoor, B. Pope
{"title":"Annokey: an annotation tool based on key term search of the NCBI Entrez Gene database","authors":"Daniel J. Park, T. Nguyen-Dumont, Sori Kang, Karin M. Verspoor, B. Pope","doi":"10.1186/1751-0473-9-15","DOIUrl":"https://doi.org/10.1186/1751-0473-9-15","url":null,"abstract":"","PeriodicalId":35052,"journal":{"name":"Source Code for Biology and Medicine","volume":"9 1","pages":"15 - 15"},"PeriodicalIF":0.0,"publicationDate":"2014-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/1751-0473-9-15","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"65725248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Matthew I Bellgard, Lee Render, Maciej Radochonski, Adam Hunter
{"title":"Second generation registry framework.","authors":"Matthew I Bellgard, Lee Render, Maciej Radochonski, Adam Hunter","doi":"10.1186/1751-0473-9-14","DOIUrl":"https://doi.org/10.1186/1751-0473-9-14","url":null,"abstract":"<p><strong>Background: </strong>Information management systems are essential to capture data be it for public health and human disease, sustainable agriculture, or plant and animal biosecurity. In public health, the term patient registry is often used to describe information management systems that are used to record and track phenotypic data of patients. Appropriate design, implementation and deployment of patient registries enables rapid decision making and ongoing data mining ultimately leading to improved patient outcomes. A major bottleneck encountered is the static nature of these registries. That is, software developers are required to work with stakeholders to determine requirements, design the system, implement the required data fields and functionality for each patient registry. Additionally, software developer time is required for ongoing maintenance and customisation. It is desirable to deploy a sophisticated registry framework that can allow scientists and registry curators possessing standard computing skills to dynamically construct a complete patient registry from scratch and customise it for their specific needs with little or no need to engage a software developer at any stage.</p><p><strong>Results: </strong>This paper introduces our second generation open source registry framework which builds on our previous rare disease registry framework (RDRF). This second generation RDRF is a new approach as it empowers registry administrators to construct one or more patient registries without software developer effort. New data elements for a diverse range of phenotypic and genotypic measurements can be defined at any time. Defined data elements can then be utilised in any of the created registries. 
Fine grained, multi-level user and workgroup access can be applied to each data element to ensure appropriate access and data privacy. We introduce the concept of derived data elements to assist the data element standards communities on how they might be best categorised.</p><p><strong>Conclusions: </strong>We introduce the second generation RDRF that enables the user-driven dynamic creation of patient registries. We believe this second generation RDRF is a novel approach to patient registry design, implementation and deployment and a significant advance on existing registry systems.</p>","PeriodicalId":35052,"journal":{"name":"Source Code for Biology and Medicine","volume":"9 ","pages":"14"},"PeriodicalIF":0.0,"publicationDate":"2014-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/1751-0473-9-14","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"32469550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neda Zamani, Görel Sundström, Marc P Höppner, Manfred G Grabherr
{"title":"Modular and configurable optimal sequence alignment software: Cola.","authors":"Neda Zamani, Görel Sundström, Marc P Höppner, Manfred G Grabherr","doi":"10.1186/1751-0473-9-12","DOIUrl":"https://doi.org/10.1186/1751-0473-9-12","url":null,"abstract":"<p><strong>Background: </strong>The fundamental challenge in optimally aligning homologous sequences is to define a scoring scheme that best reflects the underlying biological processes. Maximising the overall number of matches in the alignment does not always reflect the patterns by which nucleotides mutate. Efficiently implemented algorithms that can be parameterised to accommodate more complex non-linear scoring schemes are thus desirable.</p><p><strong>Results: </strong>We present Cola, alignment software that implements different optimal alignment algorithms, also allowing for scoring contiguous matches of nucleotides in a nonlinear manner. The latter places more emphasis on short, highly conserved motifs, and less on the surrounding nucleotides, which can be more diverged. 
To illustrate the differences, we report results from aligning 14,100 sequences from 3' untranslated regions of human genes to 25 of their mammalian counterparts, where we found that a nonlinear scoring scheme is more consistent than a linear scheme in detecting short, conserved motifs.</p><p><strong>Conclusions: </strong>Cola is freely available under LGPL from https://github.com/nedaz/cola.</p>","PeriodicalId":35052,"journal":{"name":"Source Code for Biology and Medicine","volume":"9 ","pages":"12"},"PeriodicalIF":0.0,"publicationDate":"2014-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/1751-0473-9-12","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"32466634","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
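The Cola abstract contrasts linear scoring, where every match counts once, with nonlinear scoring of contiguous matches that rewards short, unbroken conserved runs. The sketch below illustrates only that contrast, assuming a hypothetical scheme in which a run of k consecutive matches scores k ** exponent with exponent > 1. It scores a fixed gapless pairing of two equal-length sequences; it is not Cola's scoring function and does not perform optimal alignment.

```python
def run_lengths(a: str, b: str) -> list[int]:
    """Return lengths of maximal runs of matching characters in a gapless pairing."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length (gapless pairing)")
    runs, current = [], 0
    for x, y in zip(a, b):
        if x == y:
            current += 1          # extend the current run of matches
        elif current:
            runs.append(current)  # a mismatch closes the run
            current = 0
    if current:
        runs.append(current)      # close a run that reaches the end
    return runs


def linear_score(a: str, b: str) -> int:
    """Every match counts once, regardless of its neighbors."""
    return sum(run_lengths(a, b))


def nonlinear_score(a: str, b: str, exponent: float = 1.5) -> float:
    """Hypothetical scheme: a run of k matches scores k ** exponent (exponent > 1)."""
    return sum(k ** exponent for k in run_lengths(a, b))


contiguous = ("ACGTACGT", "ACGTTTTT")  # runs [4, 1]
scattered = ("ACGTACGT", "ATGAAAGT")   # runs [1, 1, 1, 2]
print(linear_score(*contiguous), linear_score(*scattered))  # 5 5
print(nonlinear_score(*contiguous, exponent=2),
      nonlinear_score(*scattered, exponent=2))               # 17 7
```

Both pairs have five matches, so a linear score cannot tell them apart; the superlinear run score favors the pair whose matches form one conserved block.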
Izaskun Mallona, Anna Díez-Villanueva, Miguel A Peinado
{"title":"Methylation plotter: a web tool for dynamic visualization of DNA methylation data.","authors":"Izaskun Mallona, Anna Díez-Villanueva, Miguel A Peinado","doi":"10.1186/1751-0473-9-11","DOIUrl":"https://doi.org/10.1186/1751-0473-9-11","url":null,"abstract":"<p>Methylation plotter is a Web tool that allows the visualization of methylation data in a user-friendly manner and with publication-ready quality. The user is asked to upload a file containing the methylation status of a genomic region. This file can contain up to 100 samples and 100 CpGs. Optionally, the user can assign a group for each sample (e.g. whether a sample is a tumor or normal tissue). After the data upload, the tool produces different graphical representations of the results following the most commonly used styles to display this type of data. They include an interactive plot that summarizes the status of every CpG site and for every sample in lollipop or grid styles. Methylation values ranging from 0 (unmethylated) to 1 (fully methylated) are represented using a gray color gradient. A practical feature of the tool allows the user to choose from different types of arrangement of the samples in the display: for instance, sorting by overall methylation level, by group, by unsupervised clustering or just following the order in which data were entered. In addition to the detailed plot, Methylation plotter produces a methylation profile plot that summarizes the status of the scrutinized region, a boxplot that sums up the differences between groups (if any) and a dendrogram that classifies the data by unsupervised clustering. Coupled with this analysis, descriptive statistics and testing for differences at both CpG and group levels are provided. The implementation is based on R/shiny, providing a highly dynamic user interface that generates quality graphics without the need to write R code. Methylation plotter is freely available at http://gattaca.imppc.org:3838/methylation_plotter/. 
</p>","PeriodicalId":35052,"journal":{"name":"Source Code for Biology and Medicine","volume":"9 ","pages":"11"},"PeriodicalIF":0.0,"publicationDate":"2014-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/1751-0473-9-11","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"32700092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
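One of the arrangements described above is sorting samples by overall methylation level, with values between 0 (unmethylated) and 1 (fully methylated). Methylation plotter itself is written in R/shiny; the Python sketch below only illustrates that ordering step, assuming "overall level" means the mean across CpGs with missing calls (None) skipped. The sample names and values are hypothetical.

```python
from statistics import mean


def overall_level(values):
    """Mean methylation across CpGs, skipping missing (None) calls."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("sample has no observed CpG values")
    if any(not 0.0 <= v <= 1.0 for v in observed):
        raise ValueError("methylation values must lie in [0, 1]")
    return mean(observed)


def order_by_methylation(samples, descending=True):
    """Return sample names sorted by their overall methylation level."""
    return sorted(samples, key=lambda name: overall_level(samples[name]),
                  reverse=descending)


samples = {  # hypothetical samples: one value per CpG site
    "normal_1": [0.1, 0.2, None, 0.0],
    "tumor_1": [0.9, 0.8, 0.7, 1.0],
    "tumor_2": [0.5, 0.6, 0.4, None],
}
print(order_by_methylation(samples))  # ['tumor_1', 'tumor_2', 'normal_1']
```

Skipping missing calls rather than treating them as 0 avoids pulling sparsely measured samples toward the unmethylated end; other conventions (for example the median) would be equally plausible.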
{"title":"H3Africa: a tipping point for a revolution in bioinformatics, genomics and health research in Africa.","authors":"Moses P Adoga, Segun A Fatumo, Simon M Agwale","doi":"10.1186/1751-0473-9-10","DOIUrl":"https://doi.org/10.1186/1751-0473-9-10","url":null,"abstract":"<p><strong>Background: </strong>A multi-million dollar research initiative involving the National Institutes of Health (NIH), Wellcome Trust and African scientists has been launched. The initiative, referred to as H3Africa, is an acronym that stands for Human Heredity and Health in Africa. Here, we outline what this initiative is set to achieve and the latest commitments of the key players as at October 2013.</p><p><strong>Findings: </strong>The initiative has so far been awarded over $74 million in research grants. During the first set of awards announced in 2012, the NIH granted $5 million a year for a period of five years, while the Wellcome Trust doled out at least $12 million over the period to the research consortium. This was in addition to Wellcome Trust's provision of administrative support, scientific consultation and advanced training, all in collaboration with the African Society for Human Genetics. In addition, during the second set of awards announced in October 2013, the NIH awarded to the laudable initiative 10 new grants of up to $17 million over the next four years.</p><p><strong>Conclusions: </strong>H3Africa is poised to transform the face of research in genomics, bioinformatics and health in Africa. The capacity of African scientists will be enhanced through training and the better research facilities that will be acquired. 
Research collaborations between Africa and the West will grow and all stakeholders, including the funding partners, African scientists, scientists across the globe, physicians and patients will be the eventual winners.</p>","PeriodicalId":35052,"journal":{"name":"Source Code for Biology and Medicine","volume":"9 ","pages":"10"},"PeriodicalIF":0.0,"publicationDate":"2014-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/1751-0473-9-10","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"32344360","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Richard A Erickson, Wayne E Thogmartin, Jennifer A Szymanski
{"title":"BatTool: an R package with GUI for assessing the effect of White-nose syndrome and other take events on Myotis spp. of bats.","authors":"Richard A Erickson, Wayne E Thogmartin, Jennifer A Szymanski","doi":"10.1186/1751-0473-9-9","DOIUrl":"https://doi.org/10.1186/1751-0473-9-9","url":null,"abstract":"<p><strong>Background: </strong>Myotis species of bats such as the Indiana Bat and Little Brown Bat are facing population declines because of White-nose syndrome (WNS). These species also face threats from anthropogenic activities such as wind energy development. Population models may be used to provide insights into threats facing these species. We developed a population model, BatTool, as an R package to help decision makers and natural resource managers examine factors influencing the dynamics of these species. The R package includes two components: 1) deterministic and stochastic models that are accessible from the command line and 2) a graphical user interface (GUI).</p><p><strong>Results: </strong>BatTool is an R package allowing natural resource managers and decision makers to understand Myotis spp. population dynamics. Through the use of a GUI, the model allows users to understand how WNS and other take events may affect the population. The results are saved both graphically and as data files. Additionally, R-savvy users may access the population functions through the command line and reuse the code as part of future research. This R package could also be used as part of a population dynamics or wildlife management course.</p><p><strong>Conclusions: </strong>BatTool provides access to a Myotis spp. population model. This tool can help natural resource managers and decision makers with Endangered Species Act deliberations for these species and with issuing take permits as part of regulatory decision making. 
The tool is available online as part of this publication.</p>","PeriodicalId":35052,"journal":{"name":"Source Code for Biology and Medicine","volume":"9 ","pages":"9"},"PeriodicalIF":0.0,"publicationDate":"2014-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/1751-0473-9-9","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"32447468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
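The BatTool abstract mentions a deterministic and a stochastic population model but does not give their equations, and BatTool itself is an R package. The Python sketch below is therefore a hypothetical single-stage model chosen only to show how the two variants relate: the deterministic projection is N[t+1] = max(0, growth_rate * N[t] - annual_take), where annual_take stands in for a take event, and the stochastic variant draws each year's growth rate from a normal distribution (environmental stochasticity). All parameter values are illustrative.

```python
import random


def project_deterministic(n0, growth_rate, annual_take, years):
    """Hypothetical model: N[t+1] = max(0, growth_rate * N[t] - annual_take)."""
    sizes = [float(n0)]
    for _ in range(years):
        sizes.append(max(0.0, growth_rate * sizes[-1] - annual_take))
    return sizes


def project_stochastic(n0, mean_growth, sd_growth, annual_take, years, seed=None):
    """Same recursion, but each year's growth rate ~ Normal(mean, sd), floored at 0."""
    rng = random.Random(seed)  # seeded generator for reproducible replicates
    sizes = [float(n0)]
    for _ in range(years):
        growth = max(0.0, rng.gauss(mean_growth, sd_growth))
        sizes.append(max(0.0, growth * sizes[-1] - annual_take))
    return sizes


# Declining population (growth rate 0.9) with 10 individuals taken per year.
print([round(n, 1) for n in project_deterministic(1000, 0.9, 10, 3)])
# [1000.0, 890.0, 791.0, 701.9]

# Five stochastic replicates; the spread of final sizes reflects year-to-year variation.
finals = [project_stochastic(1000, 0.9, 0.1, 10, 10, seed=s)[-1] for s in range(5)]
print(finals)
```

With sd_growth set to 0 the stochastic variant reproduces the deterministic trajectory, which is a useful sanity check when comparing the two.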
Chuming Chen, Sari S Khaleel, Hongzhan Huang, Cathy H Wu
{"title":"Software for pre-processing Illumina next-generation sequencing short read sequences.","authors":"Chuming Chen, Sari S Khaleel, Hongzhan Huang, Cathy H Wu","doi":"10.1186/1751-0473-9-8","DOIUrl":"https://doi.org/10.1186/1751-0473-9-8","url":null,"abstract":"<p><strong>Background: </strong>When compared to Sanger sequencing technology, next-generation sequencing (NGS) technologies are hindered by shorter sequence read length, higher base-call error rate, non-uniform coverage, and platform-specific sequencing artifacts. These characteristics lower the quality of their downstream analyses, e.g. de novo and reference-based assembly, by introducing sequencing artifacts and errors that may contribute to incorrect interpretation of data. Although many tools have been developed for quality control and pre-processing of NGS data, none of them provide flexible and comprehensive trimming options in conjunction with parallel processing to expedite pre-processing of large NGS datasets.</p><p><strong>Methods: </strong>We developed ngsShoRT (next-generation sequencing Short Reads Trimmer), a flexible and comprehensive open-source software package written in Perl that provides a set of algorithms commonly used for pre-processing NGS short read sequences. We compared the features and performance of ngsShoRT with existing tools: CutAdapt, NGS QC Toolkit and Trimmomatic. We also compared the effects of using pre-processed short read sequences generated by different algorithms on de novo and reference-based assembly for three different genomes: Caenorhabditis elegans, Saccharomyces cerevisiae S288c, and Escherichia coli O157:H7.</p><p><strong>Results: </strong>Several combinations of ngsShoRT algorithms were tested on publicly available Illumina GA II, HiSeq 2000, and MiSeq eukaryotic and bacterial genomic short read sequences with a focus on removing sequencing artifacts and low-quality reads and/or bases. 
Our results show that across three organisms and three sequencing platforms, trimming improved the mean quality scores of trimmed sequences. Using trimmed sequences for de novo and reference-based assembly improved assembly quality as well as assembler performance. In general, ngsShoRT outperformed comparable trimming tools in terms of trimming speed and improvement of de novo and reference-based assembly as measured by assembly contiguity and correctness.</p><p><strong>Conclusions: </strong>Trimming of short read sequences can improve the quality of de novo and reference-based assembly and assembler performance. The parallel processing capability of ngsShoRT reduces trimming time and improves the memory efficiency when dealing with large datasets. We recommend combining sequencing artifacts removal, and quality score based read filtering and base trimming as the most consistent method for improving sequence quality and downstream assemblies. ngsShoRT source code, user guide and tutorial are available at http://research.bioinformatics.udel.edu/genomics/ngsShoRT/. ngsShoRT can be incorporated as a pre-processing step in genome and transcriptome assembly projects.</p>","PeriodicalId":35052,"journal":{"name":"Source Code for Biology and Medicine","volume":"9 ","pages":"8"},"PeriodicalIF":0.0,"publicationDate":"2014-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/1751-0473-9-8","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"32447467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
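The ngsShoRT abstract recommends quality-score-based base trimming combined with read filtering. The Python sketch below shows one generic form of that idea (ngsShoRT itself is written in Perl and offers several algorithms not reproduced here): decode Phred+33 quality characters, drop bases from the 3' end while their score is below a threshold, then discard reads that end up too short. The Phred+33 encoding, the threshold of 20 and the minimum length of 5 are assumptions for illustration.

```python
def phred33(qual: str) -> list[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(c) - 33 for c in qual]


def trim_3prime(seq: str, qual: str, threshold: int = 20) -> tuple[str, str]:
    """Drop bases from the 3' end while their quality is below `threshold`."""
    scores = phred33(qual)
    end = len(seq)
    while end > 0 and scores[end - 1] < threshold:
        end -= 1
    return seq[:end], qual[:end]


def preprocess(reads, threshold: int = 20, min_length: int = 5):
    """Trim each (seq, qual) read, then filter out reads shorter than min_length."""
    kept = []
    for seq, qual in reads:
        if len(seq) != len(qual):
            raise ValueError("sequence and quality strings differ in length")
        tseq, tqual = trim_3prime(seq, qual, threshold)
        if len(tseq) >= min_length:
            kept.append((tseq, tqual))
    return kept


reads = [
    ("ACGTACGTAC", "IIIIIII###"),  # 'I' = Q40, '#' = Q2: trailing three bases trimmed
    ("ACGTA", "I####"),            # trimmed to one base, then filtered out
    ("ACGTAC", "IIII5#"),          # '5' = Q20 is kept (not below the threshold)
]
print(preprocess(reads))  # [('ACGTACG', 'IIIIIII'), ('ACGTA', 'IIII5')]
```

Trimming before filtering means the length filter sees the read as it would enter assembly, which is the ordering implied by "read filtering and base trimming" working together.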
Jesus Enrique Herrera-Galeano, Kenneth G Frey, Regina Z Cer, Alfred J Mateczun, Kimberly A Bishop-Lilly, Vishwesh P Mokashi
{"title":"BLASTPLOT: a PERL module to plot next generation sequencing NCBI-BLAST results.","authors":"Jesus Enrique Herrera-Galeano, Kenneth G Frey, Regina Z Cer, Alfred J Mateczun, Kimberly A Bishop-Lilly, Vishwesh P Mokashi","doi":"10.1186/1751-0473-9-7","DOIUrl":"https://doi.org/10.1186/1751-0473-9-7","url":null,"abstract":"<p><strong>Background: </strong>The development of Next Generation Sequencing (NGS) during the last decade has created an unprecedented amount of sequencing data, as well as the ability to rapidly sequence specimens of interest. Read-based BLAST analysis of NGS data is a common procedure especially in the case of metagenomic samples. However, coverage is usually not enough to allow for de novo assembly. This type of read-based analysis often creates the question of how the reads that align to the same sequence are distributed. The same question applies to preparation of primers or probes for microarray experiments. Although there are several packages that allow the visualization of DNA segments in relation to a reference, in most cases they require the visualization of one reference at a time and the capture of screen shots for each segment. Such a procedure could be tedious and time consuming. 
The field is in need of a solution that automates the capture of coverage plots for all the segments of interest.</p><p><strong>Results: </strong>We have created BLASTPLOT, a PERL module to quickly plot the BLAST results from short sequences (primers, probes, reads) against reference targets.</p><p><strong>Conclusions: </strong>BLASTPLOT is a simple to use PERL module that allows the generation of PNG graphs for all the reference sequences associated with a BLAST result set.</p>","PeriodicalId":35052,"journal":{"name":"Source Code for Biology and Medicine","volume":"9 1","pages":"7"},"PeriodicalIF":0.0,"publicationDate":"2014-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/1751-0473-9-7","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"32223449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
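BLASTPLOT answers how reads that align to the same reference are distributed by plotting coverage per reference sequence. The Python sketch below covers only the counting step behind such a plot, not BLASTPLOT's Perl code or its PNG output. It assumes BLAST+ tabular output (-outfmt 6) with the default twelve columns, where sstart and send are 1-based inclusive subject coordinates and send < sstart for minus-strand hits; reference lengths are supplied by the caller, and the reads and reference shown are hypothetical.

```python
def coverage_from_blast_tab(lines, subject_lengths):
    """Per-position read depth for each subject from BLAST tabular (-outfmt 6) lines.

    Columns assumed: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore (sstart/send 1-based, inclusive).
    """
    # Difference arrays: +1 where a hit starts, -1 just past where it ends.
    diff = {s: [0] * (n + 1) for s, n in subject_lengths.items()}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        sseqid, sstart, send = fields[1], int(fields[8]), int(fields[9])
        if sseqid not in diff:
            raise ValueError(f"unknown subject: {sseqid}")
        lo, hi = min(sstart, send), max(sstart, send)  # minus-strand: send < sstart
        if lo < 1 or hi > subject_lengths[sseqid]:
            raise ValueError(f"hit outside subject bounds: {sseqid} {lo}-{hi}")
        diff[sseqid][lo - 1] += 1
        diff[sseqid][hi] -= 1
    coverage = {}
    for s, d in diff.items():
        depth, running = [], 0
        for delta in d[:-1]:  # prefix sums give depth at each position
            running += delta
            depth.append(running)
        coverage[s] = depth
    return coverage


hits = [
    "read1\tref1\t100.0\t4\t0\t0\t1\t4\t1\t4\t1e-3\t8.0",
    "read2\tref1\t100.0\t4\t0\t0\t1\t4\t3\t6\t1e-3\t8.0",
    "read3\tref1\t100.0\t4\t0\t0\t1\t4\t8\t5\t1e-3\t8.0",  # minus strand
]
print(coverage_from_blast_tab(hits, {"ref1": 10})["ref1"])
# [1, 1, 2, 2, 2, 2, 1, 1, 0, 0]
```

The difference-array approach costs one update per hit plus one pass per reference, so it stays linear even when many reads pile up on the same region; each depth list can then be handed to any plotting library.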
{"title":"Identifying large sets of unrelated individuals and unrelated markers.","authors":"Kuruvilla Joseph Abraham, Clara Diaz","doi":"10.1186/1751-0473-9-6","DOIUrl":"https://doi.org/10.1186/1751-0473-9-6","url":null,"abstract":"<p><strong>Background: </strong>Genetic analyses in large sample populations are important for a better understanding of the variation between populations, for designing conservation programs, for detecting rare mutations which may be risk factors for a variety of diseases, among other reasons. However, these analyses frequently assume that the participating individuals or animals are mutually unrelated, which may not be the case in large samples, leading to erroneous conclusions. In order to retain as much data as possible while minimizing the risk of false positives, it is useful to identify a large subset of relatively unrelated individuals in the population. This can be done using a heuristic for finding a large set of independent nodes in an undirected graph. We describe a fast randomized heuristic for this purpose. The same methodology can also be used for identifying a suitable set of markers for analyzing population stratification, and other instances where a rapid heuristic for maximal independent sets in large graphs is needed.</p><p><strong>Results: </strong>We present FastIndep, a fast randomized heuristic algorithm for finding a maximal independent set of nodes in an arbitrary undirected graph along with an efficient implementation in C++. On a 64-bit Linux or MacOS platform the execution time is a few minutes, even with a graph of several thousand nodes. The algorithm can discover multiple solutions of the same cardinality. FastIndep can be used to discover unlinked markers and unrelated individuals in populations.</p><p><strong>Conclusions: </strong>The methods presented here provide a quick and efficient method for identifying sets of unrelated individuals in large populations and unlinked markers in marker panels. 
The C++ source code and instructions along with utilities for generating the input files in the appropriate format are available at http://taurus.ansci.iastate.edu/wiki/people/jabr/Joseph_Abraham.html.</p>","PeriodicalId":35052,"journal":{"name":"Source Code for Biology and Medicine","volume":"9 1","pages":"6"},"PeriodicalIF":0.0,"publicationDate":"2014-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/1751-0473-9-6","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"32181414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
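The FastIndep abstract frames the search for unrelated individuals as finding a large independent set in an undirected graph whose edges join related pairs. The Python sketch below shows a generic randomized greedy heuristic of that kind, not FastIndep's C++ algorithm: visit nodes in random order, keep each node that has no neighbor already kept, and repeat with fresh random orders, keeping the largest set found. How an edge is defined (for example, relatedness above some cutoff) is left to the caller and is an assumption here.

```python
import random


def build_graph(nodes, edges):
    """Undirected adjacency sets; an edge might mean 'relatedness above a cutoff'."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def random_maximal_independent_set(adjacency, rng):
    """Visit nodes in random order, keeping each node with no neighbor already kept."""
    order = list(adjacency)
    rng.shuffle(order)
    chosen = set()
    for node in order:
        if adjacency[node].isdisjoint(chosen):
            chosen.add(node)
    return chosen


def best_of(adjacency, trials=100, seed=0):
    """Repeat the randomized greedy pass and keep the largest set found."""
    rng = random.Random(seed)
    best = set()
    for _ in range(trials):
        candidate = random_maximal_independent_set(adjacency, rng)
        if len(candidate) > len(best):
            best = candidate
    return best


# Star: one individual related to four others that are mutually unrelated.
star = build_graph(["c", "l1", "l2", "l3", "l4"],
                   [("c", "l1"), ("c", "l2"), ("c", "l3"), ("c", "l4")])
print(sorted(best_of(star)))  # ['l1', 'l2', 'l3', 'l4']
```

Each greedy pass returns a maximal set (no further node can be added), but not necessarily a maximum one: if the hub "c" happens to be visited first, the pass returns {"c"}. Repeating passes with different random orders is what makes larger solutions, and multiple solutions of the same size, likely to turn up.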