{"title":"TagDigger: user-friendly extraction of read counts from GBS and RAD-seq data.","authors":"Lindsay V Clark, Erik J Sacks","doi":"10.1186/s13029-016-0057-7","DOIUrl":"https://doi.org/10.1186/s13029-016-0057-7","url":null,"abstract":"<p><strong>Background: </strong>In genotyping-by-sequencing (GBS) and restriction site-associated DNA sequencing (RAD-seq), read depth is important for assessing the quality of genotype calls and estimating allele dosage in polyploids. However, existing pipelines for GBS and RAD-seq do not provide read counts in formats that are both accurate and easy to access. Additionally, although existing pipelines allow previously-mined SNPs to be genotyped on new samples, they do not allow the user to manually specify a subset of loci to examine. Pipelines that do not use a reference genome assign arbitrary names to SNPs, making meta-analysis across projects difficult.</p><p><strong>Results: </strong>We created the software TagDigger, which includes three programs for analyzing GBS and RAD-seq data. The first script, tagdigger_interactive.py, rapidly extracts read counts and genotypes from FASTQ files using user-supplied sets of barcodes and tags. Input and output is in CSV format so that it can be opened by spreadsheet software. Tag sequences can also be imported from the Stacks, TASSEL-GBSv2, TASSEL-UNEAK, or pyRAD pipelines, and a separate file can be imported listing the names of markers to retain. A second script, tag_manager.py, consolidates marker names and sequences across multiple projects. A third script, barcode_splitter.py, assists with preparing FASTQ data for deposit in a public archive by splitting FASTQ files by barcode and generating MD5 checksums for the resulting files.</p><p><strong>Conclusions: </strong>TagDigger is open-source and freely available software written in Python 3. It uses a scalable, rapid search algorithm that can process over 100 million FASTQ reads per hour. 
TagDigger will run on a laptop with any operating system, does not consume hard drive space with intermediate files, and does not require programming skill to use.</p>","PeriodicalId":35052,"journal":{"name":"Source Code for Biology and Medicine","volume":" ","pages":"11"},"PeriodicalIF":0.0,"publicationDate":"2016-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/s13029-016-0057-7","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34662025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A new tool for prioritization of sequence variants from whole exome sequencing data.","authors":"Brigitte Glanzmann, Hendri Herbst, Craig J Kinnear, Marlo Möller, Junaid Gamieldien, Soraya Bardien","doi":"10.1186/s13029-016-0056-8","DOIUrl":"https://doi.org/10.1186/s13029-016-0056-8","url":null,"abstract":"<p><strong>Background: </strong>Whole exome sequencing (WES) has provided a means for researchers to gain access to a highly enriched subset of the human genome in which to search for variants that are likely to be pathogenic and possibly provide important insights into disease mechanisms. In developing countries, bioinformatics capacity and expertise is severely limited and wet bench scientists are required to take on the challenging task of understanding and implementing the barrage of bioinformatics tools that are available to them.</p><p><strong>Results: </strong>We designed a novel method for the filtration of WES data called TAPER™ (Tool for Automated selection and Prioritization for Efficient Retrieval of sequence variants).</p><p><strong>Conclusions: </strong>TAPER™ implements a set of logical steps by which to prioritize candidate variants that could be associated with disease and this is aimed for implementation in biomedical laboratories with limited bioinformatics capacity. TAPER™ is free, can be setup on a Windows operating system (from Windows 7 and above) and does not require any programming knowledge. 
In summary, we have developed a freely available tool that simplifies variant prioritization from WES data in order to facilitate discovery of disease-causing genes.</p>","PeriodicalId":35052,"journal":{"name":"Source Code for Biology and Medicine","volume":" ","pages":"10"},"PeriodicalIF":0.0,"publicationDate":"2016-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/s13029-016-0056-8","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34634488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Log::ProgramInfo: A Perl module to collect and log data for bioinformatics pipelines.","authors":"John M Macdonald, Paul C Boutros","doi":"10.1186/s13029-016-0055-9","DOIUrl":"10.1186/s13029-016-0055-9","url":null,"abstract":"<p><strong>Background: </strong>To reproduce and report a bioinformatics analysis, it is important to be able to determine the environment in which a program was run. It can also be valuable when trying to debug why different executions are giving unexpectedly different results.</p><p><strong>Results: </strong>Log::ProgramInfo is a Perl module that writes a log file at the termination of execution of the enclosing program, to document useful execution characteristics. This log file can be used to re-create the environment in order to reproduce an earlier execution. It can also be used to compare the environments of two executions to determine whether there were any differences that might affect (or explain) their operation.</p><p><strong>Availability: </strong>The source is available on CPAN (Macdonald and Boutros, Log-ProgramInfo. 
http://search.cpan.org/~boutroslb/Log-ProgramInfo/).</p><p><strong>Conclusion: </strong>Using Log::ProgramInfo in programs creating result data for publishable research, and including the Log::ProgramInfo output log as part of the publication of that research is a valuable method to assist others to duplicate the programming environment as a precursor to validating and/or extending that research.</p>","PeriodicalId":35052,"journal":{"name":"Source Code for Biology and Medicine","volume":" ","pages":"9"},"PeriodicalIF":0.0,"publicationDate":"2016-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4919834/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34613905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SV-STAT accurately detects structural variation via alignment to reference-based assemblies.","authors":"Caleb F Davis, Deborah I Ritter, David A Wheeler, Hongmei Wang, Yan Ding, Shannon P Dugan, Matthew N Bainbridge, Donna M Muzny, Pulivarthi H Rao, Tsz-Kwong Man, Sharon E Plon, Richard A Gibbs, Ching C Lau","doi":"10.1186/s13029-016-0051-0","DOIUrl":"https://doi.org/10.1186/s13029-016-0051-0","url":null,"abstract":"<p><strong>Background: </strong>Genomic deletions, inversions, and other rearrangements known collectively as structural variations (SVs) are implicated in many human disorders. Technologies for sequencing DNA provide a potentially rich source of information in which to detect breakpoints of structural variations at base-pair resolution. However, accurate prediction of SVs remains challenging, and existing informatics tools predict rearrangements with significant rates of false positives or negatives.</p><p><strong>Results: </strong>To address this challenge, we developed 'Structural Variation detection by STAck and Tail' (SV-STAT) which implements a novel scoring metric. The software uses this statistic to quantify evidence for structural variation in genomic regions suspected of harboring rearrangements. To demonstrate SV-STAT, we used targeted and genome-wide approaches. First, we applied a custom capture array followed by Roche/454 and SV-STAT to three pediatric B-lineage acute lymphoblastic leukemias, identifying five structural variations joining known and novel breakpoint regions. Next, we detected SVs genome-wide in paired-end Illumina data collected from additional tumor samples. SV-STAT showed predictive accuracy as high as or higher than leading alternatives. 
The software is freely available under the terms of the GNU General Public License version 3 at https://gitorious.org/svstat/svstat.</p><p><strong>Conclusions: </strong>SV-STAT works across multiple sequencing chemistries, paired and single-end technologies, targeted or whole-genome strategies, and it complements existing SV-detection software. The method is a significant advance towards accurate detection and genotyping of genomic rearrangements from DNA sequencing data.</p>","PeriodicalId":35052,"journal":{"name":"Source Code for Biology and Medicine","volume":" ","pages":"8"},"PeriodicalIF":0.0,"publicationDate":"2016-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/s13029-016-0051-0","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34601128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implementation and clinical application of a deformation method for fast simulation of biological tissue formed by fibers and fluid.","authors":"Ana Gabriella de Oliveira Sardinha, Ceres Nunes de Resende Oyama, Armando de Mendonça Maroja, Ivan F Costa","doi":"10.1186/s13029-016-0054-x","DOIUrl":"https://doi.org/10.1186/s13029-016-0054-x","url":null,"abstract":"<p><strong>Background: </strong>The aim of this paper is to provide a general discussion, algorithm, and actual working programs of the deformation method for fast simulation of biological tissue formed by fibers and fluid. In order to demonstrate the benefit of the clinical applications software, we successfully used our computational program to deform a 3D breast image acquired from patients, using a 3D scanner, in a real hospital environment.</p><p><strong>Results: </strong>The method implements a quasi-static solution for elastic global deformations of objects. Each pair of vertices of the surface is connected and defines an elastic fiber. The set of all the elastic fibers defines a mesh of smaller size than the volumetric meshes, allowing for simulation of complex objects with less computational effort. The behavior similar to the stress tensor is obtained by the volume conservation equation that mixes the 3D coordinates. Step by step, we show the computational implementation of this approach.</p><p><strong>Conclusions: </strong>As an example, a 2D rectangle formed by only 4 vertices is solved and, for this simple geometry, all intermediate results are shown. 
On the other hand, actual implementations of these ideas in the form of working computer routines are provided for general 3D objects, including a clinical application.</p>","PeriodicalId":35052,"journal":{"name":"Source Code for Biology and Medicine","volume":" ","pages":"7"},"PeriodicalIF":0.0,"publicationDate":"2016-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/s13029-016-0054-x","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34312216","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MM2S: personalized diagnosis of medulloblastoma patients and model systems.","authors":"Deena M A Gendoo, Benjamin Haibe-Kains","doi":"10.1186/s13029-016-0053-y","DOIUrl":"https://doi.org/10.1186/s13029-016-0053-y","url":null,"abstract":"<p><strong>Background: </strong>Medulloblastoma (MB) is a highly malignant and heterogeneous brain tumour that is the most common cause of cancer-related deaths in children. Increasing availability of genomic data over the last decade had resulted in improvement of human subtype classification methods, and the parallel development of MB mouse models towards identification of subtype-specific disease origins and signaling pathways. Despite these advances, MB classification schemes remained inadequate for personalized prediction of MB subtypes for individual patient samples and across model systems. To address this issue, we developed the Medullo-Model to Subtypes ( MM2S ) classifier, a new method enabling classification of individual gene expression profiles from MB samples (patient samples, mouse models, and cell lines) against well-established molecular subtypes [Genomics 106:96-106, 2015]. We demonstrated the accuracy and flexibility of MM2S in the largest meta-analysis of human patients and mouse models to date. Here, we present a new functional package that provides an easy-to-use and fully documented implementation of the MM2S method, with additional functionalities that allow users to obtain graphical and tabular summaries of MB subtype predictions for single samples and across sample replicates. The flexibility of the MM2S package promotes incorporation of MB predictions into large Medulloblastoma-driven analysis pipelines, making this tool suitable for use by researchers.</p><p><strong>Results: </strong>The MM2S package is applied in two case studies involving human primary patient samples, as well as sample replicates of the GTML mouse model. 
We highlight functions that are of use for species-specific MB classification, across individual samples and sample replicates. We emphasize on the range of functions that can be used to derive both singular and meta-centric views of MB predictions, across samples and across MB subtypes.</p><p><strong>Conclusions: </strong>Our MM2S package can be used to generate predictions without having to rely on an external web server or additional sources. Our open-source package facilitates and extends the MM2S algorithm in diverse computational and bioinformatics contexts. The package is available on CRAN, at the following URL: https://cran.r-project.org/web/packages/MM2S/, as well as on Github at the following URLs: https://github.com/DGendoo and https://github.com/bhklab.</p>","PeriodicalId":35052,"journal":{"name":"Source Code for Biology and Medicine","volume":" ","pages":"6"},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/s13029-016-0053-y","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34307296","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A flexible tool to plot a genomic map for single nucleotide polymorphisms","authors":"Fuquan Zhang","doi":"10.1186/s13029-016-0052-z","DOIUrl":"https://doi.org/10.1186/s13029-016-0052-z","url":null,"abstract":"","PeriodicalId":35052,"journal":{"name":"Source Code for Biology and Medicine","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/s13029-016-0052-z","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"65752531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A generic schema and data collection forms applicable to diverse entomological studies of mosquitoes.","authors":"Samson S Kiware, Tanya L Russell, Zacharia J Mtema, Alpha D Malishee, Prosper Chaki, Dickson Lwetoijera, Javan Chanda, Dingani Chinula, Silas Majambere, John E Gimnig, Thomas A Smith, Gerry F Killeen","doi":"10.1186/s13029-016-0050-1","DOIUrl":"10.1186/s13029-016-0050-1","url":null,"abstract":"<p><strong>Background: </strong>Standardized schemas, databases, and public data repositories are needed for the studies of malaria vectors that encompass a remarkably diverse array of designs and rapidly generate large data volumes, often in resource-limited tropical settings lacking specialized software or informatics support.</p><p><strong>Results: </strong>Data from the majority of mosquito studies conformed to a generic schema, with data collection forms recording the experimental design, sorting of collections, details of sample pooling or subdivision, and additional observations. Generically applicable forms with standardized attribute definitions enabled rigorous, consistent data and sample management with generic software and minimal expertise. Forms use now includes 20 experiments, 8 projects, and 15 users at 3 research and control institutes in 3 African countries, resulting in 11 peer-reviewed publications.</p><p><strong>Conclusion: </strong>We have designed generic data schema that can be used to develop paper or electronic based data collection forms depending on the availability of resources. We have developed paper-based data collection forms that can be used to collect data from majority of entomological studies across multiple study areas using standardized data formats. Data recorded on these forms with standardized formats can be entered and linked with any relational database software. 
These informatics tools are recommended because they ensure that medical entomologists save time, improve data quality, and data collected and shared across multiple studies is in standardized formats hence increasing research outputs.</p>","PeriodicalId":35052,"journal":{"name":"Source Code for Biology and Medicine","volume":"11 ","pages":"4"},"PeriodicalIF":0.0,"publicationDate":"2016-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4809029/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9832699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Conserved antigenic sites between MERS-CoV and Bat-coronavirus are revealed through sequence analysis","authors":"Refat Sharmin, A. B. Islam","doi":"10.1186/s13029-016-0049-7","DOIUrl":"https://doi.org/10.1186/s13029-016-0049-7","url":null,"abstract":"","PeriodicalId":35052,"journal":{"name":"Source Code for Biology and Medicine","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/s13029-016-0049-7","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"65752489","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implementation of the Rank-Weighted Co-localization (RWC) algorithm in multiple image analysis platforms for quantitative analysis of microscopy images","authors":"Vasanth R. Singan, J. Simpson","doi":"10.1186/s13029-016-0048-8","DOIUrl":"https://doi.org/10.1186/s13029-016-0048-8","url":null,"abstract":"","PeriodicalId":35052,"journal":{"name":"Source Code for Biology and Medicine","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/s13029-016-0048-8","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"65752472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}