Source Code for Biology and Medicine: Latest Articles

2DKD: a toolkit for content-based local image search.

Source Code for Biology and Medicine Pub Date : 2020-02-10 eCollection Date: 2020-01-01 DOI: 10.1186/s13029-020-0077-1

Julian S DeVille, Daisuke Kihara, Atilla Sit

Background: Direct comparison of 2D images is computationally inefficient because the images must be translated, rotated, and scaled to evaluate their similarity. In many biological applications, such as digital pathology and cryo-EM, identifying specific local regions of images is often of particular interest. Invariant descriptors that can efficiently retrieve local image patches or subimages are therefore needed.

Results: We present a software package called Two-Dimensional Krawtchouk Descriptors (2DKD) that performs local subimage search in 2D images. The toolkit uses only a small number of invariant descriptors per image for efficient local image retrieval. This enables querying an image and comparing similar patterns locally across a potentially large database. We show that these descriptors appear to be useful for searching for local patterns or small particles in images, and we demonstrate some test cases that can be helpful for both assembly software developers and their users.

Conclusions: Local image comparison and subimage search can be cumbersome in both computational complexity and runtime, due to factors such as the rotation, scaling, and translation of the object in question. With the 2DKD toolkit, relatively few descriptors are needed to describe a given image, and this can be achieved with minimal memory usage.

Vol. 15, article 1.

Citations: 0
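2DKD's descriptors are derived from Krawtchouk polynomials. As a minimal sketch of that building block only (not the toolkit's own rotation- and scale-invariant construction, which the abstract does not detail), the Python code below computes classical weighted Krawtchouk polynomials with the standard three-term recurrence and uses them to take 2D Krawtchouk moments of a small square image. The choice of a square image and of one parameter `p` for both axes are assumptions; `p` shifts where the polynomials concentrate, which is what lets these moments emphasize a local region. The function names are hypothetical.

```python
import math

import numpy as np


def weighted_krawtchouk(N, p):
    """Rows n = 0..N of the weighted Krawtchouk polynomials evaluated at x = 0..N."""
    x = np.arange(N + 1, dtype=float)
    K = np.zeros((N + 1, N + 1))
    K[0] = 1.0
    if N >= 1:
        K[1] = 1.0 - x / (N * p)
    # Three-term recurrence for the unweighted K_{n+1}(x; p, N)
    for n in range(1, N):
        a = p * (N - n)
        K[n + 1] = ((a + n * (1 - p) - x) * K[n] - n * (1 - p) * K[n - 1]) / a
    # Binomial weight w(x) and squared norm rho(n) make the rows orthonormal
    w = np.array([math.comb(N, k) * p**k * (1 - p) ** (N - k) for k in range(N + 1)])
    rho = np.array([((1 - p) / p) ** n / math.comb(N, n) for n in range(N + 1)])
    return K * np.sqrt(w)[None, :] / np.sqrt(rho)[:, None]


def krawtchouk_moments(image, p=0.5):
    """Q[n, m] = sum_x sum_y Kbar_n(x) Kbar_m(y) f(x, y) for a square image."""
    Kb = weighted_krawtchouk(image.shape[0] - 1, p)
    return Kb @ image @ Kb.T


def reconstruct(Q, p=0.5):
    """Invert krawtchouk_moments(); exact because the basis is orthonormal."""
    Kb = weighted_krawtchouk(Q.shape[0] - 1, p)
    return Kb.T @ Q @ Kb
```

Keeping only low-order moments gives a compact summary of the image; for large images, recurrences on the weighted polynomials themselves are usually preferred for numerical stability.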

Computing and graphing probability values of pearson distributions: a SAS/IML macro.

Source Code for Biology and Medicine Pub Date : 2019-12-20 eCollection Date: 2019-01-01 DOI: 10.1186/s13029-019-0076-2

Qing Yang, Xinming An, Wei Pan

Background: Any empirical data set can be approximated by one of the Pearson distributions using the first four moments of the data (Elderton WP, Johnson NL. Systems of Frequency Curves. 1969; Pearson K. Philos Trans R Soc Lond Ser A. 186:343-414 1895; Solomon H, Stephens MA. J Am Stat Assoc. 73(361):153-60 1978). Pearson distributions thus make statistical analysis possible for data with unknown distributions. Both older printed tables (Pearson ES, Hartley HO. Biometrika Tables for Statisticians, vol. II. 1972) and contemporary computer programs (Amos DE, Daniel SL. Tables of percentage points of standardized pearson distributions. 1971; Bouver H, Bargmann RE. Tables of the standardized percentage points of the pearson system of curves in terms of β1 and β2. 1974; Bowman KO, Shenton LR. Biometrika. 66(1):147-51 1979; Davis CS, Stephens MA. Appl Stat. 32(3):322-7 1983; Pan W. J Stat Softw. 31(Code Snippet 2):1-6 2009) are available for obtaining the percentage points of Pearson distributions that correspond to certain pre-specified percentages (or probability values; e.g., 1.0%, 2.5%, 5.0%). However, they are of little use in statistical analysis, because unwieldy second-difference interpolation is needed to calculate the probability value of a Pearson distribution that corresponds to a given percentage point, such as an observed test statistic in hypothesis testing.

Results: The present study develops a SAS/IML macro program that identifies the appropriate type of Pearson distribution from either an input dataset or the values of the four moments, and then computes and graphs probability values of Pearson distributions for any given percentage points.

Conclusions: The SAS macro program returns accurate approximations to Pearson distributions and can help researchers efficiently conduct statistical analysis on data with unknown distributions.

Vol. 14, article 6.

Citations: 3
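The macro's first step, choosing a Pearson type from the first four moments, can be illustrated with Pearson's classical κ criterion on β1 = μ3²/μ2³ and β2 = μ4/μ2². The Python sketch below is not the authors' SAS/IML code: it only classifies the distribution type, and the tolerance `tol` used for the boundary cases (symmetric, Type III, Type V) is an assumption. Computing probability values for the chosen type, the macro's main contribution, is not shown.

```python
import math

import numpy as np


def beta_coefficients(data):
    """Return (beta1, beta2) computed from the sample central moments."""
    x = np.asarray(data, dtype=float)
    d = x - x.mean()
    m2, m3, m4 = (d**2).mean(), (d**3).mean(), (d**4).mean()
    return m3**2 / m2**3, m4 / m2**2


def pearson_type(beta1, beta2, tol=1e-9):
    """Classify (beta1, beta2) into a Pearson type using the kappa criterion."""
    if abs(beta1) < tol:  # symmetric distributions
        if abs(beta2 - 3) < tol:
            return "normal"
        return "II" if beta2 < 3 else "VII"
    denom = 2 * beta2 - 3 * beta1 - 6
    if abs(denom) < tol:  # kappa is infinite on this line: Type III (gamma)
        return "III"
    kappa = beta1 * (beta2 + 3) ** 2 / (4 * (4 * beta2 - 3 * beta1) * denom)
    if kappa < 0:
        return "I"
    if math.isclose(kappa, 1.0, rel_tol=tol):
        return "V"
    return "IV" if kappa < 1 else "VI"
```

For example, the exponential distribution (β1 = 4, β2 = 9) lies on the line 2β2 − 3β1 − 6 = 0 and is classified as Type III, and the integers 1 to 5 (β1 = 0, β2 = 1.7) give Type II.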

iPBAvizu: a PyMOL plugin for an efficient 3D protein structure superimposition approach

Source Code for Biology and Medicine Pub Date : 2019-11-02 DOI: 10.1186/s13029-019-0075-3

Guilhem Faure, A. Joseph, Pierrick Craveur, T. Narwani, N. Srinivasan, Jean-Christophe Gelly, Joseph Rebehmed, A. D. de Brevern

Citations: 17

Social support for collaboration and group awareness in life science research teams.

Source Code for Biology and Medicine Pub Date : 2019-07-08 eCollection Date: 2019-01-01 DOI: 10.1186/s13029-019-0074-4

Delfina Malandrino, Ilaria Manno, Alberto Negro, Andrea Petta, Luigi Serra, Concita Cantarella, Vittorio Scarano

Background: Next-generation sequencing (NGS) technologies have radically reshaped the landscape of '-omics' research. They produce a plethora of information that requires specific knowledge of sample preparation, analysis, and characterization. Expertise is also required to use bioinformatics tools and methods for efficient analysis, interpretation, and visualization of the data. These skills are rarely found in a single laboratory. More often, the samples are isolated and purified in one laboratory, sequencing is performed by a private company or a specialized lab, and the resulting data are analyzed by a third group of researchers. In this scenario, support, communication, and information sharing among researchers are key to building common knowledge and meeting project objectives.

Results: We present ElGalaxy, a system designed and developed to support collaboration and information sharing among researchers. Specifically, we integrated collaborative functionality into an application commonly adopted by life science researchers. ElGalaxy is therefore the result of integrating Galaxy, a workflow management system, with Elgg, a social network engine.

Conclusions: ElGalaxy enables scientists who work on the same experiment to collaborate, share information, discuss methods, and evaluate the results of individual steps, as well as of entire activities, performed during their experiments. ElGalaxy also enables greater team awareness, especially when experiments involve researchers who belong to different, distributed research centers.

Article 4.

Citations: 0

MZPAQ: a FASTQ data compression tool.

Source Code for Biology and Medicine Pub Date : 2019-06-03 eCollection Date: 2019-01-01 DOI: 10.1186/s13029-019-0073-5

Achraf El Allali, Mariam Arshad

Background: Owing to technological progress in next-generation sequencing (NGS), the amount of genomic data produced daily has increased tremendously. This increase has shifted the bottleneck of genomic projects from sequencing to computation, specifically to storing, managing, and analyzing large amounts of NGS data. Compression tools can reduce both the physical storage needed to save large amounts of genomic data and the bandwidth needed to transfer them. Recently, DNA sequence compression has gained much attention among researchers.

Results: In this paper, we study different techniques and algorithms used to compress genomic data. Most of these techniques exploit properties unique to DNA sequences to improve the compression rate, and they usually perform better than general-purpose compressors. By exploring the performance of available algorithms, we produce a powerful compression tool for NGS data called MZPAQ. Results show that MZPAQ outperforms state-of-the-art tools in compression ratio on all benchmark datasets obtained from a recent survey. MZPAQ offers the best compression ratios regardless of the sequencing platform or the size of the data.

Conclusions: MZPAQ's current strengths are its higher compression ratio and its compatibility with all major sequencing platforms. MZPAQ is most suitable when the size of the compressed data is crucial, such as for long-term storage and data transfer. Future work will target other aspects, such as compression speed and memory utilization.

Vol. 14, article 3.

Citations: 5
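One example of a "property unique to DNA sequences" that specialized compressors can exploit is the four-letter alphabet: each base needs only 2 bits instead of an 8-bit character. The Python sketch below illustrates that idea only; it is not MZPAQ's method, which the abstract does not describe. It assumes the read contains only A, C, G, and T (real FASTQ reads can contain N, and the header and quality lines need their own models), and the function names are hypothetical.

```python
# Assumption: sequences use only the four bases A, C, G, T (no N or IUPAC codes).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq):
    """Pack four bases per byte, two bits per base, lowest bits first."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for j, base in enumerate(seq[i:i + 4]):
            byte |= CODE[base] << (2 * j)
        out.append(byte)
    return bytes(out)


def unpack(data, length):
    """Invert pack(); the length is needed because the last byte may be partial."""
    return "".join(BASES[(data[i // 4] >> (2 * (i % 4))) & 3] for i in range(length))
```

With this layout, a 10-base read occupies 3 bytes instead of 10; general-purpose or context-modelling compressors can then be applied on top.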

IPCAPS: an R package for iterative pruning to capture population structure.

Source Code for Biology and Medicine Pub Date : 2019-03-20 eCollection Date: 2019-01-01 DOI: 10.1186/s13029-019-0072-6

Kridsadakorn Chaichoompu, Fentaw Abegaz, Sissades Tongsima, Philip James Shaw, Anavaj Sakuntabhai, Luísa Pereira, Kristel Van Steen

Background: Resolving population genetic structure is challenging, especially when dealing with closely related or geographically confined populations. Although Principal Component Analysis (PCA)-based methods and genomic variation in single nucleotide polymorphisms (SNPs) are widely used to describe shared genetic ancestry, improvements can be made, especially when fine-scale population structure is the target.

Results: This work presents an R package called IPCAPS, which uses SNP information to resolve potentially fine-scale population structure. The IPCAPS routines are built on the iterative pruning Principal Component Analysis (ipPCA) framework, which systematically assigns individuals to genetically similar subgroups. In each iteration, the tool detects and eliminates outliers, thereby avoiding severe misclassification errors.

Conclusions: IPCAPS supports different measurement scales for the variables used to identify substructure. Hence, panels of gene expression and methylation data can be accommodated as well. The tool can also be applied in patient sub-phenotyping contexts. IPCAPS is developed in R and is freely available from http://bio3.giga.ulg.ac.be/ipcaps.

Vol. 14, article 2.

Citations: 13

eUTOPIA: solUTion for Omics data PreprocessIng and Analysis.

Source Code for Biology and Medicine Pub Date : 2019-01-29 eCollection Date: 2019-01-01 DOI: 10.1186/s13029-019-0071-7

Veer Singh Marwah, Giovanni Scala, Pia Anneli Sofia Kinaret, Angela Serra, Harri Alenius, Vittorio Fortino, Dario Greco

Background: Applying microarrays in omics technologies enables the simultaneous quantification of many biomolecules. Microarrays are widely used to observe positive or negative effects on biomolecule activity in a perturbed versus a steady state through quantitative comparison. Community resources such as Bioconductor and CRAN host R-based tools that have become standard for high-throughput analytics. However, these tools are technically challenging for general users and require specific computational skills. There is a need for an intuitive, easy-to-use platform to process omics data and to visualize and interpret the results.

Results: We propose an integrated software solution, eUTOPIA, that implements a set of essential processing steps as a guided workflow presented to the user as an R Shiny application.

Conclusions: eUTOPIA allows researchers to preprocess and analyze microarray data through a simple, intuitive graphical interface while using state-of-the-art methods.

Vol. 14, article 1.

Citations: 0

ProSave: an application for restoring quantitative data to manipulated subsets of protein lists.

Source Code for Biology and Medicine Pub Date : 2018-11-12 eCollection Date: 2018-01-01 DOI: 10.1186/s13029-018-0070-0

Daniel A Machlab, Gabriel Velez, Alexander G Bassuk, Vinit B Mahajan

Background: In proteomics studies, liquid chromatography tandem mass spectrometry (LC-MS/MS) data are quantified by spectral counts or by some measure of ion abundance. Downstream comparative analysis of protein content (e.g., Venn diagrams and network analysis) typically does not include these quantitative data, so critical information is often lost. To avoid losing spectral count data in comparative proteomic analyses, a tool that can rapidly retrieve this information is needed.

Results: We developed ProSave, a free, user-friendly Java-based program that retrieves spectral count data for a curated list of proteins from a large proteomics dataset. ProSave manages LC-MS/MS datasets and rapidly retrieves spectral count information for a desired list of proteins.

Conclusions: ProSave is open source and freely available at https://github.com/MahajanLab/ProSave. The user manual, implementation notes, and a description of the methodology with examples are available on the site.

Vol. 13, article 3.

Citations: 0
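The operation ProSave automates, re-attaching spectral counts to a protein list that has been reduced (for example, to the intersection in a Venn diagram), amounts to a lookup keyed by protein identifier. The Python sketch below assumes the dataset is held as a dictionary of per-sample spectral counts; ProSave itself is a Java program, and the function name, data layout, protein IDs, and counts here are all hypothetical.

```python
def restore_counts(protein_ids, dataset):
    """Return spectral counts for each requested protein, plus IDs absent from the dataset.

    dataset maps protein ID -> {sample name: spectral count} (hypothetical layout).
    """
    restored, missing = {}, []
    for pid in protein_ids:
        if pid in dataset:
            restored[pid] = dict(dataset[pid])  # copy so edits don't alter the source
        else:
            missing.append(pid)
    return restored, missing


# Hypothetical example: the overlap of two protein lists keeps only IDs,
# so the counts are looked up again from the full dataset.
dataset = {
    "P01": {"control": 4, "treated": 15},
    "P02": {"control": 9, "treated": 8},
    "P03": {"control": 0, "treated": 6},
}
list_a = {"P01", "P02", "P07"}
list_b = {"P01", "P02", "P03", "P07"}
shared = sorted(list_a & list_b)
restored, missing = restore_counts(shared, dataset)
```

Reporting missing IDs separately, rather than silently dropping them, makes it visible when a curated list refers to proteins that the dataset does not contain.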

Simulating pedigrees ascertained for multiple disease-affected relatives.

Source Code for Biology and Medicine Pub Date : 2018-10-15 eCollection Date: 2018-01-01 DOI: 10.1186/s13029-018-0069-6

Christina Nieuwoudt, Samantha J Jones, Angela Brooks-Wilson, Jinko Graham

Background: Studies that ascertain families containing multiple relatives affected by a disease can be useful for identifying causal rare variants from next-generation sequencing data.

Results: We present the R package SimRVPedigree, which allows researchers to simulate pedigrees ascertained on the basis of multiple affected relatives. By incorporating the ascertainment process into the simulation, SimRVPedigree helps researchers better understand within-family patterns of relationship among affected individuals and their ages of disease onset.

Conclusions: Through simulation, we show that affected members of a family segregating a rare disease variant tend to be more numerous and more closely related than those in families with sporadic disease. We also show that the family ascertainment process can lead to apparent anticipation in the age of onset. Finally, we use simulation to gain insight into the limit on the proportion of ascertained families segregating a causal variant. SimRVPedigree should be useful to investigators seeking insight into family-based study design through simulation.

Vol. 13, article 2.

Citations: 4

SOV_refine: A further refined definition of segment overlap score and its significance for protein structure similarity.

Source Code for Biology and Medicine Pub Date : 2018-04-20 eCollection Date: 2018-01-01 DOI: 10.1186/s13029-018-0068-7

Tong Liu, Zheng Wang

Background: The segment overlap score (SOV) has been used to evaluate predicted protein secondary structures, each a sequence composed of helix (H), strand (E), and coil (C), by comparing them with the native or reference secondary structures, another sequence of H, E, and C. SOV's advantage is that it considers the size of continuous overlapping segments and assigns extra allowance to longer continuous overlapping segments, instead of judging only from the percentage of overlapping individual positions, as the Q3 score does. However, we found a drawback in its previous definition: it cannot ensure that the allowance increases when more residues in a segment are predicted accurately.

Results: We designed a new way of assigning allowance that keeps all the advantages of the previous SOV score definitions and ensures that the amount of allowance assigned increases when more elements in a segment are predicted accurately. Furthermore, our improved SOV achieved a higher correlation with the quality of protein models as measured by GDT-TS score and TM-score, indicating a better ability to evaluate tertiary structure quality at the secondary structure level. We analyzed the statistical significance of SOV scores and found threshold values for distinguishing two protein structures (SOV_refine > 0.19) and for indicating whether two proteins are in the same CATH fold (SOV_refine > 0.94 and > 0.90 for three- and eight-state secondary structures, respectively). We provide two further example applications: using SOV_refine as a machine learning feature for protein model quality assessment, and comparing different definitions of topologically associating domains. We showed that the newly defined SOV score resulted in better performance.

Conclusions: The SOV score can be widely used in bioinformatics research and other fields that need to compare two sequences of letters in which continuous segments have important meanings. We also generalized the previous SOV definitions so that they work for sequences composed of more than three states (e.g., the eight-state definition of protein secondary structures). A standalone software package has been implemented in Perl, with source code released. The software can be downloaded from http://dna.cs.miami.edu/SOV/.

Vol. 13, article 1.

Citations: 16
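The SOV_refine abstract contrasts segment-based scoring with the per-residue Q3 score. The Python sketch below computes Q3 and splits a secondary-structure string into continuous segments, showing two hypothetical predictions with identical Q3 but very different segment structure, which is exactly the information SOV-type scores take into account. It does not implement SOV_refine's allowance formula, which the abstract does not give.

```python
from itertools import groupby


def q3(predicted, reference):
    """Percentage of positions where the predicted state equals the reference state."""
    if len(predicted) != len(reference):
        raise ValueError("sequences must have the same length")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * matches / len(reference)


def segments(states):
    """Return (state, start, end) for each run of identical states; end is inclusive."""
    result, start = [], 0
    for state, run in groupby(states):
        length = len(list(run))
        result.append((state, start, start + length - 1))
        start += length
    return result


# Hypothetical example: a reference helix of length 8 and two predictions.
reference = "HHHHHHHH"
pred_a = "HHHHCCCC"  # one helix segment of length 4
pred_b = "HCHCHCHC"  # four helix segments of length 1
```

Both predictions score Q3 = 50%, yet `pred_a` recovers one continuous helix segment while `pred_b` fragments it into four single residues; a segment-overlap score rewards the former. The same per-position comparison applies unchanged to eight-state alphabets.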