Fast decision tree-based method to index large DNA-protein sequence databases using hybrid distributed-shared memory programming model.

JCR quartile: Q4 (Health Professions)

International Journal of Bioinformatics Research and Applications. Pub Date: 2014-01-01. DOI: 10.1504/IJBRA.2014.060765

Khalid Mohammad Jaber, Rosni Abdullah, Nur'Aini Abdul Rashid

Volume 10, Issue 3, pages 321-40. Publication type: Journal Article.

Citations: 5

Abstract

The size of biological databases has increased significantly in recent times, alongside continuous growth in the number of users and the rate of queries, to the point that some databases have reached terabyte size. There is therefore an increasing need to access databases as fast as possible. In this paper, the decision tree indexing model (PDTIM) was parallelised using a hybrid of distributed and shared memory on a resident database, with horizontal and vertical growth through the Message Passing Interface (MPI) and POSIX Threads (PThread), to accelerate index building. The PDTIM was implemented using 1, 2, 4 and 5 processors with 1, 2, 3 and 4 threads, respectively. The results show that the hybrid technique improved speedup compared with a sequential version. It can be concluded from the results that the proposed PDTIM is appropriate for large data sets in terms of index building time.
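The two-level partitioning the abstract describes can be illustrated with a minimal sketch. It rests on several assumptions: the abstract does not specify the decision tree structure, so a plain k-mer inverted index stands in for it; the distributed-memory level (MPI ranks in the paper) is simulated by a loop over partitions rather than by real processes; and the shared-memory level (PThread in the paper) uses Python threads. All function names here are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def index_chunk(chunk, k):
    """Index (seq_id, sequence) pairs: k-mer -> list of (seq_id, offset)."""
    index = defaultdict(list)
    for seq_id, seq in chunk:
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((seq_id, pos))
    return index


def split(items, parts):
    """Split a list into `parts` contiguous, near-equal slices (some may be empty)."""
    size, extra = divmod(len(items), parts)
    slices, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        slices.append(items[start:end])
        start = end
    return slices


def merge(partials):
    """Combine partial indexes; hits are sorted so the result is order-independent."""
    merged = defaultdict(list)
    for partial in partials:
        for kmer, hits in partial.items():
            merged[kmer].extend(hits)
    return {kmer: sorted(hits) for kmer, hits in merged.items()}


def build_hybrid(sequences, n_ranks, n_threads, k):
    """Two-level build: partitions per simulated rank, then threads within each rank."""
    records = list(enumerate(sequences))
    rank_indexes = []
    # Assumption: each loop iteration stands in for one MPI rank (a separate process).
    for rank_records in split(records, n_ranks):
        with ThreadPoolExecutor(max_workers=n_threads) as pool:
            thread_indexes = list(
                pool.map(lambda chunk: index_chunk(chunk, k),
                         split(rank_records, n_threads))
            )
        rank_indexes.append(merge(thread_indexes))
    return merge(rank_indexes)


def build_sequential(sequences, k):
    """Single-pass reference build used to check the hybrid result."""
    return merge([index_chunk(list(enumerate(sequences)), k)])
```

Because each partial index covers a disjoint set of sequences, the merged result should match the sequential build for any choice of `n_ranks` and `n_threads`. In a real MPI run the outer loop would be replaced by one process per rank plus a gather step, and any speedup would depend on the hardware and on the merge cost.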


Source journal

International Journal of Bioinformatics Research and Applications (Health Professions: Health Information Management)

CiteScore: 0.60

Self-citation rate: 0.00%


About the journal: Bioinformatics is an interdisciplinary research field that combines biology, computer science, mathematics and statistics into a broad-based field that will have profound impacts on all fields of biology. The emphasis of IJBRA is on basic bioinformatics research methods, tool development, performance evaluation and their applications in biology. IJBRA addresses the most innovative developments, research issues and solutions in bioinformatics and computational biology and their applications.

Topics covered include: databases, bio-grid, system biology; biomedical image processing, modelling and simulation; bio-ontology and data mining, DNA assembly, clustering, mapping; computational genomics/proteomics; in silico technology: computational intelligence, high performance computing; e-health, telemedicine; gene expression, microarrays, identification, annotation; genetic algorithms, fuzzy logic, neural networks, data visualisation; hidden Markov models, machine learning, support vector machines; molecular evolution, phylogeny, modelling, simulation, sequence analysis; parallel algorithms/architectures, computational structural biology; phylogeny reconstruction algorithms, physiome, protein structure prediction; sequence assembly, search, alignment; signalling/computational biomedical data engineering; simulated annealing, statistical analysis, stochastic grammars.