Heba Khaled, Hossam El Deen Mostafa Faheem, Rania El Gohary
{"title":"Design and implementation of a hybrid MPI-CUDA model for the Smith-Waterman algorithm.","authors":"Heba Khaled, Hossam El Deen Mostafa Faheem, Rania El Gohary","doi":"10.1504/ijdmb.2015.069710","DOIUrl":"https://doi.org/10.1504/ijdmb.2015.069710","url":null,"abstract":"<p><p>This paper provides a novel hybrid model for solving the multiple pair-wise sequence alignment problem combining message passing interface and CUDA, the parallel computing platform and programming model invented by NVIDIA. The proposed model targets homogeneous cluster nodes equipped with similar Graphical Processing Unit (GPU) cards. The model consists of the Master Node Dispatcher (MND) and the Worker GPU Nodes (WGN). The MND distributes the workload among the cluster working nodes and then aggregates the results. The WGN performs the multiple pair-wise sequence alignments using the Smith-Waterman algorithm. We also propose a modified implementation to the Smith-Waterman algorithm based on computing the alignment matrices row-wise. The experimental results demonstrate a considerable reduction in the running time by increasing the number of the working GPU nodes. The proposed model achieved a performance of about 12 Giga cell updates per second when we tested against the SWISS-PROT protein knowledge base running on four nodes.</p>","PeriodicalId":54964,"journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.3,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1504/ijdmb.2015.069710","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34125296","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chih-Chung Yang, Wen-Shin Lin, Chien-Pang Lee, Yungho Leu
{"title":"Two stages weighted sampling strategy for detecting the relation between gene expression and disease.","authors":"Chih-Chung Yang, Wen-Shin Lin, Chien-Pang Lee, Yungho Leu","doi":"10.1504/ijdmb.2015.069417","DOIUrl":"https://doi.org/10.1504/ijdmb.2015.069417","url":null,"abstract":"<p><p>Most microarray data analyses focus on selecting relevant genes and calculating the classification accuracy achieved with the selected genes. This paper aims to detect the relation between gene expression levels and the classes of a cancer (or a disease) to assist researchers with initial diagnosis. The proposed method is called the Two Stages Weighted Sampling strategy (TSWS strategy). According to the results, the TSWS strategy performs better than other existing methods in terms of classification accuracy and the number of selected relevant genes. Furthermore, the TSWS strategy can also be used to understand and detect the relation between gene expression levels and the classes of a cancer (or a disease).</p>","PeriodicalId":54964,"journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.3,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1504/ijdmb.2015.069417","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34123512","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Named entity recognition and classification in biomedical text using classifier ensemble.","authors":"Sriparna Saha, Asif Ekbal, Utpal Kumar Sikdar","doi":"10.1504/ijdmb.2015.067954","DOIUrl":"https://doi.org/10.1504/ijdmb.2015.067954","url":null,"abstract":"<p><p>Named Entity Recognition and Classification (NERC) is an important task in information extraction for the biomedical domain. Biomedical Named Entities include mentions of proteins, genes, DNA, RNA, etc. which, in general, have complex structures and are difficult to recognise. In this paper, we propose a Single Objective Optimisation based classifier ensemble technique using the search capability of Genetic Algorithm (GA) for NERC in biomedical texts. Here, GA is used to quantify the amount of voting for each class in each classifier. We use diverse classification methods like Conditional Random Field and Support Vector Machine to build a number of models depending upon the various representations of the set of features and/or feature templates. The proposed technique is evaluated with two benchmark datasets, namely JNLPBA 2004 and GENETAG. Experiments yield overall F-measure values of 75.97% and 95.90%, respectively. Comparisons with the existing systems show that our proposed system achieves state-of-the-art performance.</p>","PeriodicalId":54964,"journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.3,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1504/ijdmb.2015.067954","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34145685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LMDS-based approach for efficient top-k local ligand-binding site search.","authors":"Sungchul Kim, Lee Sael, Hwanjo Yu","doi":"10.1504/ijdmb.2015.070066","DOIUrl":"https://doi.org/10.1504/ijdmb.2015.070066","url":null,"abstract":"<p><p>In this work, we propose an LMDS-based binding-site search for improving the search speed of the Patch-Surfer method. Although Patch-Surfer is efficient in recognising protein-ligand binding partners, further speedup is necessary to address multiple-user access. This further speedup is realised by exploiting Landmark Multi-Dimensional Scaling (LMDS), which computes embedding coordinates for data points based on their distances from landmark points. When selecting the landmark points, we adopt two approaches: random and greedy selection. Our method approximately retrieves top-k results and the accuracy increases as we exploit more landmark points. Although the two landmark selection approaches show comparable results, the greedy selection shows the best performance when the number of landmark points is large. Using our method, the searching time is reduced by up to 99% and it retrieves almost 80% of exact top-k results. Additionally, LMDS-based binding-site search+ improves the retrieval accuracy from 80% to 95% while sacrificing the speedup ratio from 99% to 90% compared to Patch-Surfer.</p>","PeriodicalId":54964,"journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.3,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1504/ijdmb.2015.070066","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34192166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gene microarray data analysis using parallel point-symmetry-based clustering.","authors":"Anasua Sarkar, Ujjwal Maulik","doi":"10.1504/ijdmb.2015.067320","DOIUrl":"https://doi.org/10.1504/ijdmb.2015.067320","url":null,"abstract":"<p><p>Identification of co-expressed genes is the central goal in microarray gene expression analysis. Point-symmetry-based clustering is an important unsupervised learning technique for recognising symmetrical convex- or non-convex-shaped clusters. To enable fast clustering of large microarray data, we propose a distributed time-efficient scalable approach for point-symmetry-based K-Means algorithm. A natural basis for analysing gene expression data using symmetry-based algorithm is to group together genes with similar symmetrical expression patterns. This new parallel implementation also satisfies linear speedup in timing without sacrificing the quality of clustering solution on large microarray data sets. The parallel point-symmetry-based K-Means algorithm is compared with another new parallel symmetry-based K-Means and existing parallel K-Means over eight artificial and benchmark microarray data sets, to demonstrate its superiority, in both timing and validity. The statistical analysis is also performed to establish the significance of this message-passing-interface based point-symmetry K-Means implementation. We also analysed the biological relevance of clustering solutions.</p>","PeriodicalId":54964,"journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.3,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1504/ijdmb.2015.067320","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33973325","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pedro Ferreira, Nuno A Fonseca, Inês Dutra, Ryan Woods, Elizabeth Burnside
{"title":"Predicting malignancy from mammography findings and image-guided core biopsies.","authors":"Pedro Ferreira, Nuno A Fonseca, Inês Dutra, Ryan Woods, Elizabeth Burnside","doi":"10.1504/ijdmb.2015.067319","DOIUrl":"https://doi.org/10.1504/ijdmb.2015.067319","url":null,"abstract":"<p><p>The main goal of this work is to produce machine learning models that predict the outcome of a mammography from a reduced set of annotated mammography findings. In the study we used a dataset consisting of 348 consecutive breast masses that underwent image guided core biopsy performed between October 2005 and December 2007 on 328 female subjects. We applied various algorithms with parameter variation to learn from the data. The tasks were to predict mass density and to predict malignancy. The best classifier that predicts mass density is based on a support vector machine and has accuracy of 81.3%. The expert correctly annotated 70% of the mass densities. The best classifier that predicts malignancy is also based on a support vector machine and has accuracy of 85.6%, with a positive predictive value of 85%. One important contribution of this work is that our model can predict malignancy in the absence of the mass density attribute, since we can fill up this attribute using our mass density predictor.</p>","PeriodicalId":54964,"journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.3,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1504/ijdmb.2015.067319","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33973324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alberto Faro, Daniela Giordano, Francesco Maiorana
{"title":"Mining literatures to discover novel multiple biological associations in a disease context.","authors":"Alberto Faro, Daniela Giordano, Francesco Maiorana","doi":"10.1504/ijdmb.2015.069419","DOIUrl":"https://doi.org/10.1504/ijdmb.2015.069419","url":null,"abstract":"The text mining methods proposed to discover associations between pairs of biological entities by mining a scientific literature often extract associations that already exist in the literature, whereas their extensions supervise the discovery process too much, with heuristics and ontologies that limit the research space. On the other hand, the methods that search for novel associations by applying text mining methods to two literatures do not avoid the risk of discovering syllogisms based on faulty premises. For this reason, the paper proposes a method that helps users to discover associations among biological entities by mining the literature using an unsupervised clustering approach. The discovered multiple associations are derived from binary associations to limit the computational load without compromising the accuracy of the methodology. A case study demonstrates how the tool derived from the methodology works in practice. A comparison between this tool and other tools available in the literature points out the effectiveness of the methodology.","PeriodicalId":54964,"journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.3,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1504/ijdmb.2015.069419","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34123513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shaheera Rashwan, Amany Sarhan, Muhamed Talaat Faheem, Bayumy A Youssef
{"title":"Fuzzy watershed segmentation algorithm: an enhanced algorithm for 2D gel electrophoresis image segmentation.","authors":"Shaheera Rashwan, Amany Sarhan, Muhamed Talaat Faheem, Bayumy A Youssef","doi":"10.1504/ijdmb.2015.069659","DOIUrl":"https://doi.org/10.1504/ijdmb.2015.069659","url":null,"abstract":"<p><p>Detection and quantification of protein spots is an important issue in the analysis of two-dimensional electrophoresis images. However, a main challenge in the segmentation of 2DGE images is to separate overlapping protein spots correctly and to find the weak protein spots. In this paper, we describe a new robust technique to segment and model the different spots present in the gels. The watershed segmentation algorithm is modified to handle the problem of over-segmentation by initially partitioning the image into mosaic regions using the composition of fuzzy relations. The experimental results showed the effectiveness of the proposed algorithm in overcoming the over-segmentation problem associated with the available algorithm. We also use a wavelet denoising function to enhance the quality of the segmented image. The results of using a denoising function before the proposed fuzzy watershed segmentation algorithm are promising, as they are better than those without denoising.</p>","PeriodicalId":54964,"journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.3,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1504/ijdmb.2015.069659","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34125294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mohamed F Ghalwash, Dušan Ramljak, Zoran Obradović
{"title":"Patient-specific early classification of multivariate observations.","authors":"Mohamed F Ghalwash, Dušan Ramljak, Zoran Obradović","doi":"10.1504/ijdmb.2015.067955","DOIUrl":"https://doi.org/10.1504/ijdmb.2015.067955","url":null,"abstract":"<p><p>Early classification of time series has been receiving a lot of attention recently. In this paper we present a model, which we call the Early Classification Model (ECM), that allows for early, accurate and patient-specific classification of multivariate observations. ECM is comprised of an integration of the widely used Hidden Markov Model (HMM) and Support Vector Machine (SVM) models. It attained very promising results on the datasets we tested it on: in one set of experiments based on a published dataset of response to drug therapy in Multiple Sclerosis patients, ECM used only an average of 40% of a time series and was able to outperform some of the baseline models, which needed the full time series for classification. In the set of experiments tested on a sepsis therapy dataset, ECM was able to surpass the standard threshold-based method and the state-of-the-art method for early classification of multivariate time series.</p>","PeriodicalId":54964,"journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.3,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1504/ijdmb.2015.067955","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34145686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A system biology approach for understanding the miRNA regulatory network in colon rectal cancer.","authors":"Meeta Pradhan, Kshithija Nagulapalli, Lakenvia Ledford, Yogesh Pandit, Mathew Palakal","doi":"10.1504/ijdmb.2015.066332","DOIUrl":"https://doi.org/10.1504/ijdmb.2015.066332","url":null,"abstract":"<p><p>In this paper we present a systems biology approach to the understanding of the miRNA-regulatory network in colon rectal cancer. An initial set of significant genes in Colon Rectal Cancer (CRC) were obtained by mining relevant literature. An initial set of cancer-related miRNAs were obtained from three databases: miRBase, miRWalk, Targetscan and GEO microarray experiment. First principle methods were then used to generate the global miRNA-gene network. Significant miRNAs and associated transcription factors in the global miRNA-gene network were identified using topological and sub-graph analyses. Eleven novel miRNAs were identified and three of the novel miRNAs, hsa-miR-630, hsa-miR-100 and hsa-miR-99a, were further analysed to elucidate their role in CRC. The proposed methodology effectively made use of literature data and was able to show novel, significant miRNA-transcription associations in CRC.</p>","PeriodicalId":54964,"journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.3,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1504/ijdmb.2015.066332","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33973460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}