Junzhong Ji, Hongxin Liu, Aidong Zhang, Zhijun Liu, Chunnian Liu
{"title":"ACC-FMD: ant colony clustering for functional module detection in protein-protein interaction networks.","authors":"Junzhong Ji, Hongxin Liu, Aidong Zhang, Zhijun Liu, Chunnian Liu","doi":"10.1504/ijdmb.2015.067323","DOIUrl":"https://doi.org/10.1504/ijdmb.2015.067323","url":null,"abstract":"<p><p>Mining functional modules in Protein-Protein Interaction (PPI) networks is a very important research for revealing the structure-functionality relationships in biological processes. More recently, some swarm intelligence algorithms have been successfully applied in the field. This paper presents a new nature-inspired approach, ACC-FMD, which is based on ant colony clustering to detect functional modules. First, some proteins with the higher clustering coefficients are, respectively, selected as ant seed nodes. And then, the picking and dropping operations based on ant probabilistic models are developed and employed to assign proteins into the corresponding clusters represented by seeds. Finally, the best clustering result in each generation is used to perform the information transmission by updating the similarly function. Experimental results on some benchmarked datasets show that ACC-FMD outperforms the CFinder and MCODE algorithms and has comparative performance with the MINE, COACH, DPClus and Core algorithms in terms of the general evaluation metrics.</p>","PeriodicalId":54964,"journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.3,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1504/ijdmb.2015.067323","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34039167","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A method for extracting task-oriented information from biological text sources.","authors":"Dhanasekaran Kuttiyapillai, R Rajeswari","doi":"10.1504/ijdmb.2015.070072","DOIUrl":"https://doi.org/10.1504/ijdmb.2015.070072","url":null,"abstract":"<p><p>A method for information extraction which processes the unstructured data from document collection has been introduced. A dynamic programming technique adopted to find relevant genes from sequences which are longest and accurate is used for finding matching sequences and identifying effects of various factors. The proposed method could handle complex information sequences which give different meanings in different situations, eliminating irrelevant information. The text contents were pre-processed using a general-purpose method and were applied with entity tagging component. The bottom-up scanning of key-value pairs improves content finding to generate relevant sequences to the testing task. This paper highlights context-based extraction method for extracting food safety information, which is identified from articles, guideline documents and laboratory results. The graphical disease model verifies weak component through utilisation of development data set. This improves the accuracy of information retrieval in biological text analysis and reporting applications.</p>","PeriodicalId":54964,"journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.3,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1504/ijdmb.2015.070072","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34192164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Granular support vector machine to identify unknown structural classes of protein.","authors":"Rohayanti Hassan, Razib M Othman, Zuraini A Shah","doi":"10.1504/ijdmb.2015.070065","DOIUrl":"https://doi.org/10.1504/ijdmb.2015.070065","url":null,"abstract":"<p><p>To date, classification of structural class using local protein structure rather than the whole structure has been gaining widespread attention. It is noted that the structural class lies in local composition or arrangement of secondary structure, while the threshold-based classification method has restricted rules in determining these structural classes. As a consequence, some of the structures are unknown. In order to determine these unknown structural classes, we propose a fusion algorithm, abbreviated as GSVM-SigLpsSCPred (Granular Support Vector Machine--with Significant Local protein structure for Structural Class Prediction), which consists of two major components, which are: optimal local protein structure to represent the feature vector and granular support vector machine to predict the unknown structural classes. The results highlight the performance of GSVM-SigLpsSCPred as an alternative computational method for low-identity sequences.</p>","PeriodicalId":54964,"journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.3,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1504/ijdmb.2015.070065","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34192168","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jinsoo Lee, Romans Kasperovics, Wook-Shin Han, Jeong-Hoon Lee, Min Soo Kim, Hune Cho
{"title":"An efficient algorithm for updating regular expression indexes in RDF databases.","authors":"Jinsoo Lee, Romans Kasperovics, Wook-Shin Han, Jeong-Hoon Lee, Min Soo Kim, Hune Cho","doi":"10.1504/ijdmb.2015.066767","DOIUrl":"https://doi.org/10.1504/ijdmb.2015.066767","url":null,"abstract":"<p><p>The Resource Description Framework (RDF) is widely used for sharing biomedical data, such as gene ontology or the online protein database UniProt. SPARQL is a native query language for RDF, featuring regular expressions in queries for which exact values are either irrelevant or unknown. The use of regular expression indexes in SPARQL query processing improves the performance of queries containing regular expressions by up to two orders of magnitude. In this study, we address the update operation for regular expression indexes in RDF databases. We identify major performance problems of straightforward index update algorithms and propose a new algorithm that utilises unique properties of regular expression indexes to increase performance. Our contributions can be summarised as follows: (1) we propose an efficient update algorithm for regular expression indexes in RDF databases, (2) we build a prototype system for the proposed algorithm in C++ and (3) we conduct extensive experiments demonstrating the improvement of our algorithm over the straightforward approaches by an order of magnitude.</p>","PeriodicalId":54964,"journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.3,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1504/ijdmb.2015.066767","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33906549","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shuo Li, James O Nyagilo, Digant P Dave, Wei Wang, Baoju Zhang, Jean Gao
{"title":"Probabilistic partial least squares regression for quantitative analysis of Raman spectra.","authors":"Shuo Li, James O Nyagilo, Digant P Dave, Wei Wang, Baoju Zhang, Jean Gao","doi":"10.1504/ijdmb.2015.066768","DOIUrl":"https://doi.org/10.1504/ijdmb.2015.066768","url":null,"abstract":"<p><p>With the latest development of Surface-Enhanced Raman Scattering (SERS) technique, quantitative analysis of Raman spectra has shown the potential and promising trend of development in vivo molecular imaging. Partial Least Squares Regression (PLSR) is state-of-the-art method. But it only relies on training samples, which makes it difficult to incorporate complex domain knowledge. Based on probabilistic Principal Component Analysis (PCA) and probabilistic curve fitting idea, we propose a probabilistic PLSR (PPLSR) model and an Estimation Maximisation (EM) algorithm for estimating parameters. This model explains PLSR from a probabilistic viewpoint, describes its essential meaning and provides a foundation to develop future Bayesian nonparametrics models. Two real Raman spectra datasets were used to evaluate this model, and experimental results show its effectiveness.</p>","PeriodicalId":54964,"journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.3,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1504/ijdmb.2015.066768","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33906550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Discovering essential proteins based on PPI network and protein complex.","authors":"Jun Ren, Jianxin Wang, Min Li, Fangxiang Wu","doi":"10.1504/ijdmb.2015.068951","DOIUrl":"https://doi.org/10.1504/ijdmb.2015.068951","url":null,"abstract":"<p><p>Most computational methods for identifying essential proteins focus on the topological centrality of protein-protein interaction (PPI) networks. However, these methods have limitations, such as the difficulty for identifying essential proteins with low centrality values and the poor performance for incomplete PPI network. In this paper, protein complex is proven to be an important factor for determining protein essentiality and a new centrality measure, complex centrality, is proposed. The weighted average of complex centrality and subgraph centrality, called harmonic centrality (HC), is proposed to predict essential proteins. It combines PPI network topology and protein complex information and has better performance than methods based on PPI network. The improvement is higher when the PPI network is incomplete. Furthermore, a weighted PPI network is generated by integrating cellular localisation and biological process to a PPI network. The performance of HC measure is improved 5% in this weighted PPI network.</p>","PeriodicalId":54964,"journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.3,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1504/ijdmb.2015.068951","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34276055","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abdul Hakim Mohamed Salleh, Mohd Saberi Mohamad, Safaai Deris, Rosli Md Illias
{"title":"Metabolites production improvement by identifying minimal genomes and essential genes using flux balance analysis.","authors":"Abdul Hakim Mohamed Salleh, Mohd Saberi Mohamad, Safaai Deris, Rosli Md Illias","doi":"10.1504/ijdmb.2015.068955","DOIUrl":"https://doi.org/10.1504/ijdmb.2015.068955","url":null,"abstract":"<p><p>With the advancement in metabolic engineering technologies, reconstruction of the genome of host organisms to achieve desired phenotypes can be made. However, due to the complexity and size of the genome scale metabolic network, significant components tend to be invisible. We proposed an approach to improve metabolite production that consists of two steps. First, we find the essential genes and identify the minimal genome by a single gene deletion process using Flux Balance Analysis (FBA) and second by identifying the significant pathway for the metabolite production using gene expression data. A genome scale model of Saccharomyces cerevisiae for production of vanillin and acetate is used to test this approach. The result has shown the reliability of this approach to find essential genes, reduce genome size and identify production pathway that can further optimise the production yield. The identified genes and pathways can be extendable to other applications especially in strain optimisation.</p>","PeriodicalId":54964,"journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.3,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1504/ijdmb.2015.068955","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34276061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"To screen the effective software for analysing gene interactions from Kashin-Beck disease genome profiling pathway and network, according to the tool of GeneMANIA.","authors":"Sen Wang, Weizhuo Wang, Junjie Zhao, Feng Zhang, Shulan He, Xiong Guo","doi":"10.1504/ijdmb.2015.068963","DOIUrl":"https://doi.org/10.1504/ijdmb.2015.068963","url":null,"abstract":"<p><p>In order to screen the more effective software for the pathway and network analysis of Kashin-Beck disease, gene microarrays, TranscriptomeBrowser, MetaCore and GeneMANIA were used for analysis. Three significant chondrocytic pathways and one network were screened by TranscriptomeBrowser; one significant pathway and one network were identified by MetaCore. BAX, APAF1, CASP6, BCL2, VEGF, SOCS3, BAK, TGFBI, TNFAIP6, TNFRSF11B and THBS1 were significant genes associated with the biological function of chondrocyte or cartilage involved in the TranscriptomeBrowser or MetaCore results. The interactions between the significant genes and their adjacent genes were searched and classified in GeneMANIA. In pathway analysis results, TranscriptomeBrowser is superior to get the interaction of pathway and co-expression compared with MetaCore; MetaCore is superior to get the interaction of physical interaction compared with TranscriptomeBrowser. In network analysis results, TranscriptomeBrowser contains more interaction message of co-localisation, MetaCore contains, more interaction message of co-expression.</p>","PeriodicalId":54964,"journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.3,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1504/ijdmb.2015.068963","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34276062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modelling and structural characteristics analysis of gene networks for prostate cancer.","authors":"Yulin Zhang, Shudong Wang, Dazhi Meng","doi":"10.1504/ijdmb.2015.068950","DOIUrl":"https://doi.org/10.1504/ijdmb.2015.068950","url":null,"abstract":"<p><p>Analysing structure of gene networks is an important way to understand regulatory mechanisms of organism at the molecular level. In this work, gene mutual information networks are constructed based on gene expression profiles in prostate tissues with and without cancer. In order to contrast structural difference of normal and diseased networks, curves of four structural parameters are given with the change of thresholds. Then threshold discrimination intervals and discrimination weights are defined. A method of finding structural key genes with significant degree-difference is proposed. The finding of key genes will help the biomedical scientists to further research the pathogenesis of prostate cancer. Finally randomisation test is performed to prove that these structural parameters can distinguish normal and prostate cancer in their structures compared with these results in real data.</p>","PeriodicalId":54964,"journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.3,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1504/ijdmb.2015.068950","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34106977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Christine W Duarte, Yann C Klimentidis, Jacqueline J Harris, Michelle Cardel, José R Fernández
{"title":"Discovery of phenotypic networks from genotypic association studies with application to obesity.","authors":"Christine W Duarte, Yann C Klimentidis, Jacqueline J Harris, Michelle Cardel, José R Fernández","doi":"10.1504/ijdmb.2015.069414","DOIUrl":"https://doi.org/10.1504/ijdmb.2015.069414","url":null,"abstract":"<p><p>Genome-wide Association Studies (GWAS) have resulted in many discovered risk variants for several obesity-related traits. However, before clinical relevance of these discoveries can be achieved, molecular or physiological mechanisms of these risk variants needs to be discovered. One strategy is to perform data mining of phenotypically-rich data sources such as those present in dbGAP (database of Genotypes and Phenotypes) for hypothesis generation. Here we propose a technique that combines the power of existing Bayesian Network (BN) learning algorithms with the statistical rigour of Structural Equation Modelling (SEM) to produce an overall phenotypic network discovery system with optimal properties. We illustrate our method using the analysis of a candidate SNP data set from the AMERICO sample, a multi-ethnic cross-sectional cohort of roughly 300 children with detailed obesity-related phenotypes. We demonstrate our approach by showing genetic mechanisms for three obesity-related SNPs.</p>","PeriodicalId":54964,"journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.3,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1504/ijdmb.2015.069414","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34123508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}