{"title":"Search for Protein Sequence Homologues that Display Considerable Domain Length Variations","authors":"Eshita Mutt, A. Mitra, R. Sowdhamini","doi":"10.4018/jkdb.2011040104","DOIUrl":"https://doi.org/10.4018/jkdb.2011040104","url":null,"abstract":"Independent folding units which have the capability of carrying out biological functions have been classified as “protein domains”. These minimal structural units lead not only to considerable sequence changes of protein domains of similar folds and functions, but also give rise to remarkable length variations under evolutionary pressure. Rapid and heuristic sequence search algorithms are generally sensitive and effective in recognizing protein domains that are distantly related within large sequence databases, but are not well-suited to identify remote homologues of varying lengths. An even more challenging aspect is introduced to distinguish reliable hits from a vast number of putative false positives that could have suboptimal sequence similarities. Here, the authors present a data-mining approach that provides stage-specific filters in sequence searches to reliably accumulate remote homologues, which encourages sampling of length variations albeit with a low false positive rate. Realization of such remote homologues with vivid length variations could contribute to better understanding of functional variety within protein domain superfamilies.","PeriodicalId":160270,"journal":{"name":"Int. J. Knowl. Discov. Bioinform.","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126191671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graph-Based Shape Analysis for MRI Classification","authors":"L. Holder","doi":"10.4018/JKDB.2011040102","DOIUrl":"https://doi.org/10.4018/JKDB.2011040102","url":null,"abstract":"Searching for correlations between brain structure and attributes of a person’s intellectual state is a process which may be better done by automation than by human labor. Such an automated system would be capable of performing classification based on the discovered correlation, which would be a means of testing how accurate the discovered correlation is. The authors have developed a system which generates a graph-based representation of the shape of the third and lateral ventricles based on a structural MRI, and classifies images represented in this manner. The system is evaluated on accuracy at classifying individuals showing cognitive impairment to Alzheimer’s Disease. Classification accuracy is 74.2% when individuals with CDR 0.5 are included as impaired in a balanced dataset of 166 images, and 79.3% accuracy when differentiating individuals with CDR at least 1.0 and healthy individuals in a balanced dataset of 54 images. Finally, the system is used to classify MR images according to level of education, with 77.2% accuracy differentiating highly-educated individuals from those for whom no higher education is listed, in a balanced dataset of 178 images.","PeriodicalId":160270,"journal":{"name":"Int. J. Knowl. Discov. Bioinform.","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130400600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Mining Frequent Closed Discriminative Biclusters by Sample-Growth: The FDCluster Approach","authors":"Miao Wang, Xuequn Shang, Shaohua Zhang, Zhanhuai Li","doi":"10.4018/jkdb.2010100104","DOIUrl":"https://doi.org/10.4018/jkdb.2010100104","url":null,"abstract":"DNA microarray technology has generated a large number of gene expression data. Biclustering is a methodology allowing for condition set and gene set points clustering simultaneously. It finds clusters of genes possessing similar characteristics together with biological conditions creating these similarities. Almost all the current biclustering algorithms find biclusters in one microarray dataset. In order to reduce the noise influence and find more biological biclusters, the authors propose the FDCluster algorithm in order to mine frequent closed discriminative biclusters in multiple microarray datasets. FDCluster uses Apriori property and several novel techniques for pruning to mine biclusters efficiently. To increase the space usage, FDCluster also utilizes several techniques to generate frequent closed biclusters without candidate maintenance in memory. The experimental results show that FDCluster is more effective than traditional methods in either single microarray dataset or multiple microarray datasets. This paper tests the biological significance using GO to show the proposed method is able to produce biologically relevant biclusters.","PeriodicalId":160270,"journal":{"name":"Int. J. Knowl. Discov. Bioinform.","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128402840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Prediction Accuracy via Subspace Modeling in a Statistical Geometry Based Computational Protein Mutagenesis","authors":"M. Masso","doi":"10.4018/jkdb.2010100103","DOIUrl":"https://doi.org/10.4018/jkdb.2010100103","url":null,"abstract":"A computational mutagenesis is detailed whereby each single residue substitution in a protein chain of primary sequence length N is represented as a sparse N-dimensional feature vector, whose M","PeriodicalId":160270,"journal":{"name":"Int. J. Knowl. Discov. Bioinform.","volume":"1276 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123356399","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Revealing the Origin and Nature of Drug Resistance of Dynamic Tumour Systems","authors":"R. Santiago-Mozos, I. Khan, M. G. Madden","doi":"10.4018/jkdb.2010100102","DOIUrl":"https://doi.org/10.4018/jkdb.2010100102","url":null,"abstract":"In this paper, the authors identify the strategies that resistant subpopulations of cancer cells undertake to overcome the effect of the anticancer drug Topotecan. For the analyses of cell lineage data encoded from timelapse microscopy, data mining tools are chosen that generate interpretable models of the data, addressing their statistical significance. By interpreting the short-term and long-term cytotoxic effect of Topotecan through these data models, the authors reveal the strategies that resistant subpopulations of cells undertake to maximize their clonal expansion potential. In this context, this paper identifies a pattern of cell death independent of cytotoxic effect. Finally, it is observed that cells exposed to Topotecan have higher movement over time, indicating a putative relationship between cytotoxic effect and cell motility.","PeriodicalId":160270,"journal":{"name":"Int. J. Knowl. Discov. Bioinform.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125549397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hierarchical Density-Based Clustering of White Matter Tracts in the Human Brain","authors":"Junming Shao, K. Hahn, Qinli Yang, A. Wohlschläger, C. Böhm, Nicholas Myers, C. Plant","doi":"10.4018/jkdb.2010100101","DOIUrl":"https://doi.org/10.4018/jkdb.2010100101","url":null,"abstract":"Diffusion tensor magnetic resonance imaging (DTI) provides a promising way of estimating the neural fiber pathways in the human brain non-invasively via white matter tractography. However, it is difficult to analyze the vast number of resulting tracts quantitatively. Automatic tract clustering would be useful for the neuroscience community, as it can contribute to accurate neurosurgical planning, tract-based analysis, or white matter atlas creation. In this paper, the authors propose a new framework for automatic white matter tract clustering using a hierarchical density-based approach. A novel fiber similarity measure based on dynamic time warping allows for an effective and efficient evaluation of fiber similarity. A lower bounding technique is used to further speed up the computation. Then the algorithm OPTICS is applied, to sort the data into a reachability plot, visualizing the clustering structure of the data. Interactive and automatic clustering algorithms are finally introduced to obtain the clusters. Extensive experiments on synthetic data and real data demonstrate the effectiveness and efficiency of our fiber similarity measure and show that the hierarchical density-based clustering method can group these tracts into meaningful bundles on multiple scales as well as eliminating noisy fibers.","PeriodicalId":160270,"journal":{"name":"Int. J. Knowl. Discov. Bioinform.","volume":"51 12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132780981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mining Protein Interactome Networks to Measure Interaction Reliability and Select Hub Proteins","authors":"Young-Rae Cho, A. Zhang","doi":"10.4018/jkdb.2010070102","DOIUrl":"https://doi.org/10.4018/jkdb.2010070102","url":null,"abstract":"High-throughput techniques involve large-scale detection of protein-protein interactions. This interaction data set from the genome-scale perspective is structured into an interactome network. Since the interaction evidence represents functional linkage, various graph-theoretic computational approaches have been applied to the interactome networks for functional characterization. However, this data is generally unreliable, and the typical genome-wide interactome networks have a complex connectivity. In this paper, the authors explore systematic analysis of protein interactome networks, and propose a $k$-round signal flow simulation algorithm to measure interaction reliability from connection patterns of the interactome networks. This algorithm quantitatively characterizes functional links between proteins by simulating the propagation of information signals through complex connections. In this regard, the algorithm efficiently estimates the strength of alternative paths for each interaction. The authors also present an algorithm for mining the complex interactome network structure. The algorithm restructures the network by hierarchical ordering of nodes, and this structure re-formatting process reveals hub proteins in the interactome networks. This paper demonstrates that two rounds of simulation accurately score interaction reliability in terms of ontological correlation and functional consistency. Finally, the authors validate that the selected structural hubs represent functional core proteins.","PeriodicalId":160270,"journal":{"name":"Int. J. Knowl. Discov. Bioinform.","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126234482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bioinformatics Methods for Studying MicroRNA and ARE-Mediated Regulation of Post-Transcriptional Gene Expression","authors":"Richipal Singh Bindra, J. Wang, P. Bagga","doi":"10.4018/jkdb.2010070106","DOIUrl":"https://doi.org/10.4018/jkdb.2010070106","url":null,"abstract":"MicroRNAs (miRNAs) are short single-stranded RNA molecules with 21-22 nucleotides known to regulate post-transcriptional expression of protein-coding genes involved in most of the cellular processes. Prediction of miRNA targets is a challenging bioinformatics problem. AU-rich elements (AREs) are regulatory RNA motifs found in the 3’ untranslated regions (UTRs) of mRNAs, and they play dominant roles in the regulated decay of short-lived human mRNAs via specific interactions with proteins. In this paper, the authors review several miRNA target prediction tools and data sources, as well as computational methods used for the prediction of AREs. The authors discuss the connection between miRNA and ARE-mediated post-transcriptional gene regulation. Finally, a data mining method for identifying the co-occurrences of miRNA target sites in ARE containing genes is presented.","PeriodicalId":160270,"journal":{"name":"Int. J. Knowl. Discov. Bioinform.","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115029561","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mining Frequent Boolean Expressions: Application to Gene Expression and Regulatory Modeling","authors":"Mohammed J. Zaki, Naren Ramakrishnan, Lizhuang Zhao","doi":"10.4018/jkdb.2010070105","DOIUrl":"https://doi.org/10.4018/jkdb.2010070105","url":null,"abstract":"Regulatory network analysis and other bioinformatics tasks require the ability to induce and represent arbitrary boolean expressions from data sources. In this paper, the authors introduce a novel framework called BLOSOM for mining (frequent) boolean expressions over binary-valued datasets. Boolean expressions can be grouped into four categories: pure conjunctions, pure disjunctions, conjunction of disjunctions, and disjunction of conjunctions. The authors’ main focus is on mining the simplest expressions (the minimal generators), but they also propose closure operators that yield closed (or unique maximal) boolean expressions. BLOSOM efficiently mines frequent boolean expressions by utilizing a number of methodical pruning techniques. Experiments showcase the behavior of BLOSOM for different input settings and parameter thresholds. Application studies on gene expression and gene regulation patterns showcase the effectiveness of this approach.","PeriodicalId":160270,"journal":{"name":"Int. J. Knowl. Discov. Bioinform.","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124949196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Discriminative Subgraph Mining for Protein Classification","authors":"Ning Jin, Calvin Young, Wei Wang","doi":"10.4018/jkdb.2010070103","DOIUrl":"https://doi.org/10.4018/jkdb.2010070103","url":null,"abstract":"Protein classification can be performed by representing 3-D protein structures by graphs and then classifying the corresponding graphs. One effective way to classify such graphs is to use frequent subgraph patterns as features; however, the effectiveness of using subgraph patterns in graph classification is often hampered by the large search space of subgraph patterns. In this paper, the authors present two efficient discriminative subgraph mining algorithms: COM and GAIA. These algorithms directly search for discriminative subgraph patterns rather than frequent subgraph patterns which can be used to generate classification rules. Experimental results show that COM and GAIA can achieve high classification accuracy and runtime efficiency. Additionally, they find substructures that are very close to the proteins’ actual active sites.","PeriodicalId":160270,"journal":{"name":"Int. J. Knowl. Discov. Bioinform.","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123170257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}