{"title":"Incorporating Network Topology Improves Prediction of Protein Interaction Networks from Transcriptomic Data","authors":"Peter E. Larsen, F. Collart, Yang Dai","doi":"10.4018/jkdb.2010070101","DOIUrl":"https://doi.org/10.4018/jkdb.2010070101","url":null,"abstract":"The reconstruction of protein-protein interaction (PPI) networks from high-throughput experimental data is one of the most challenging problems in bioinformatics. These biological networks have specific topologies defined by the functional and evolutionary relationships between the proteins and the physical limitations imposed on proteins interacting in the three-dimensional space. In this paper, the authors propose a novel approach for the identification of potential protein-protein interactions based on the integration of known PPI network topology and transcriptomic data. The proposed method, Function Restricted Value Neighborhood (FRV-N), was used to reconstruct PPI networks using an experimental data set consisting of 170 yeast microarray profiles. The results of this analysis demonstrate that incorporating knowledge of interactome topology improves the ability of transcriptome analysis to reconstruct interaction networks with a high degree of biological relevance.","PeriodicalId":160270,"journal":{"name":"Int. J. Knowl. Discov. Bioinform.","volume":"84 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133254221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Wave-SOM: A Novel Wavelet-Based Clustering Algorithm for Analysis of Gene Expression Patterns","authors":"Andrew E. Blanchard, Christopher Wolter, D. McNabb, Eitan Gross","doi":"10.4018/jkdb.2010040104","DOIUrl":"https://doi.org/10.4018/jkdb.2010040104","url":null,"abstract":"In this paper, the authors present a wavelet-based algorithm (Wave-SOM) to help visualize and cluster oscillatory time-series data in two-dimensional gene expression micro-arrays. Using various wavelet transformations, raw data are first de-noised by decomposing the time-series into low and high frequency wavelet coefficients. Following thresholding, the coefficients are fed as an input vector into a two-dimensional Self-Organizing-Map clustering algorithm. Transformed data are then clustered by minimizing the Euclidean (L2) distance between their corresponding fluctuation patterns. A multi-resolution analysis by Wave-SOM of expression data from the yeast Saccharomyces cerevisiae, exposed to oxidative stress and glucose-limited growth, identified 29 genes with correlated expression patterns that were mapped into 5 different nodes. The ordered clustering of yeast genes by Wave-SOM illustrates that the same set of genes (encoding ribosomal proteins) can be regulated by two different environmental stresses, oxidative stress and starvation. The algorithm provides heuristic information regarding the similarity of different genes. Using previously studied expression patterns of yeast cell-cycle and functional genes as test data sets, the authors’ algorithm outperformed five other competing programs.","PeriodicalId":160270,"journal":{"name":"Int. J. Knowl. Discov. Bioinform.","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127638943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SPCCTDM, a Catalogue for Analysis of Therapeutic Drug Monitoring Related Contents in the Drug Prescription Information","authors":"S. Ulrich, P. Baumann, A. Conca, H. Kuss, V. Stieffenhofer, C. Hiemke","doi":"10.4018/jkdb.2010040101","DOIUrl":"https://doi.org/10.4018/jkdb.2010040101","url":null,"abstract":"Therapeutic drug monitoring (TDM) has consistently been shown to be useful for optimization of drug therapy. For the first time, a method has been developed for the text analysis of TDM in SPCs in that a catalogue SPC-ContentTDM (SPCCTDM) provides a codification of the content of TDM in SPCs. It consists of six structure-related items (dose, adverse drug reactions, drug interactions, overdose, pregnancy/breast feeding, and pharmacokinetics) according to implicit or explicit references to TDM in paragraphs of the SPC, and four theory-guided items according to the information about ranges of plasma concentrations and a recommendation of TDM in the SPC. The catalogue is regarded as valid for the text analysis of SPCs with respect to TDM. It can be used in the comparison of SPCs, in the comparison with medico-scientific evidence and for the estimation of the perception of TDM in SPCs by the reader. Regarding the approach as a model of text mining, it may be extended for evaluation of other aspects reported in SPCs.","PeriodicalId":160270,"journal":{"name":"Int. J. Knowl. Discov. Bioinform.","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115869520","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Clustering Genes Using Heterogeneous Data Sources","authors":"Erliang Zeng, Chengyong Yang, Tao Li, G. Narasimhan","doi":"10.4018/jkdb.2010040102","DOIUrl":"https://doi.org/10.4018/jkdb.2010040102","url":null,"abstract":"Clustering of gene expression data is a standard exploratory technique used to identify closely related genes. Many other sources of data are also likely to be of great assistance in the analysis of gene expression data. This data provides a means to begin elucidating the large-scale modular organization of the cell. The authors consider the challenging task of developing exploratory analytical techniques to deal with multiple complete and incomplete information sources. The Multi-Source Clustering (MSC) algorithm developed performs clustering with multiple, but complete, sources of data. To deal with incomplete data sources, the authors adopted the MPCK-means clustering algorithms to perform exploratory analysis on one complete source and other potentially incomplete sources provided in the form of constraints. This paper presents a new clustering algorithm, MSC, to perform exploratory analysis using two or more diverse but complete data sources, studies the effectiveness of constraint sets and the robustness of the constrained clustering algorithm using multiple sources of incomplete biological data, and incorporates such incomplete data into the constrained clustering algorithm in the form of constraint sets.","PeriodicalId":160270,"journal":{"name":"Int. J. Knowl. Discov. Bioinform.","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123938188","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Infer Species Phylogenies Using Self-Organizing Maps","authors":"Xiaoxu Han","doi":"10.4018/jkdb.2010040103","DOIUrl":"https://doi.org/10.4018/jkdb.2010040103","url":null,"abstract":"With rapid advances in genomics, phylogenetics has turned to phylogenomics due to the availability of large amounts of sequence and genome data. However, incongruence between species trees and gene trees remains a challenge in molecular phylogenetics for its biological and algorithmic complexities. A state-of-the-art gene concatenation approach was proposed to resolve this problem by inferring the species phylogeny using a random combination of widely distributed orthologous genes screened from genomes. However, such an approach may not be a robust solution to this problem because it ignores the fact that some genes are more informative than others in species inference. This paper presents a self-organizing map (SOM) based phylogeny inference method to overcome this weakness. The author’s proposed algorithm not only demonstrates its superiority to the original gene concatenation method using the same datasets, but also shows its advantages in generalization. This paper illustrates that missing data may not play a negative role in phylogeny inference. This study presents a method to cluster multispecies genes, estimate multispecies gene entropy and visualize the species patterns through self-organizing map mining.","PeriodicalId":160270,"journal":{"name":"Int. J. Knowl. Discov. Bioinform.","volume":"155 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124346859","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identification of Distinguishing Motifs","authors":"Wangsen Feng, Lusheng Wang","doi":"10.4018/jkdb.2010070104","DOIUrl":"https://doi.org/10.4018/jkdb.2010070104","url":null,"abstract":"Motif identification for DNA sequences has many important applications in biological studies, including diagnostic probe design, locating binding sites and regulatory signals, and potential drug target identification. There are two versions—the Single Group and Two Groups. Here, the occurrences of the motif in the given sequences have errors. Currently, most of existing programs can only handle the case of single group. However, most of the programs do not allow indels (insertions and deletions) in the occurrences of the motif. In this paper, the authors propose a randomized algorithm for the one group problem that can handle indels in the occurrences of the motif. Finally, an algorithm for the two groups’ problem is given along with extensive simulations evaluating algorithms.","PeriodicalId":160270,"journal":{"name":"Int. J. Knowl. Discov. Bioinform.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133965944","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scaling Unsupervised Risk Stratification to Massive Clinical Datasets","authors":"Z. Syed, I. Rubinfeld","doi":"10.4018/jkdb.2011010103","DOIUrl":"https://doi.org/10.4018/jkdb.2011010103","url":null,"abstract":"While rare clinical events, by definition, occur infrequently in a population, the consequences of these events can be drastic. Unfortunately, developing risk stratification algorithms for these conditions requires large volumes of data to capture enough positive and negative cases. This process is slow, expensive, and burdensome to both patients and caregivers. This paper proposes an unsupervised machine learning approach to address this challenge and risk stratify patients for adverse outcomes without use of a priori knowledge or labeled training data. The key idea of the approach is to identify high-risk patients as anomalies in a population. Cases are identified through a novel algorithm that finds an approximate solution to the k-nearest neighbor problem using locality sensitive hashing (LSH) based on p-stable distributions. The algorithm is optimized to use multiple LSH searches, each with a geometrically increasing radius, to find the k-nearest neighbors of patients in a dynamically changing dataset where patients are being added or removed over time. When evaluated on data from the National Surgical Quality Improvement Program (NSQIP), this approach successfully identifies patients at an elevated risk of mortality and rare morbidities. The LSH-based algorithm provided a substantial improvement over an exact k-nearest neighbor algorithm in runtime, while achieving a similar accuracy.","PeriodicalId":160270,"journal":{"name":"Int. J. Knowl. Discov. Bioinform.","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116526562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Resiliency in SDN using Routing Tree Algorithms","authors":"Kshira Sagar Sahoo, B. Sahoo, Ratnakar Dash, B. K. Mishra","doi":"10.4018/IJKDB.2017010104","DOIUrl":"https://doi.org/10.4018/IJKDB.2017010104","url":null,"abstract":"Resiliency is the ability to recover the control logic within a specific time window after a failure is detected. The Software Defined Network (SDN) is an emerging and powerful architecture that separates the control plane from the forwarding plane. This decoupled architecture brings new difficulties for network resiliency because a link failure between a switch and the controller could disable the forwarding plane. It has been identified that the resiliency of the network can be improved by choosing the correct place for the controller and by choosing a proper routing tree once the controller location is known. In this work, we have analysed the performance of various Routing Tree algorithms on different network topologies generated by the Bernoulli Random Graph model and found that the Greedy Routing Tree (GRT) provides the maximum resiliency. The Closeness Centrality Theorem is proposed to find the best controller position, and the performance of various single-controller placement algorithms on GRT is then analysed to determine the overall improvement in the resiliency of the network.","PeriodicalId":160270,"journal":{"name":"Int. J. Knowl. Discov. Bioinform.","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123988023","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Time-Aware Task Allocation for Cloud Computing Environment","authors":"Sushanta Meher, S. K. Pande, S. K. Panda","doi":"10.4018/IJKDB.2017010101","DOIUrl":"https://doi.org/10.4018/IJKDB.2017010101","url":null,"abstract":"Cloud computing provides access to various services such as servers, storage and applications to customers as and when required. The services on the cloud can be accessed with minimal effort through any handheld device that is connected to the Internet. In an IaaS cloud, the services to the customers are provided in the form of two leases, AR and BE. Here, a running BE lease can be preempted upon arrival of an AR lease as BE has lower priority. However, frequent preemption of the BE lease causes an overhead to the system and leads to customer dissatisfaction. In this paper, we propose a fairness algorithm called TATA to provide fairness among the leases. We evaluate the proposed algorithm on various synthetic datasets and compare the results with an existing fairness algorithm. The simulation results show that TATA produces better response times for both leases than the existing algorithm.","PeriodicalId":160270,"journal":{"name":"Int. J. Knowl. Discov. Bioinform.","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126273059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Animal Actin Phylogeny and RNA Secondary Structure Study","authors":"B. P. Barik","doi":"10.4018/IJKDB.2015010104","DOIUrl":"https://doi.org/10.4018/IJKDB.2015010104","url":null,"abstract":"Animal actin is a diverse and evolutionarily ancient protein. Actin genes and their corresponding protein sequences were used to infer phylogenetic affiliations. The study indicated that several species appear to be polyphyletic and several unrelated species appear to share the same clade. Consensus actin RNA secondary structures showed that the structural features of all forms were quite distinct and different from each other. This observation supports the phylogenetic inference in which similarly named species clustered together based on their lifestyles. Consideration of actin gene genealogy and consensus RNA secondary structures could be used as a possible phylogenetic marker among diverse species of the animal kingdom for large scale data analysis. In-silico study revealed variations among the groups. The percentages of long disordered regions in proteins were found to be very high in all forms. Such findings suggest that the complexity and ability to adapt in diverse habitats by species may be due to higher percentage of disordered proteins.","PeriodicalId":160270,"journal":{"name":"Int. J. Knowl. Discov. Bioinform.","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124070730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}