{"title":"Predicting gene function by combining expression and interaction data","authors":"R. V. Berlo, L. Wessels, S. Martes, M. Reinders","doi":"10.1109/CSBW.2005.111","DOIUrl":"https://doi.org/10.1109/CSBW.2005.111","url":null,"abstract":"In this study we combined the spurious protein interaction data from the Database of Interacting Proteins with the recently published gene expression data of S. cerevisiae grown with nutrient limitations under different physical/chemical conditions (Tai et al.) in order to predict protein interactions and protein functions with more confidence. Because proteins often have multiple functional annotations, we propose to employ a continuous metric (e.g. the cosine angle) for measuring functional similarity. We show that it is possible to extract multiple functional associations of a gene, but only by applying a strict Pearson correlation threshold on the gene expression data. Using this strategy, we were able to predict the function of six formally unclassified proteins. Additionally, we revealed six small networks of interacting proteins. These networks strongly match with existing biological knowledge. Furthermore, transcription factors could be assigned to four of these interaction networks by employing a recently published transcription database (Harbison et al.).","PeriodicalId":123531,"journal":{"name":"2005 IEEE Computational Systems Bioinformatics Conference - Workshops (CSBW'05)","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128333305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PROMOCO: a new program for prediction of cis regulatory elements: from high-information content analysis to clique identification","authors":"Guojun Li, Jizhu Lu, V. Olman, Ying Xu","doi":"10.1109/CSBW.2005.113","DOIUrl":"https://doi.org/10.1109/CSBW.2005.113","url":null,"abstract":"We present a computational study for prediction of cis regulatory elements. We model the problem as follows. Each motif in a set of conserved binding motifs, evolved from one common ancestor, has a short (Hamming) distance from this ancestor. The problem is to identify a set of l-mers from a given set of promoter sequences that differ in at most k positions from the to-be-identified ancestor. A number of papers published in the past attempt to solve this challenging problem. Although the putative ancestor is unknown, and may not even appear anywhere in the background database, we may assume that an instance of it is at hand, since we can guess it. Our main contribution in this paper is to develop an algorithm, named PROMOCO (PROfile Motif Collection), to find a profile containing all the motifs and a relatively small number of random l-mers so that the consensus of the profile would be the putative ancestor. The key idea of the PROMOCO algorithm lies in a new distance measure.","PeriodicalId":123531,"journal":{"name":"2005 IEEE Computational Systems Bioinformatics Conference - Workshops (CSBW'05)","volume":"92 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129710021","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cluster utility: a new metric for clustering biological sequences","authors":"Jason Lee, Sun Kim","doi":"10.1109/CSBW.2005.38","DOIUrl":"https://doi.org/10.1109/CSBW.2005.38","url":null,"abstract":"We propose cluster utility (CU), a metric that is based on consideration of similarity within a cluster and difference between clusters without metric space assumption. CU showed a very high correlation with the quality index. CU scales very well with data size and its strong correlation with quality index was nearly invariable regardless of data size change. CU can be used in two ways: to guide sequence clustering algorithms and to evaluate clustering results.","PeriodicalId":123531,"journal":{"name":"2005 IEEE Computational Systems Bioinformatics Conference - Workshops (CSBW'05)","volume":"176 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114662864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Non-occurring and rare quads in PDB and translated introns from XPro with possible applications in nanostructure design","authors":"G. Sampath, James TenEyck","doi":"10.1109/CSBW.2005.97","DOIUrl":"https://doi.org/10.1109/CSBW.2005.97","url":null,"abstract":"Exhaustive search over 17313 unique protein sequences in the database PDB indicates the absence of 4036 of the 160000 possible subsequences of four residues (quads). When the polypeptides obtained by translating 100000 intron sequences in the database XPro are searched, the number drops to 424, which still exceeds what would be obtained by pure chance. More generally there are 11444 quads that occur 3 or fewer times in PDB. Using the Kyte-Doolittle hydrophobicity index, the 4036 quads (including the 424 absent in XPro) are divided into 16 groups, five of which can form unbroken helices or sheets by repetition. Most of the 16 groups are evenly distributed, one exception being quads with all-apolar residues, which are significantly less frequent. The helical and sheet structures so formed are artificial polypeptides not observed in nature. By using patterns from the other 11 groups more complex structures can be formed. Such structures could potentially serve as tubules and substrates in nanostructure design.","PeriodicalId":123531,"journal":{"name":"2005 IEEE Computational Systems Bioinformatics Conference - Workshops (CSBW'05)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128267675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Open microscopy environment","authors":"I. Goldberg","doi":"10.1109/CSBW.2005.100","DOIUrl":"https://doi.org/10.1109/CSBW.2005.100","url":null,"abstract":"Summary form only given. The Open Microscopy Environment (OME) is a framework for the management and analysis of image data and metadata in biological microscopy. Biological microscopy images can be collected in many different ways, and may represent many different kinds of information which changes continually with evolving technology and experimental goals. A framework that fully encompasses biological microscopy in scope cannot rely on a fixed data model - it must be designed to accommodate ever-changing informatics needs. The challenge posed by a fluid data model is similar to one addressed by the semantic web, principally support for arbitrary or user-defined semantics and ontologies. However, an analysis framework faces additional challenges besides the management of semantics: definition of analytic units as transforms between semantic constructs, definition of aggregates of analytic units to represent work flows, interfaces to algorithm implementations, and maintenance of data provenance or history. A collaborative scientific environment also demands that these locally defined semantics and transforms be fully transportable for subsequent review or analysis. OME incorporates these components in a database-backed system targeted at end-user biologists. This presentation will briefly describe the various components of OME: the semantic layer (Semantic Types - STs), the analytical units used to transform between STs (Analysis Modules), the units of work flow (Analysis Chains), the work-flow processor (Analysis Engine), and the transportability layer (OME XML). An example of a complex work flow will also be presented illustrating how this system is used for automated image classification.","PeriodicalId":123531,"journal":{"name":"2005 IEEE Computational Systems Bioinformatics Conference - Workshops (CSBW'05)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128507687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A generic algorithm to find all common intervals of two permutations","authors":"G. Feng, Yujiang Shan","doi":"10.1109/CSBW.2005.9","DOIUrl":"https://doi.org/10.1109/CSBW.2005.9","url":null,"abstract":"Let K be the set {1, 2, ..., m}, and let [x, y] denote the set {x, x+1, ..., y}, where 1 ≤ x, y ≤ m. Given two permutations σ_A and σ_B of a set ℵ, a 2-tuple of intervals ([x_1, y_1], [x_2, y_2]) is called common intervals if σ_A([x_1, y_1]) = ([x_2, y_2]). In this paper, we propose a sufficient and necessary condition for a 2-tuple of intervals to be common intervals. Based on these conditions, we present a generic algorithm that finds all common intervals of these two permutations.","PeriodicalId":123531,"journal":{"name":"2005 IEEE Computational Systems Bioinformatics Conference - Workshops (CSBW'05)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128598115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Novel hybrid hierarchical-K-means clustering method (H-K-means) for microarray analysis","authors":"Bernard Chen, R. Harrison, Yi Pan, P. Tai","doi":"10.1109/CSBW.2005.98","DOIUrl":"https://doi.org/10.1109/CSBW.2005.98","url":null,"abstract":"Hierarchical and k-means clustering are two major analytical tools for unsupervised microarray datasets. However, both have their innate disadvantages. Hierarchical clustering cannot represent distinct clusters with similar expression patterns. Also, as clusters grow in size, the actual expression patterns become less relevant. K-means clustering requires a specified number of clusters in advance and chooses initial centroids randomly; in addition, it is sensitive to outliers. We present a novel hybrid approach that combines the merits of the two and discards the disadvantages mentioned above. It differs from the existing method, which carries out hierarchical clustering first to decide the location and number of clusters in a first round and runs K-means clustering in a second round. In brief, we cluster around half of the data through hierarchical clustering and follow with K-means for the remaining half, in one single round. Also, our approach provides a mechanism to handle outliers. Compared with the existing hybrid clustering approach and with K-means clustering under two different distance measures on Eisen's yeast microarray data, our method always generates much higher quality clusters.","PeriodicalId":123531,"journal":{"name":"2005 IEEE Computational Systems Bioinformatics Conference - Workshops (CSBW'05)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133767312","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Eigenphenotypes: towards an algorithmic framework for phenotype discovery","authors":"Alexander Vaughan, Rahul Singh, Ilmi Yoon, M. Fuse","doi":"10.1109/CSBW.2005.60","DOIUrl":"https://doi.org/10.1109/CSBW.2005.60","url":null,"abstract":"Studying the genetic control of molecular, anatomical and/or morphological phenotypes in model organisms is a powerful tool in the functional analysis of a gene. The goal of our research is to develop algorithms that discover phenotypes of behavior in model organisms, which may identify, categorize, and quantify these phenotypes under conditions of minimal a priori information. Starting from a non-invasive video monitoring of a model organism, we propose an eigen-decomposition of the organism's behavior captured in video. Traditional clustering techniques in space, time, and frequency can utilize this decomposition to characterize the categorical behaviors of an animal, and for an analysis of the behavioral repertoire. This supplies a quantified analysis of behavior with minimal assumptions, a crucial first step in the genetic analysis of behavior.","PeriodicalId":123531,"journal":{"name":"2005 IEEE Computational Systems Bioinformatics Conference - Workshops (CSBW'05)","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133457399","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mining protein sequence motifs representing common 3D structures","authors":"Wei Zhong, Gulsah Altun, R. Harrison, P. Tai, Yi Pan","doi":"10.1109/CSBW.2005.93","DOIUrl":"https://doi.org/10.1109/CSBW.2005.93","url":null,"abstract":"Understanding the relationship between protein structure and its sequence is one of the most important tasks of current bioinformatics research. In this work, recurring protein sequence motifs are explored with a K-means clustering algorithm. No structural information is used during the clustering process so that the relationship between sequence similarity and structural similarity for sequence-based clusters can be studied. This work focuses on characterizing structural similarity so that the quality of sequence clusters can be assessed accurately. Analysis of results reveals that the combined metric of distance matrix root mean squared deviation for sequence cluster (dmRMSD_SC) and torsion angle RMSD_SC (taRMSD_SC) can provide a reliable indication of structural similarity for sequence clusters. Based on our combined metric, the recurrent sequence clusters with high structural similarity are used to generate sequence motifs. The common 3D structure of a sequence motif is represented by both representative backbone torsion angles and average distance matrices of the sequence cluster used to produce this motif. These motifs provide the foundation to develop a protein vocabulary reflecting sequence-structure correspondence.","PeriodicalId":123531,"journal":{"name":"2005 IEEE Computational Systems Bioinformatics Conference - Workshops (CSBW'05)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127948481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Computing MEG signal sources","authors":"J. M. Eklund, R. Bajcsy, J. Sprinkle, G. V. Simpson","doi":"10.1109/CSBW.2005.42","DOIUrl":"https://doi.org/10.1109/CSBW.2005.42","url":null,"abstract":"This paper deals with the complexity of the inverse computation of brain currents from magnetoencephalography (MEG) signals. MEG measures the magnetic field outside the head: in effect, the resultant field from the flow of current inside the brain. We describe our current techniques to perform this inverse computation (called source estimation in much of the literature), which provides a view of brain activity that is less sensitive to disturbances which affect other kinds of brain activity measurements, though much more expensive to record.","PeriodicalId":123531,"journal":{"name":"2005 IEEE Computational Systems Bioinformatics Conference - Workshops (CSBW'05)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131110573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}