{"title":"Sequential diagonal linear discriminant analysis (SeqDLDA) for microarray classification and gene identification","authors":"R. Pique-Regi, Antonio Ortega, S. Asgharzadeh","doi":"10.1109/CSBW.2005.124","DOIUrl":"https://doi.org/10.1109/CSBW.2005.124","url":null,"abstract":"In microarray classification we are faced with a very large number of features and very few training samples. This is a challenge for classical Linear Discriminant Analysis (LDA), since reliable estimates of the covariance matrix cannot be obtained. Alternative techniques based on Diagonal LDA (DLDA) combined with an independent gene selection (filtering) have been proposed. In this paper we propose a novel sequential DLDA (SeqDLDA) technique that combines gene selection and classification. At each iteration, one gene is sequentially added and the linear discriminant (LD) recomputed using the DLDA model (i.e., a diagonal covariance matrix). Classical DLDA adds the gene with the highest t-test score without checking the resulting model. In contrast, SeqDLDA finds the one gene that most improves class separation, measured using a robustified t-test score after the model is recomputed. We evaluate the new method on several 2-class datasets (Neuroblastoma, Prostate, Leukemia, Colon) using 10-fold cross-validation.
For example, for the Neuroblastoma data set, the average misclassification rate of DLDA (16.91%) is significantly reduced to 13.87% using SeqDLDA.","PeriodicalId":123531,"journal":{"name":"2005 IEEE Computational Systems Bioinformatics Conference - Workshops (CSBW'05)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133513092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An approach to distributed interactive simulation and visualization of complex systems using cluster computing","authors":"D. Gračanin","doi":"10.1109/CSBW.2005.20","DOIUrl":"https://doi.org/10.1109/CSBW.2005.20","url":null,"abstract":"When dealing with complex systems, interactive, realtime simulations require significant computational capabilities that can be provided by cluster computing. Current cluster computing based techniques are mostly focused on batch jobs. However, it is possible to use clusters so that an application can run and directly communicate with the remote client(s). Direct communication enables, without loss of accuracy or frame rate, real time visualization of and interaction with much larger models compared to a single machine implementation. The degree of coupling between the dependent variables in the model determines the degree of parallelization that can be achieved by evaluating the solution for each dependent variable in parallel. A distributed mass-spring simulation system was developed to serve as an open platform that can be used to improve the scalability of the simulation computation. Several techniques are used to improve scalability, both in terms of the problem size and number of clients. The developed system provides support for large scale mass-spring simulations to leverage available cluster computing and visualization resources. 
It can be applied to a wide range of problems related to deformable solids, including many biologically related ones such as human organ modeling and medical animation, where realtime feedback is required.","PeriodicalId":123531,"journal":{"name":"2005 IEEE Computational Systems Bioinformatics Conference - Workshops (CSBW'05)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131772891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A new clustering strategy with stochastic merging and removing based on kernel functions","authors":"Huimin Geng, H. Ali","doi":"10.1109/CSBW.2005.10","DOIUrl":"https://doi.org/10.1109/CSBW.2005.10","url":null,"abstract":"With hierarchical clustering methods, divisions or fusions, once made, are irrevocable. As a result, when two elements in a bottom-up algorithm are assigned to one cluster, they cannot subsequently be separated. Also, when a top-down algorithm separates two elements, they can't be rejoined. This greedy property may lead to premature convergence and consequently to a clustering that is far from optimal. To overcome this problem, we propose a new Stochastic Message Passing Clustering (SMPC) method based on the Message Passing Clustering (MPC) algorithm introduced in our earlier work. SMPC, as a generalized version of MPC, extends the clustering algorithm from a deterministic process to a stochastic process, adding two major advantages. First, in deciding the merging cluster pair, the influences of all clusters are quantified by probabilities, estimated by kernel functions based on their relative distances. Second, clustering can be undone to improve the clustering performance when the algorithm detects elements that have low probabilities inside their cluster and moves them outside.
The test results on colon cancer gene-expression data show that SMPC performs better than the deterministic MPC or hierarchical clustering method.","PeriodicalId":123531,"journal":{"name":"2005 IEEE Computational Systems Bioinformatics Conference - Workshops (CSBW'05)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128998365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PLATCOM: a platform for computational comparative genomics on the Web","authors":"Kwangmin Choi, Jeong-Hyeon Choi, A. Saple, Zhiping Wang, Jason Lee, Sun Kim","doi":"10.1109/CSBW.2005.107","DOIUrl":"https://doi.org/10.1109/CSBW.2005.107","url":null,"abstract":"The exponential accumulation of genomic sequence data demands systematic analysis of genetic information and requires use of various computational approaches to handle such huge sets of genomic data. Comparative genomics, with such organized data and diverse computational techniques, has become useful not only for finding common features in different genomes, but also for understanding evolutionary process and mechanism among multiple genomes. We have developed high performance data mining tools of our own. With the databases and sequence analysis tools genomes can be compared. There are currently six modules: genome plot, conserved gene neighbourhood navigation, metabolic pathways, comparative sequence clustering analysis, putative gene fusion events detection, and multiple genome alignment. A set of genomes selected by users is submitted with parameter settings via Web interface.","PeriodicalId":123531,"journal":{"name":"2005 IEEE Computational Systems Bioinformatics Conference - Workshops (CSBW'05)","volume":"104 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122581740","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Incorporating life sciences applications in the architectural optimizations of next-generation petaflop-system","authors":"David A. Bader, Vipin Sachdeva","doi":"10.1109/CSBW.2005.77","DOIUrl":"https://doi.org/10.1109/CSBW.2005.77","url":null,"abstract":"Advances in experimental techniques have transformed biology into a data-intensive science, with a rapid explosion of data at the genomic and proteomic level. Few comprehensive suites of computationally-intensive life science applications are available to the computer science community for optimization of current high-performance architectures specifically targeted towards the computational biology applications. BioSplash represents a wide variety of open-source codes spanning the heterogeneity of algorithms, biological problems, popularity among biologists, and memory traits, gearing the suite to be of importance to both biologists and computer scientists.","PeriodicalId":123531,"journal":{"name":"2005 IEEE Computational Systems Bioinformatics Conference - Workshops (CSBW'05)","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116793572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Functional modularity in a large-scale mammalian molecular interaction network","authors":"Andreas Krämer, D. Richards, James O. Bowlby, R. Felciano","doi":"10.1109/CSBW.2005.67","DOIUrl":"https://doi.org/10.1109/CSBW.2005.67","url":null,"abstract":"The Ingenuity™ Pathways Knowledge Base (IPKB) contains over one million findings manually curated from the scientific literature. Highly-structured content from the IPKB forms the basis for a large-scale molecular network of direct interactions observed between mammalian orthologs, which is used in Ingenuity's Pathway Analysis (IPA) system. In this study we explore the relationship between this global network and known functional annotations of genes. In particular we show that (a) subnetworks formed by genes annotated with the same functional category have significantly more edges than equivalent random subnetworks, and (b) highly-interconnected subnetworks are significantly enriched in genes with specific functional annotations.","PeriodicalId":123531,"journal":{"name":"2005 IEEE Computational Systems Bioinformatics Conference - Workshops (CSBW'05)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125203370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Massive multiple sequence alignment of 16S bacterial ribosomal RNAs using ClustalW-Message Passing Interface (MPI) Based on Beowulf Linux system","authors":"Hyon Chang Kim, Yong Beom Seo, Ji Hwan Song, D. Choi, C. Min, Han Jip Kim","doi":"10.1109/CSBW.2005.88","DOIUrl":"https://doi.org/10.1109/CSBW.2005.88","url":null,"abstract":"We have built a Debian-Beowulf computer cluster consisting of 15 computational nodes, each equipped with 15 AMD Opteron 64-bit microprocessors. Local Area Multicomputer (LAM) - Message Passing Interface (MPI) was used as a portable high-performance implementation of MPI. More than 2,000 bacterial 16S ribosomal RNAs (rRNAs) were multiply aligned using ClustalW-MPI. Systematic sequence comparison identified several sequences with a very high degree of homology despite their different species of origin. These highly conserved sequences were collected as candidate sequences for drug targets of ribosomal antibiotics.","PeriodicalId":123531,"journal":{"name":"2005 IEEE Computational Systems Bioinformatics Conference - Workshops (CSBW'05)","volume":"219 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124331989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Consensus methods using phylogenetic databases","authors":"M. Kulkarni, Bernard M. E. Moret","doi":"10.1109/CSBW.2005.43","DOIUrl":"https://doi.org/10.1109/CSBW.2005.43","url":null,"abstract":"With the increasing use and size of phylogenies, the output of reconstruction programs must be stored for future reference, in which case post-tree analyses such as consensus must be run from a database. We set out to determine whether such analyses can be run at a reasonable cost; we chose consensus (which summarizes the information from many trees into a single tree) because of its general applicability and because it creates a severe demand on the database by requiring examination of every edge of every tree. We preprocess the data (trees) to create tables that support consensus computations, using our own extensions to the PhyloDB schema of Nakhleh et al. For each of the three consensus methods (strict, majority, and greedy), we compare the database computation with the memory-resident computation using the Phylip consensus programs. We use a large selection of datasets of varying sizes (up to 1,000 trees of up to 1,500 taxa each) and of varying degrees of commonality. The computations from the database are very practical: they often run faster, and never run more than 5 times slower, than the computations in main memory using Phylip. The additional storage costs are easily handled by any database system, while the preprocessing costs remain reasonable.
Thus suitable preprocessing of phylogenetic data allows post-tree analyses to be run directly from the database at much the same cost as current memory-resident analyses.","PeriodicalId":123531,"journal":{"name":"2005 IEEE Computational Systems Bioinformatics Conference - Workshops (CSBW'05)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124391869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Maximum sequence alignment fails to predict off-targeted gene regulation by RNAi","authors":"A. Birmingham, E. Anderson, W. Marshall, A. Khvorova","doi":"10.1109/CSBW.2005.90","DOIUrl":"https://doi.org/10.1109/CSBW.2005.90","url":null,"abstract":"We have employed various sequence alignment algorithms and scoring techniques to determine whether current computational tools accurately predict genes that will be off-targeted by the RNA interference (RNAi) pathway. Our studies show that distributions of maximum alignment scores for off-targeted and untargeted genes are statistically indistinguishable, indicating that maximum complementarity by itself is an unsatisfactory predictor of off-targeting. Interestingly, a highly significant association was observed between off-targeting and exact complementarity between the seed region (bases 2-7) of siRNA and their off-targeted genes. This pattern has been previously recognized in microRNA-mediated gene knockdown and suggests a distinctive role for the 5' terminus of these strands in RNAi-triggered gene suppression.","PeriodicalId":123531,"journal":{"name":"2005 IEEE Computational Systems Bioinformatics Conference - Workshops (CSBW'05)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123676146","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lossless compression of DNA microarray images","authors":"Yong Zhang, Rahul Parthe, D. Adjeroh","doi":"10.1109/CSBW.2005.85","DOIUrl":"https://doi.org/10.1109/CSBW.2005.85","url":null,"abstract":"Microarray experiments are characterized by a massive amount of data, usually in the form of an image. Based on the nature of microarray images, we consider the microarray in terms of its structure and statistics. Based on the microarray image model, we propose a context-based method for lossless compression of microarray images using prediction by partial approximate matching (PPAM). In synchronization experiments, the raw data consists of two channel microarray images. The correlation between these two channel microarray images is explored in order to improve the compression performance. Our results show that the proposed approach produces a better compression result than the best-known microarray compression algorithm.","PeriodicalId":123531,"journal":{"name":"2005 IEEE Computational Systems Bioinformatics Conference - Workshops (CSBW'05)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125550818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}