{"title":"Regularization of sequence data for machine learning","authors":"Bryan Bai, S. C. Kremer","doi":"10.1109/BIBMW.2011.6112350","DOIUrl":"https://doi.org/10.1109/BIBMW.2011.6112350","url":null,"abstract":"We examine the problem of classifying biological sequences, and in particular the challenge of generalizing results to novel input data. We observe that the high-dimensionality of sequence data representations results in an extremely sparsely populated input space. This motivates a need for regularization (a form of inductive bias), in order to achieve generalization. We discuss regularization in the context of regular neural networks, deep belief networks and support vector machines, and provide experimental results for these architectures. Our results support the importance of using an effective regularization method and identify which methods work well on a real-world dataset.","PeriodicalId":6345,"journal":{"name":"2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW)","volume":"13 1","pages":"19-25"},"PeriodicalIF":0.0,"publicationDate":"2011-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87631016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Active Protein Interaction Network and Its Application on Protein Complex Detection","authors":"Jianxin Wang, Xiaoqing Peng, Min Li, Yong Luo, Yi Pan","doi":"10.1109/BIBM.2011.45","DOIUrl":"https://doi.org/10.1109/BIBM.2011.45","url":null,"abstract":"In recent years, more and more attention has been focused on modelling and analyzing dynamic networks. Some researchers have attempted to extract dynamic networks by combining the dynamic information from gene expression data or subcellular localization data with protein networks. However, the dynamics of proteins' presence do not guarantee the dynamics of interactions, since the presence of a protein does not indicate the protein's activity. The activity of a protein is closely connected with its function. Thus only the dynamics of protein activity ensure the dynamics of interaction. The gene expression of a cellular process or cycle carries more information than only the dynamics of proteins' presence. We assume that a protein is active when its expression values are near its maximum expression value, since the expression quantity will decrease after the protein has performed its function, which triggers feedback that controls the expression quantity. In this paper, we propose a method to identify active time points for each protein in a cellular process or cycle by using a 3-sigma principle to compute an active threshold for each gene according to the characteristics of its expression curve. Combining the activity information with the protein interaction network, we can construct an active protein interaction network (APPI). To demonstrate the efficiency of the APPI network model, we applied it to complex detection. Compared with single threshold time series networks, the APPI network achieves better performance on protein complex prediction.","PeriodicalId":6345,"journal":{"name":"2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW)","volume":"12 1","pages":"37-42"},"PeriodicalIF":0.0,"publicationDate":"2011-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90168797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using semantic similarity to detect features in yeast protein complexes","authors":"P. Guzzi, Marianna Milano, P. Veltri, M. Cannataro","doi":"10.1109/BIBMW.2011.6112419","DOIUrl":"https://doi.org/10.1109/BIBMW.2011.6112419","url":null,"abstract":"Biological data stored in databases can be associated with information (knowledge) such as experiments, properties and functions, response to drugs, etc. Such knowledge is often stored in biological ontologies. Gene Ontology is one of the main resources of biological knowledge, providing both a categorization of terms and a source of annotation for genes and proteins. This enables the use of ontology-based methodologies for the analysis of proteins and their functions. One methodology is based on semantic similarity measures. Recently there has been growing interest in the use of semantic-based methodologies for the analysis of protein interaction data, such as the prediction of protein complexes based on semantic similarity measures. Despite this interest, there is a need for an assessment of semantic measures as well as a deep study of the impact of the chosen measure on the obtained results. This paper treats the problem of using semantic similarity measures to analyse protein complexes and to improve protein complex prediction frameworks. Tests have been performed on yeast protein complexes. Results indicate that there exists a bias among measures as well as a higher value of semantic similarity among proteins that participate in the same complex, supporting both a possible use of semantic similarity in protein complex prediction and a suggestion on the choice of measure.","PeriodicalId":6345,"journal":{"name":"2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW)","volume":"119 5 1","pages":"495-502"},"PeriodicalIF":0.0,"publicationDate":"2011-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88751159","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A comparative study of text classification approaches for personalized retrieval in PubMed","authors":"Sachintha Pitigala, Cen Li, S. Seo","doi":"10.1109/BIBMW.2011.6112503","DOIUrl":"https://doi.org/10.1109/BIBMW.2011.6112503","url":null,"abstract":"Retrieval of the information relevant to one's need from PubMed is becoming increasingly challenging due to its large volume and rapid growth. The traditional information search techniques based on keyword matching are insufficient for large databases such as PubMed. A personalized article retrieval system that is tailored to individual researcher's specific interests and selects only highly relevant articles can be a helpful tool in the field of Bioinformatics. The text classification methods developed in the text mining community have shown good results in differentiating relevant articles from the irrelevant ones. This study compares two text classification methods, Naïve Bayes and Support Vector Machines, in order to study the effectiveness of the two methods on classifying full text articles in the case when only a small set of training data is available. The comparison results show that the Naïve Bayes method is a better choice than Support Vector Machines in building a personalized article retrieval system which can learn (train) from a small set of full text articles.","PeriodicalId":6345,"journal":{"name":"2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW)","volume":"67 1","pages":"919-921"},"PeriodicalIF":0.0,"publicationDate":"2011-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89067841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Additive and multiplicative genome-wide association models identify genes associated with growth","authors":"Cynthia Zavala, N. Serao, M. Villamil, G. Caetano-Anollés, S. Rodriguez-Zas","doi":"10.1109/BIBMW.2011.6112527","DOIUrl":"https://doi.org/10.1109/BIBMW.2011.6112527","url":null,"abstract":"Standard genome-wide association studies evaluate the association between single nucleotide polymorphisms (SNPs or Genotype G) and phenotype (e.g. growth) conditional on non-SNP covariates including environmental factors (E, e.g. diet) or population stratification, in an additive fashion. For traits known to be the result of genotype-by-environment interactions (G×E), like growth, a multiplicative model could potentially uncover additional SNPs that influence growth in a context-dependent (e.g. diet or breed) fashion. The objective of this study was to assess and compare the performance of context-independent (additive, G+E) and context-dependent (multiplicative, G+E+G×E) models to identify polymorphisms and corresponding genes associated with growth that are context-independent and context-dependent. In addition to single-SNP analysis, a multi-SNP haplotype-based analysis that can increase the precision of the estimates was evaluated for the additive model.","PeriodicalId":6345,"journal":{"name":"2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW)","volume":"24 1","pages":"975-977"},"PeriodicalIF":0.0,"publicationDate":"2011-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85978997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improved RNA-Seq Partitions in Linear Models for Isoform Quantification","authors":"Brian E. Howard, P. Veronese, S. Heber","doi":"10.1109/BIBM.2011.102","DOIUrl":"https://doi.org/10.1109/BIBM.2011.102","url":null,"abstract":"Here, we present an extension of our isoform quantification method that accommodates paired-end RNA sequencing data. We explore several alternate methods of partitioning read count data in order to better exploit the available fragment size distribution, and to reduce the variance in the resulting estimates. In many cases, this significantly improves the accuracy of our approach.","PeriodicalId":6345,"journal":{"name":"2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW)","volume":"24 1","pages":"151-154"},"PeriodicalIF":0.0,"publicationDate":"2011-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85037371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Artificial Neural Network Based Approach for Identification of Native Protein Structures Using an Extended Forcefield","authors":"T. M. Fawcett, S. Irausquin, Mikhail Simin, H. Valafar","doi":"10.1109/BIBM.2011.53","DOIUrl":"https://doi.org/10.1109/BIBM.2011.53","url":null,"abstract":"Current protein force fields like the ones seen in CHARMM or Xplor-NIH have many terms that include bonded and non-bonded terms. Yet the force fields do not take into account the use of hydrogen bonds which are important for secondary structure creation and stabilization of proteins. SCOPE is an open-source program that generates proteins from rotamer space. It then creates a force field that uses only non-bonded and hydrogen bond energy terms to create a profile for a given protein. The profiles can then be used in an artificial neural network to create a linear model which is funneled to the true protein conformation.","PeriodicalId":6345,"journal":{"name":"2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW)","volume":"15 1","pages":"500-505"},"PeriodicalIF":0.0,"publicationDate":"2011-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89766963","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multiobjective optimization shuffled frog-leaping biclustering","authors":"Junwan Liu, Xiaohua Hu, Zhoujun Li, Yiming Chen","doi":"10.1109/BIBMW.2011.6112368","DOIUrl":"https://doi.org/10.1109/BIBMW.2011.6112368","url":null,"abstract":"Biclustering of DNA microarray data can mine significant patterns that help in understanding gene regulation and interactions. This is a classical multi-objective optimization problem (MOP). Recently, many researchers have developed stochastic search methods that mimic the efficient behavior of species such as ants, bees, birds and frogs, as a means to seek faster and more robust solutions to complex optimization problems. Particle swarm optimization (PSO) is a heuristics-based optimization approach simulating the movements of a bird flock finding food. The shuffled frog leaping algorithm (SFLA) is a population-based cooperative search metaphor combining the benefits of the local search of PSO and the global shuffling of information of the complex evolution technique. This paper introduces the SFL algorithm to solve biclustering of microarray data, and proposes a novel multi-objective shuffled frog leaping biclustering (MOSFLB) algorithm to mine coherent patterns from microarray data. Experimental results on two real datasets show that our approach can effectively find significant biclusters of high quality.","PeriodicalId":6345,"journal":{"name":"2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW)","volume":"46 1","pages":"151-156"},"PeriodicalIF":0.0,"publicationDate":"2011-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84430425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Traditional Chinese Medicine syndromes of psoriasis in Chinese patients: Contribution of demographic and clinical variables","authors":"Zehui He, Chuanjian Lu, A. Ou","doi":"10.1109/BIBMW.2011.6112468","DOIUrl":"https://doi.org/10.1109/BIBMW.2011.6112468","url":null,"abstract":"This article examines the specific contribution of demographic, medical and psychological variables to the Traditional Chinese Medicine (TCM) syndromes of psoriasis. A cross-sectional survey of psoriasis patients was conducted at 7 TCM hospitals in different regions. In all, 671 psoriasis patients underwent a clinical assessment including differentiation of TCM syndromes and psoriasis severity (assessed by the Psoriasis Area and Severity Index, PASI). Patients also completed questions on demographic data and a quality of life scale (Dermatology Life Quality Index, DLQI). The three main TCM syndromes were included: 354 patients with Wind-heat (52.8%), 161 with Blood-stasis (24.0%), and 156 with Blood-dryness (23.2%). They were distributed differently across subgroups of patients defined by gender, age, chronic disease, duration of psoriasis, PASI, and DLQI score. The TCM syndromes were closely related to the demographic and clinical conditions of patients. TCM clinical treatment should consider both the characteristics of the syndrome and the demographic variables of psoriasis patients.","PeriodicalId":6345,"journal":{"name":"2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW)","volume":"20 1","pages":"765-768"},"PeriodicalIF":0.0,"publicationDate":"2011-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87906436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Condition-Dependent Dynamical PPI Networks from Conflict-Sensitive Phosphorylation Dynamics","authors":"Qiong Cheng, M. Ogihara, Vineet K Gupta","doi":"10.1109/BIBM.2011.127","DOIUrl":"https://doi.org/10.1109/BIBM.2011.127","url":null,"abstract":"An important issue in protein-protein interaction network studies is the identification of interaction dynamics. Two factors contribute to the dynamics. One, not all proteins may be expressed in a given cell, and two, competition may exist among multiple proteins for a particular protein domain. Taking into account these two factors, we propose a novel approach to predict protein-protein interaction network dynamics by learning from conflict-sensitive phosphorylation dynamics. We built a training model from conflict-sensitive phosphorylation dynamics. In this model, each node is not an individual protein but a protein-protein pair and is labeled with terms representing conditions in which the interaction should be observed. We mapped the protein pairs in a vector space, built hyper-edges over the interaction nodes, and developed rank-like SVM with Laplacian regularizers for PPI network dynamics prediction. We also employed the standard F1 measure for evaluating the effectiveness of classification results.","PeriodicalId":6345,"journal":{"name":"2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW)","volume":"48 1","pages":"309-312"},"PeriodicalIF":0.0,"publicationDate":"2011-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87013565","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}