{"title":"Handling Data Sparseness in Gene Network Reconstruction","authors":"G. B. Bezerra, T.V. Barra, F. V. Zuben, L. Castro","doi":"10.1109/CIBCB.2005.1594900","DOIUrl":"https://doi.org/10.1109/CIBCB.2005.1594900","url":null,"abstract":"One of the main problems related to regulatory network reconstruction from expression data concerns the small size and low quality of the available dataset. When trying to infer a model from little information it is necessary to give much more precedence to generalization, rather than specificity, otherwise, any attempt will be fated to overfitting. In this paper we address this issue by focusing on data sparseness and noisy information, and propose a density estimation technique that achieves regularized curves when data is scarce. We first compare the proposed method with the EM algorithm for mixture models on density estimation problems. Next, we apply the method, together with Bayesian networks, on realistic simulations of static gene networks, and compare the obtained results with the standard discrete Bayesian network model. We intend to demonstrate that adopting a discrete approach is not justifiable when only a small amount of information is available.","PeriodicalId":330810,"journal":{"name":"2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129325283","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fuzzy Profile Hidden Markov Models for Protein Sequence Analysis","authors":"Niranjan P. Bidargaddi, M. Chetty, J. Kamruzzaman","doi":"10.1109/CIBCB.2005.1594950","DOIUrl":"https://doi.org/10.1109/CIBCB.2005.1594950","url":null,"abstract":"Profile HMMs based on classical hidden Markov models have been widely applied for alignment and classification of protein sequence families. The formulation of the forward and backward variables in profile HMMs is made under statistical independence assumption of the probability theory. We propose a fuzzy profile hidden Markov model to overcome the limitations of the statistical independence assumption of probability theory. The strong correlations and the sequence preference involved in the protein structures make fuzzy architecture based models as suitable candidates for building profiles of a given family since fuzzy set can handle uncertainties better than classical methods. The proposed model fuzzifies the forward and backward variables by incorporating Sugeno fuzzy measures using Choquet integrals which is extended to fuzzy Baum-Welch parameter estimation algorithm for profiles. It was built and tested on widely studied globin and kinase family sequences and its performance was compared with classical HMM. A comparative analysis based on Log-Likelihood (LL) scores of sequences and Receiver Operating Characteristic (ROC) demonstrates the superiority of fuzzy profile HMMs over the classical profile model.","PeriodicalId":330810,"journal":{"name":"2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114993635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A New Semi-Supervised Subspace Clustering Algorithm on Fitting Mixture Models","authors":"Young Bun Kim, Jean X. Gao","doi":"10.1109/CIBCB.2005.1594919","DOIUrl":"https://doi.org/10.1109/CIBCB.2005.1594919","url":null,"abstract":"We propose a new subspace clustering algorithm (EPSCMIX), which is based on the feature saliency measure that is obtained by using both the Emerging Patterns algorithm and the EM algorithm, for the analysis of microarray data. For the model selection, it employs a novel agglomerative step together with MDL criterion. And, we present the result of comparative experiments between AIC, MDL and minimum message length (MML) used to determine a criterion for our algorithm. The robustness of using emerging patterns based on mixture models, as well as using the Gaussian mixture model for subspace clustering, was demonstrated on both synthetic and real data sets. In experiments, it also certified that a new agglomerative method that merges mostly correlated components with MDL consistently worked better than the one that removes weak weight components.","PeriodicalId":330810,"journal":{"name":"2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"85 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114002692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Toward Protein Structure Analysis with Self-Organizing Maps","authors":"L. Hamel, Gongqin Sun, Jing Zhang","doi":"10.1109/CIBCB.2005.1594961","DOIUrl":"https://doi.org/10.1109/CIBCB.2005.1594961","url":null,"abstract":"Establishing structure-function relationships on the proteomic scale is a unique challenge faced by bioinformatics and molecular biosciences. Large protein families represent natural libraries of analogues of a given catalytic or protein function, thus making them ideal targets for the investigation of structure-function relationships in proteins. To this end, we have developed a new technique for analyzing large amounts of detailed molecular structure information focusing on the functional centers of homologous proteins. Our approach uses unsupervised machine learning, in particular, self-organizing maps. The information captured by a self-organizing map and stored in its reference models highlights the essential structure of the proteins under investigation and can be effectively used to study detailed structural differences and similarities among homologous proteins. Our preliminary results obtained with a prototype based on these techniques demonstrate that we can classify proteins and identify common and unique structures within a family and, more importantly, identify common and unique structural features of different conformations of the same protein. The approach developed here outperforms many of today’s structure analysis tools. These tools are usually either limited by the number of proteins they can process at the same time or they are limited by the structural resolution they can accommodate, that is, many of the structural analysis tools that can handle multiple proteins at the same time limit themselves to secondary structure analysis and therefore miss fine structural nuances within proteins.","PeriodicalId":330810,"journal":{"name":"2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121902714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}