{"title":"Detecting experimental noises in protein-protein interactions with iterative sampling and model-based clustering","authors":"Hiroshi Mamitsuka","doi":"10.1109/BIBE.2003.1188977","DOIUrl":"https://doi.org/10.1109/BIBE.2003.1188977","url":null,"abstract":"One of the most important issues in current molecular biology is to build exact networks of protein-protein interactions. Recently developed high-throughput experimental techniques accumulate a vast amount of protein-protein interaction data, but it is well known that data reliability has not reached at a satisfactory level. In this paper we attempt to computationally detect experimental errors or noises presumably contained in the protein-protein interaction data by an iterative sampling method using the learning of a stochastic model as its subroutine. The method repeats two steps of selecting examples that can be regarded as non-noises, and training the component algorithm with the selected examples alternately. Noise candidates are selected as the examples having the smallest average likelihoods computed by previously obtained stochastic models. We empirically evaluated the method with other two methods by using both synthetic and real data sets. We examined the effect of noises and data sizes by using medium- and large-sized synthetic data sets that contain noises added intentionally. The results obtained by the medium-sized synthetic data sets show that the significance level of the performance difference between the method and the two other methods has more pronounced for higher noise ratios. Further experiments show that this experimental finding was also true of a large-scale data set. The performance advantage of the method was further confirmed by the experiments using a real protein-protein interaction data set.","PeriodicalId":178814,"journal":{"name":"Third IEEE Symposium on Bioinformatics and Bioengineering, 2003. Proceedings.","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128198396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A data mining method to predict transcriptional regulatory sites based on differentially expressed genes in human genome","authors":"Hsien-Da Huang, Huei-Lin Chang, T. Tsou, Baw-Jhiune Liu, Jorng-Tzong Horng","doi":"10.1109/BIBE.2003.1188966","DOIUrl":"https://doi.org/10.1109/BIBE.2003.1188966","url":null,"abstract":"Very large-scale gene expression analysis, i.e., UniGene and dbEST, are provided to find those genes with significantly differential expression in specific tissues. The differentially expressed genes in a specific tissue are potentially regulated concurrently by a combination of transcription factors. This study attempts to mine putative binding sites on how combinations of the known regulatory sites homologs and over-represented repetitive elements are distributed in the promoter regions of considered groups of differentially expressed genes. We propose a data mining approach to statistically discover the significantly tissue-specific combinations of known site homologs and over-represented repetitive sequences, which are distributed in the promoter regions of differential gene groups. The association rules mined would facilitate to predict putative regulatory elements and identify genes potentially co-regulated by the putative regulatory elements.","PeriodicalId":178814,"journal":{"name":"Third IEEE Symposium on Bioinformatics and Bioengineering, 2003. Proceedings.","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125022528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Influence of the thermal treatment applied to PAN gel on its length change and generated force","authors":"H. Tamagawa, F. Nogata, Toyotaka Watanabe, A. Abe, S. Popovic","doi":"10.1109/BIBE.2003.1188964","DOIUrl":"https://doi.org/10.1109/BIBE.2003.1188964","url":null,"abstract":"PAN gel is known for its strong matrix as well as for its fast length change by the acid-base environmental solution exchange. Besides, PAN gel is a quite soft material like a real cell. Therefore it's been regarded as a most promising material as an artificial muscle. However, its matrix strength declines extremely unfortunately in basic solution. Its matrix should be improved so as not decline, otherwise it cannot be an artificial muscle for practical use. We applied a high temperature thermal treatment and a subsequent hydrolysis to PAN gel prepared through the nearly conventional processing method, and we investigated the time dependence of its length change ratio and generated force through the acid-base solution exchange, and its durability. Although its length change and force generation performances were impaired to some extent, we found an improvement of its matrix robustness and durability.","PeriodicalId":178814,"journal":{"name":"Third IEEE Symposium on Bioinformatics and Bioengineering, 2003. Proceedings.","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125108936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analyzing the Escherichia coli gene expression data by a multilayer adjusted tree organizing map","authors":"Ning Wei, L. Gruenwald, T. Conway","doi":"10.1109/BIBE.2003.1188965","DOIUrl":"https://doi.org/10.1109/BIBE.2003.1188965","url":null,"abstract":"Using the DNA microarray technology, biologists have thousands of array data available. Discovering the function relations between genes and their involvements in biological processes depends on the ability to efficiently process and quantitatively analyze large amounts of array data. Clustering algorithms are among the popular tools that can be used to help biologists achieve their goals. Although some existing research projects employed clustering algorithms on biological data, none of them has examined the Escherichia coli (E. coli) gene expression data. This paper proposes a clustering algorithm called Multilayer Adjusted Tree Organizing Map (MATOM) to analyze the E. coli gene expression data. In a semi-supervised manner, MATOM constructs a multilayer map, and at the same time, removes noise data in the previously trained maps in order to improve the training process. This paper then presents the clustering results produced by MATOM and other existing clustering algorithms using the E. coli gene expression data, and a new evaluation method to assess them. The results show that MATOM performs the best in terms of percentage of genes that are clustered correctly.","PeriodicalId":178814,"journal":{"name":"Third IEEE Symposium on Bioinformatics and Bioengineering, 2003. Proceedings.","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122716356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prediction of contact maps using support vector machines","authors":"Ying Zhao, G. Karypis","doi":"10.1109/BIBE.2003.1188926","DOIUrl":"https://doi.org/10.1109/BIBE.2003.1188926","url":null,"abstract":"Contact map prediction is of great interest for its application in fold recognition and protein 3D structure determination. In this paper we present a contact-map prediction algorithm that employs Support Vector Machines as the machine learning tool and incorporates various features such as sequence profiles and their conservation, correlated mutation analysis based on various amino acid physicochemical properties, and secondary structure. In addition, we evaluated the effectiveness of the different features on contact map prediction for different fold classes. On average, our predictor achieved a prediction accuracy of 0.2238 with an improvement over a random predictor of a factor 11.7, which is better than reported studies. Our study showed that predicted secondary structure features play an important roles for the proteins containing beta structures. Models based on secondary structure features and CMA features produce different sets of predictions. Our study also suggests that models learned separately for different protein fold families may achieve better performance than a unified model.","PeriodicalId":178814,"journal":{"name":"Third IEEE Symposium on Bioinformatics and Bioengineering, 2003. Proceedings.","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127652614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evolving bubbles for prostate surface detection from TRUS images","authors":"Fan Shao, K. Ling, W. Ng","doi":"10.1109/BIBE.2003.1188936","DOIUrl":"https://doi.org/10.1109/BIBE.2003.1188936","url":null,"abstract":"Prostate boundary detection from ultrasound images plays a key role in prostate disease diagnoses and treatments. Due to the poor quality of ultrasound images, however, this still remains as a difficult task. Currently, boundary detection are performed manually, which is arduous and heavily user dependent. This paper presents a new approach derived from level set method to semiautomatically detect the prostate surface from 3D transrectal ultrasound images. In this method, a few initial bubbles are simply specified by the user from five particular slices based on the prostate shape. When bubbles evolve, they expand, shrink merge and split, and finally produce the desired prostate surface. To remedy the \"boundary leaking\" problem caused by gaps or weak boundaries, both region information and statistical intensity distribution are incorporated into the model. We applied the proposed method to eight 3D TRUS images and the results have shown its effectiveness.","PeriodicalId":178814,"journal":{"name":"Third IEEE Symposium on Bioinformatics and Bioengineering, 2003. Proceedings.","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115026126","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Time series analysis of gene expression and location data","authors":"Chen-Hsiang Yeang, T. Jaakkola","doi":"10.1109/BIBE.2003.1188967","DOIUrl":"https://doi.org/10.1109/BIBE.2003.1188967","url":null,"abstract":"We develop a method for integrating time series expression profiles and factor-gene binding data to quantify dynamic aspects of gene regulation. We estimate latencies for transcription activation by explaining time correlations between gene expression profiles through available factor-gene binding information. The resulting aligned expression profiles are subsequently clustered and again combined with binding information to determine groups or subgroups of co-regulated genes. The predictions derived from this approach are consistent with existing results. Our analysis also provides several hypotheses not implicated in previous studies.","PeriodicalId":178814,"journal":{"name":"Third IEEE Symposium on Bioinformatics and Bioengineering, 2003. Proceedings.","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115893429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Determination of the minimum sample size in microarray experiments to cluster genes using k-means clustering","authors":"Fang-Xiang Wu, W. Zhang, A. Kusalik","doi":"10.1109/BIBE.2003.1188979","DOIUrl":"https://doi.org/10.1109/BIBE.2003.1188979","url":null,"abstract":"Gene expression profiles obtained from time-series microarray experiments can reveal important information about biological processes. However, conducting such experiments is costly and time consuming. The cost and time required are linearly proportional to sample size. Therefore, it is worthwhile to provide a way to determine the minimal number of samples or trials required in a microarray experiment. One of the uses of microarray hybridization experiments is to group together genes with similar patterns of the expression using clustering techniques. In this paper, the k-means clustering technique is used. The basic idea of our approach is an incremental process in which testing, analysis and evaluation are integrated and iterated. The process is terminated when the evaluation of the results of two consecutive experiments shows they are sufficiently close. Two measures of \"closeness\" are proposed and two real microarray datasets are used to validate our approach. The results show that the sample size required to cluster genes in these two datasets can be reduced; i.e. the same results can be achieved with less cost. The approach can be used with other clustering techniques as well.","PeriodicalId":178814,"journal":{"name":"Third IEEE Symposium on Bioinformatics and Bioengineering, 2003. Proceedings.","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123035823","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Vessel extraction in medical images by 3D wave propagation and traceback","authors":"C. Kirbas, Francis K. H. Quek","doi":"10.1109/BIBE.2003.1188944","DOIUrl":"https://doi.org/10.1109/BIBE.2003.1188944","url":null,"abstract":"This paper presents an approach for the extraction of vasculature from a volume of Magnetic Resonance Angiography (MRA) images by using a 3D wave propagation and traceback mechanism. We discuss both the theory and the implementation of the approach. Using a dual-sigmoidal filter, we label each voxel in the MRA volume with the likelihood that it is within a vessel. Representing the reciprocal of this likelihood image as an array of refractive indices, we propagate a digital wave through the volume from the base of the vascular tree. This wave 'washes' over the vasculature and extracts the vascular tree, ignoring local noise perturbations. While the approach is inherently SIMD we present an efficient sequential algorithm for the wave propagation, and discuss the traceback algorithm. We demonstrate the effectiveness of our integer image neighborhood-based algorithm and its robustness to image noise.","PeriodicalId":178814,"journal":{"name":"Third IEEE Symposium on Bioinformatics and Bioengineering, 2003. Proceedings.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129697849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An assessment of a metric space database index to support sequence homology","authors":"Rui Mao, Weijia Xu, Neha Singh, Daniel P. Miranker","doi":"10.1109/BIBE.2003.1188976","DOIUrl":"https://doi.org/10.1109/BIBE.2003.1188976","url":null,"abstract":"Hierarchical metric-space clustering methods have been commonly used to organize proteomes into taxonomies. Consequently, it is often anticipated that hierarchical clustering can be leveraged as a basis for scalable database index structures capable of managing the hyper-exponential growth of sequence data. M-tree is one such data structure specialized for the management of large data sets on disk. We explore the application of M-trees to the storage and retrieval of peptide sequence data. Exploiting a technique first suggested by Myers (1994), we organize the database as records of fixed length substrings. Empirical results are promising. However, metric-space indexes are subject to \"the curse of dimensionality\" and the ultimate performance of an index is sensitive to the quality of the initial construction of the index. We introduce new hierarchical bulk-load algorithm that alternates between top-down and bottom-up clustering to initialize the index. Using the Yeast Proteomes, the bi-directional bulk load produces a more effective index than the existing M-tree initialization algorithms.","PeriodicalId":178814,"journal":{"name":"Third IEEE Symposium on Bioinformatics and Bioengineering, 2003. Proceedings.","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128517346","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}