{"title":"Empirical evaluation of ensemble feature subset selection methods for learning from a high-dimensional database in drug design","authors":"Hiroshi Mamitsuka","doi":"10.1109/BIBE.2003.1188959","DOIUrl":"https://doi.org/10.1109/BIBE.2003.1188959","url":null,"abstract":"Discovering a new drug is one of the most important goals in not only the pharmaceutical field but also a variety of fields including molecular biology, chemistry and medical science. The importance of computationally understanding the relationships between a given chemical compound and its drug activity has been pronounced. In the data set regarding drug activity of chemical compounds, each row corresponds to a chemical compound, and columns are the descriptors of the compound and a label indicating drug activity of the compound. Recently, the size of the descriptors has become larger to obtain more detailed information from a given set of compounds. Actually, the number of columns (attributes or features) of some drug data sets reaches hundreds of thousands or a million. The purpose of this paper is to empirically evaluate the performance of ensemble feature subset selection strategies by applying them to such a high-dimensional data set actually used in the process of drug design. We examined the performance of three ensemble methods, including a query learning based method, comparing with that of one of the latest feature subset selection methods. The evaluation was performed on a data set which contains approximately 140,000 features. Our results show that the query learning based methodology outperformed the other three methods, in terms of the final prediction accuracy and time efficiency. We have also examined the effect of noise in the data and found that the advantage of the method becomes more pronounced for larger noise levels.","PeriodicalId":178814,"journal":{"name":"Third IEEE Symposium on Bioinformatics and Bioengineering, 2003. Proceedings.","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121138595","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic detection of premature ventricular contraction using quantum neural networks","authors":"Jie Zhou","doi":"10.1109/BIBE.2003.1188943","DOIUrl":"https://doi.org/10.1109/BIBE.2003.1188943","url":null,"abstract":"Premature ventricular contractions (PVCs) are ectopic heart beats originating from the ventricular area. They are a common form of heart arrhythmia. Electrocardiogram (ECG) recordings have been widely used to assist cardiologists to diagnose the problem. In this paper, we study the automatic detection of PVC using a fuzzy artificial neural network named Quantum Neural Network (QNN). With the quantum neurons in the network, trained QNN can model the levels of uncertainty arising from complex classification problems. This fuzzy feature is expected to enhance the reliability of the algorithm, which is critical for the applications in the biomedical domain. Experiments were conducted on ECG records in the MIT-BIH Arrhythmia Database. Results showed consistently higher or equal reliability of QNN on all the available records compared to the backpropagation network. QNN, however, has a relatively higher resource requirement for training.","PeriodicalId":178814,"journal":{"name":"Third IEEE Symposium on Bioinformatics and Bioengineering, 2003. Proceedings.","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123775658","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analysis of Atlantic salmon skin mucus: COPS-a computer-based system for protein pattern analysis of 1D SDS-PAGE gels","authors":"Richard Eibrand, P. Kennedy, D. Cotter, U. MacEvilly, Bing Wu","doi":"10.1109/BIBE.2003.1188928","DOIUrl":"https://doi.org/10.1109/BIBE.2003.1188928","url":null,"abstract":"This paper presents an approach that applies a combination of computing techniques, including image processing and analysis, syntactic pattern matching, clustering techniques and artificial neural networks to interpret biological data. The application domain is the analysis of 1D SDS-PAGE gels of Atlantic salmon skin mucus. Researchers in our group have visually identified protein band intensity patterns in the salmon's skin mucus. The objective is to produce a system to minimize the loss of livestock in the fish farming industry. Initial results of the gel image analysis application and manual data analysis have shown that reproducible patterns exist within the gel band data and can be classified as either increasing or decreasing patterns. This type of analysis is not restricted to the analysis of Atlantic salmon skin mucus proteins, but can be extended to other proteins that exhibit recurring patterns over a period of time that require identification and classification.","PeriodicalId":178814,"journal":{"name":"Third IEEE Symposium on Bioinformatics and Bioengineering, 2003. Proceedings.","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126742303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A model of random sequences for de novo peptide sequencing","authors":"K. Jarman, W. Cannon, Kristin H. Jarman, A. Heredia-Langner","doi":"10.1109/BIBE.2003.1188948","DOIUrl":"https://doi.org/10.1109/BIBE.2003.1188948","url":null,"abstract":"We present a model for the probability of random sequences appearing in product ion spectra obtained from tandem mass spectrometry experiments using collision-induced dissociation. We demonstrate the use of these probabilities for ranking candidate peptide sequences obtained using a de novo algorithm. Sequence candidates are obtained from a spectrum graph that is greatly reduced in size from those in previous graph-theoretical de novo approaches. Evidence of multiple instances of subsequences of each candidate, due to different fragment ion type series as well as isotopic peaks, is incorporated in a hierarchical scoring scheme. This approach is shown to be useful for confirming results from database search and as a first step towards a statistically rigorous de novo algorithm.","PeriodicalId":178814,"journal":{"name":"Third IEEE Symposium on Bioinformatics and Bioengineering, 2003. Proceedings.","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121910434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A biological mapping of a learned avoidance behavior model to the basal ganglia","authors":"K. Biddell, Jinghong Li, Jeffrey D. Johnson","doi":"10.1109/BIBE.2003.1188963","DOIUrl":"https://doi.org/10.1109/BIBE.2003.1188963","url":null,"abstract":"In this paper we map a computational model of learned avoidance behavior in a one-way avoidance experiment to the biology of the basal ganglia. We extend our previous work to develop a more biologically accurate mapping. Learned avoidance behavior is a critical component of animal survival; thus, a model of animal learning should account for this phenomenon. Through long term potentiation and long term depression at the corticostriatal synapses, we propose that a prediction of the expected future benefit is generated by the animal. We map a reinforcement center of the model to the indirect pathway of the basal ganglia and a motor center to the direct pathway. Finally, we propose that an external reinforcement signal, in the form of pain caused by an electric shock, is transferred from the thalamus to the subthalamic nucleus.","PeriodicalId":178814,"journal":{"name":"Third IEEE Symposium on Bioinformatics and Bioengineering, 2003. Proceedings.","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129692587","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GenoMosaic: on-demand multiple genome comparison and comparative annotation","authors":"C. Gibas, D. Sturgill, J. Weller","doi":"10.1109/BIBE.2003.1188942","DOIUrl":"https://doi.org/10.1109/BIBE.2003.1188942","url":null,"abstract":"GenoMosaic is a portable database application for on demand multiple genome comparison. We discuss the methods used to generate a GenoMosaic data set from genome sequence data, and present the relational data model used in the application. We define an abstraction of genome sequence data (the feature mosaic) that allows us to bridge between annotation that describes features within single genes and that which includes possibly multiple genes and intergenic features over long stretches of genomic sequence. The goal of this project is to support new method development for on-demand multiple genome comparison. Each genome to be compared can be modeled as a string of generic features of any type that can be computationally defined, related by adjacency information within and among genomes. The generic feature abstraction makes it possible to study the arrangement of features in the genome at a level of detail which includes RNA genes, putative regulatory regions, SNPs, overlapping transcripts, intron splice junctions, alternative polyadenylation signals-in short, to incorporate significant sequence details which are not necessarily within protein-coding regions. This abstraction is amenable to functional implementation as a relational data model upon which novel query capabilities can be built, and provides objects that can be analyzed using algorithms for comparison of strings and lists. As an initial effort, we have implemented a prototype using a representative set of comparative and content-based annotation methods to reduce a collection of prokaryotic genomes to a feature mosaic representation. Entity-Relationship modeling was then used to develop a data model capable of storing detailed results, including complete parameters for each instance of analysis.","PeriodicalId":178814,"journal":{"name":"Third IEEE Symposium on Bioinformatics and Bioengineering, 2003. Proceedings.","volume":"240 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113996544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Augmenting SSEs with structural properties for rapid protein structure comparison","authors":"C. Chionh, Zhiyong Huang, K. Tan, Zhen Yao","doi":"10.1109/BIBE.2003.1188972","DOIUrl":"https://doi.org/10.1109/BIBE.2003.1188972","url":null,"abstract":"Comparing protein structures in three dimensions is a computationally expensive process that makes a full scan of a protein against a library of known protein structures impractical. To reduce the cost, we can use an approximation of the three dimensional structure that allows protein comparison to be performed quickly to filter away dissimilar proteins. In this paper we present a new algorithm, called SCALE, for protein structure comparison. In SCALE, a protein is represented as a sequence of secondary structure elements (SSEs) augmented with 3D structural properties such as the distances and angles between the SSEs. As such, the comparison between two proteins is reduced to a sequence alignment problem between their corresponding sequences of SSEs. The 3-D structural properties of the proteins contribute to the similarity score between the two sequences. We have implemented SCALE, and compared its performance against existing schemes. Our performance study shows that SCALE outperforms existing methods in terms of both efficiency and effectiveness (measured in terms of precision and recall).","PeriodicalId":178814,"journal":{"name":"Third IEEE Symposium on Bioinformatics and Bioengineering, 2003. Proceedings.","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134298418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Streamlining biological data analysis using BioFlow","authors":"Zhijie Guan, H. Jamil","doi":"10.1109/BIBE.2003.1188960","DOIUrl":"https://doi.org/10.1109/BIBE.2003.1188960","url":null,"abstract":"For several obvious and practical reasons, resources needed for biological data analysis are often geographically distributed and are accessible through the Internet. Such resources usually include data repositories, analysis tools, digital documents, and so on. Such an arrangement warrants sophisticated data and process integration tools in order to design ad hoc higher level applications using these online resources. In this paper we present such a system, called BioFlow, that exploits recent advances in workflow technology and Internet computing in order to provide support for ad hoc application development by hiding aspects related to the heterogeneity and distributed nature of the resources required by user applications. We introduce the salient features of the BioFlow system, and briefly discuss its architecture and implementation issues using simple but real life applications. We demonstrate that the declarative language on which BioFlow is based makes our system quite intuitive, easy to use, effective and efficient for ad hoc application design. The approach taken in BioFlow is somewhat similar to the idea of web services in semantic web computing for biological applications.","PeriodicalId":178814,"journal":{"name":"Third IEEE Symposium on Bioinformatics and Bioengineering, 2003. Proceedings.","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124449232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Combining few neural networks for effective secondary structure prediction","authors":"K. Guimaraes, J. Melo, George D. C. Cavalcanti","doi":"10.1109/BIBE.2003.1188981","DOIUrl":"https://doi.org/10.1109/BIBE.2003.1188981","url":null,"abstract":"The prediction of secondary structure is treated with a simple and efficient method. Combining only three neural networks, an average Q3 accuracy prediction by residues of 75.93% is achieved. This value is better than the best results reported on the same test and training database, CB396, using the same validation method. For a second database, RS126, an average Q3 accuracy of 74.13% is attained, which is better than each individual method, being defeated only by CONSENSUS, a rather intricate engine, which is a combination of several methods. The networks are trained with RPROP, an efficient variation of the back-propagation algorithm. Five combination rules are applied independently afterwards. Each one increases the accuracy of prediction by at least 1%, due to the fact that each network used converges to a different local minimum. The Product rule derives the best results. The predictor described here can be accessed at http://biolab.cin.ufpe.br/tools/.","PeriodicalId":178814,"journal":{"name":"Third IEEE Symposium on Bioinformatics and Bioengineering, 2003. Proceedings.","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125838129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A framework for cancer-related genes mining over the Internet","authors":"J. Tsai, Jan-Gowth Chang, S. H. Shih, Rong-Ming Chen, H. Hsiao, Rouh-Mei Hu, S. N. Chen, M. M. Lee, Falcon F. M. Liu, Wen-Ling Chan","doi":"10.1109/BIBE.2003.1188983","DOIUrl":"https://doi.org/10.1109/BIBE.2003.1188983","url":null,"abstract":"Clinically, cancer is a complex family of diseases. From the view of molecular biology, cancer is a genetic disease resulting from abnormal gene expression. This alteration of gene expression could result from DNA instability, such as translocation, amplification, deletion or point mutations. A large amplification or deletion of a chromosome region can be easily detected by two methods: loss of heterozygosity (LOH) and comparative genomic hybridization (CGH). The different gene expression patterns can be monitored by high throughput microarray analysis. Enormous amounts of data have accumulated from these technologies, and the data pool continues to grow at an amazing rate. To aid investigators in mining useful information from these data deposits, new data storage and analysis tools must be developed. Two value-added databases are constructed to achieve this purpose. They contain information on genes in the unstable regions of cancer cells, based on the data accumulated from LOH and CGH experiments, and information on cancer cell gene expression profiles according to microarray analysis, respectively. An automatic system to retrieve interesting gene information, to compare it with the known databases, to analyze and predict the protein functions, and to group the genes of the same function will be integrated into the database circuit. An automatic update system will be installed and run after the setup of the two databases. The system also retains the possibility to modify and accept new data obtained from any new techniques. Our goal is to help biologists find the needles in a haystack, that is, to find the real cancer-related genes (oncogenes or tumor suppressor genes) for further research.","PeriodicalId":178814,"journal":{"name":"Third IEEE Symposium on Bioinformatics and Bioengineering, 2003. Proceedings.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128318015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}