A framework for decision tree-based method to index data from large protein sequence databases
K. Jaber, R. Abdullah, N. Rashid
2010 IEEE EMBS Conference on Biomedical Engineering and Sciences (IECBES), 2010-11-01
DOI: 10.1109/IECBES.2010.5742212 (https://doi.org/10.1109/IECBES.2010.5742212)
Citations: 3
Abstract
Biological databases have grown significantly in size, with some now reaching terabytes, alongside a growing number of users and a rising rate of queries. Hence, there is an increasing need to access these databases as quickly as possible. Biologists in particular need fast, scalable, and accurate search over biological databases. This may seem a simple task given the speed of currently available processors. However, this is far from the truth, because the amount of data deposited into these databases keeps growing. Searching a database therefore becomes a difficult and time-consuming task. Here, computer scientists can help organize data in a way that allows biologists to quickly search existing information and to predict new entries. In this paper, a decision tree indexing method is presented. This indexing method can effectively and rapidly retrieve all similar proteins from a large database for a given protein query. A theoretical and conceptual framework is derived, based on published work that uses indexing techniques for different applications.