Else-Tree Classifier for Minimizing Misclassification of Biological Data
Truong X. Tran, M. Pusey, R. S. Aygün
2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), published 2018-12-01
DOI: 10.1109/BIBM.2018.8621322 (https://doi.org/10.1109/BIBM.2018.8621322)
Citations: 7
Abstract
Misclassification has a high cost in biological research studies such as protein crystallization. For drug development, the 3D structure of a protein is obtained by first crystallizing the protein. Hence, missing a crystalline condition may hinder the development of a drug. It is important to develop classification algorithms that avoid or minimize misclassifications. Traditional decision tree classifiers are based on an impurity measure that identifies the most informative attribute to be selected at the early levels of a decision tree. The class label at a leaf node is chosen as the majority class of the samples at that node. We introduce a novel decision tree classifier, else-tree, which analyzes pure regions or ranges of an attribute per class. After the longest or most populated contiguous range is identified for each class, the remaining ranges are fed into the else branch of the decision tree. Only conflicting or doubtful samples are passed to the lower levels of the decision tree. The else-tree does not necessarily assign a class to samples that are difficult to classify. We have used our protein crystallization trials data and three other publicly available datasets to evaluate else-tree. The experiments show that, when the training set is a good representation of the dataset, the else-tree may reduce misclassification to 0% by labeling difficult samples as undecided.
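The abstract does not give the algorithm in detail, so the following is only a minimal single-level sketch of the idea it describes, not the authors' implementation. It assumes one numeric attribute, treats a "pure range" as a maximal run of sorted attribute values whose samples all share one label, and keeps the most populated run per class. Every name here (`best_pure_ranges`, `classify`, `UNDECIDED`) and the toy data are hypothetical. In the full else-tree, samples falling outside the kept ranges would go to the else branch and be examined at a lower level; here they are simply labeled undecided.

```python
from collections import defaultdict
from itertools import groupby

UNDECIDED = "undecided"  # hypothetical label for samples outside every pure range


def best_pure_ranges(values, labels):
    """Return {label: (low, high, count)} with the most populated pure range per class."""
    # Collect the labels observed at each distinct attribute value.
    by_value = defaultdict(list)
    for v, y in zip(values, labels):
        by_value[v].append(y)

    # Tag each distinct value with its label if pure, or None if labels conflict.
    tagged = []
    for v in sorted(by_value):
        ys = by_value[v]
        label = ys[0] if len(set(ys)) == 1 else None
        tagged.append((v, label, len(ys)))

    # Group consecutive values with the same tag into runs and keep the
    # most populated run for each class (assumption: ties keep the first run).
    best = {}
    for label, run in groupby(tagged, key=lambda t: t[1]):
        run = list(run)
        if label is None:
            continue  # conflicting values never form a pure range
        count = sum(n for _, _, n in run)
        if label not in best or count > best[label][2]:
            best[label] = (run[0][0], run[-1][0], count)
    return best


def classify(x, ranges):
    """Assign a class only if x falls inside a kept pure range."""
    for label, (low, high, _) in ranges.items():
        if low <= x <= high:
            return label
    return UNDECIDED  # the "else" branch in this single-level sketch


# Toy data (made up for illustration): one attribute, two classes.
values = [0.1, 0.2, 0.3, 0.5, 0.55, 0.6, 0.8, 0.9, 1.0, 1.2]
labels = ["crystal", "crystal", "crystal", "non", "crystal",
          "non", "non", "non", "non", "crystal"]

ranges = best_pure_ranges(values, labels)
print(ranges)
print(classify(0.25, ranges), classify(0.55, ranges), classify(0.9, ranges))
```

On this toy data the kept ranges are 0.1 to 0.3 for `crystal` and 0.6 to 1.0 for `non`, so a value such as 0.55, which lies in the region where the classes interleave, is left undecided rather than forced into a class. That trade of coverage for fewer wrong labels is the behavior the abstract describes.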