{"title":"基于Tversky相似度的欠采样高斯核决策残桩Adaboost算法用于不平衡医疗数据分类","authors":"M. Kamaladevi, V. Venkatraman","doi":"10.15837/ijccc.2021.6.4291","DOIUrl":null,"url":null,"abstract":"In recent years, imbalanced data classification are utilized in several domains including, detecting fraudulent activities in banking sector, disease prediction in healthcare sector and so on. To solve the Imbalanced classification problem at data level, strategy such as undersampling or oversampling are widely used. Sampling technique pose a challenge of significant information loss. The proposed method involves two processes namely, undersampling and classification. First, undersampling is performed by means of Tversky Similarity Indexive Regression model. Here, regression along with the Tversky similarity index is used in analyzing the relationship between two instances from the dataset. Next, Gaussian Kernelized Decision stump AdaBoosting is used for classifying the instances into two classes. Here, the root node in the Decision Stump takes a decision on the basis of the Gaussian Kernel function, considering average of neighboring points accordingly the results is obtained at the leaf node. Weights are also adjusted to minimizing the training errors occurring during classification to find the best classifier. Experimental assessment is performed with two different imbalanced dataset (Pima Indian diabetes and Hepatitis dataset). Various performance metrics such as precision, recall, AUC under ROC score and F1-score are compared with the existing undersampling methods. Experimental results showed that prediction accuracy of minority class has improved and therefore minimizing false positive and false negative.","PeriodicalId":179619,"journal":{"name":"Int. J. Comput. Commun. Control","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Tversky Similarity based UnderSampling with Gaussian Kernelized Decision Stump Adaboost Algorithm for Imbalanced Medical Data Classification\",\"authors\":\"M. Kamaladevi, V. Venkatraman\",\"doi\":\"10.15837/ijccc.2021.6.4291\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In recent years, imbalanced data classification are utilized in several domains including, detecting fraudulent activities in banking sector, disease prediction in healthcare sector and so on. To solve the Imbalanced classification problem at data level, strategy such as undersampling or oversampling are widely used. Sampling technique pose a challenge of significant information loss. The proposed method involves two processes namely, undersampling and classification. First, undersampling is performed by means of Tversky Similarity Indexive Regression model. Here, regression along with the Tversky similarity index is used in analyzing the relationship between two instances from the dataset. Next, Gaussian Kernelized Decision stump AdaBoosting is used for classifying the instances into two classes. Here, the root node in the Decision Stump takes a decision on the basis of the Gaussian Kernel function, considering average of neighboring points accordingly the results is obtained at the leaf node. Weights are also adjusted to minimizing the training errors occurring during classification to find the best classifier. Experimental assessment is performed with two different imbalanced dataset (Pima Indian diabetes and Hepatitis dataset). Various performance metrics such as precision, recall, AUC under ROC score and F1-score are compared with the existing undersampling methods. Experimental results showed that prediction accuracy of minority class has improved and therefore minimizing false positive and false negative.\",\"PeriodicalId\":179619,\"journal\":{\"name\":\"Int. J. Comput. Commun. Control\",\"volume\":\"12 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-11-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Int. J. Comput. Commun. Control\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.15837/ijccc.2021.6.4291\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Int. J. Comput. Commun. Control","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.15837/ijccc.2021.6.4291","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Tversky Similarity based UnderSampling with Gaussian Kernelized Decision Stump Adaboost Algorithm for Imbalanced Medical Data Classification
In recent years, imbalanced data classification are utilized in several domains including, detecting fraudulent activities in banking sector, disease prediction in healthcare sector and so on. To solve the Imbalanced classification problem at data level, strategy such as undersampling or oversampling are widely used. Sampling technique pose a challenge of significant information loss. The proposed method involves two processes namely, undersampling and classification. First, undersampling is performed by means of Tversky Similarity Indexive Regression model. Here, regression along with the Tversky similarity index is used in analyzing the relationship between two instances from the dataset. Next, Gaussian Kernelized Decision stump AdaBoosting is used for classifying the instances into two classes. Here, the root node in the Decision Stump takes a decision on the basis of the Gaussian Kernel function, considering average of neighboring points accordingly the results is obtained at the leaf node. Weights are also adjusted to minimizing the training errors occurring during classification to find the best classifier. Experimental assessment is performed with two different imbalanced dataset (Pima Indian diabetes and Hepatitis dataset). Various performance metrics such as precision, recall, AUC under ROC score and F1-score are compared with the existing undersampling methods. Experimental results showed that prediction accuracy of minority class has improved and therefore minimizing false positive and false negative.