An Efficient Framework for classifying Cancer diseases using Ensemble machine learning over Cancer Gene Expression and Sequence Based Protein Interactions.
Prabhuraj Metipatil, P. Bhuvaneshwari, S. M. Basha, S. Patil
Published in: 2023 2nd International Conference for Innovation in Technology (INOCON), 2023-03-03
DOI: 10.1109/INOCON57975.2023.10101354
Citations: 1
Abstract
In recent years, cancer has accounted for a significant number of deaths worldwide. Analysis of microarray gene expression and protein interaction data facilitates early cancer identification. DNA microarray technology makes accurate prediction of information for thousands of genes possible. Protein-Protein Interactions (PPIs) are crucial protein activities involved in the cell cycle, including DNA replication and cellular signaling. Determining whether a pair of proteins interacts is therefore important for diagnosing illness in molecular biology. Existing machine learning classifiers are limited to binary (two-class) problems. They can also be prone to overfitting: the classification framework may become too specialized to the training data and fail to generalize to varied data. To overcome these problems, this paper proposes an ensemble machine learning technique that combines the strengths of two classifiers, Support Vector Machine (SVM) and Naïve Bayes (NB), to give a more robust and accurate framework. The proposed SVM-NB ensemble classifier outperforms the existing classifiers by 15-20% on performance parameters such as classification accuracy, time taken for classification, precision, recall, and F-measure. The results were obtained by comparing the proposed ensemble (SVM+NB) classifier with the most widely applied existing classifiers, such as Logistic Regression (LR), Support Vector Machine, and Naïve Bayes.
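The abstract does not say how the SVM and NB outputs are combined, so the following is only a minimal sketch under one common assumption: soft voting, where each classifier's per-class probabilities are averaged and the highest-scoring class wins. The function names (`soft_vote`, `macro_scores`), the `weight_a` parameter, and the probability values are all hypothetical illustrations, not the authors' implementation or data. The sketch also computes the evaluation measures the abstract names (accuracy, precision, recall, F-measure), macro-averaged over three classes.

```python
from typing import Dict, List, Sequence


def soft_vote(prob_a: Sequence[Sequence[float]],
              prob_b: Sequence[Sequence[float]],
              weight_a: float = 0.5) -> List[int]:
    """Blend two classifiers' class probabilities; return the argmax class per sample."""
    predictions = []
    for row_a, row_b in zip(prob_a, prob_b):
        # Weighted average of the two probability vectors (assumed combination rule).
        blended = [weight_a * pa + (1.0 - weight_a) * pb
                   for pa, pb in zip(row_a, row_b)]
        predictions.append(max(range(len(blended)), key=blended.__getitem__))
    return predictions


def macro_scores(y_true: Sequence[int], y_pred: Sequence[int],
                 n_classes: int) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall, and F-measure."""
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f_measures = [], [], []
    for c in range(n_classes):
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f_measure = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f_measures.append(f_measure)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": sum(precisions) / n_classes,
        "recall": sum(recalls) / n_classes,
        "f_measure": sum(f_measures) / n_classes,
    }


# Made-up class probabilities for four samples and three classes.
svm_probs = [[0.7, 0.2, 0.1], [0.4, 0.5, 0.1], [0.2, 0.3, 0.5], [0.5, 0.4, 0.1]]
nb_probs = [[0.6, 0.3, 0.1], [0.1, 0.8, 0.1], [0.1, 0.2, 0.7], [0.2, 0.7, 0.1]]
y_true = [0, 1, 2, 1]

ensemble_pred = soft_vote(svm_probs, nb_probs)
print(ensemble_pred)  # [0, 1, 2, 1]
print(macro_scores(y_true, ensemble_pred, n_classes=3))
```

In this toy data the hypothetical SVM scores alone would misclassify the last sample, while the blended scores recover the correct class. Because the combination works on per-class probability vectors, it extends to more than two classes without modification. An unequal `weight_a` would favor one classifier over the other.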