The Effect of Singular-Vectors Feature Selection (SVFS) Based Hyper Dimentionality Framework in Ligand-Based Virtual Screening
A. A. Mostafa, Sameh A. Salem, Amr E. Mohamed
2022 IEEE 12th Annual Computing and Communication Workshop and Conference (CCWC), 2022-01-26
DOI: 10.1109/CCWC54503.2022.9720796 (https://doi.org/10.1109/CCWC54503.2022.9720796)
Citations: 0
Abstract
Ligand-based virtual screening databases are enormous and contain numerous redundant and/or irrelevant features, which calls for a rapid and accurate pre-selection procedure to filter them. Singular-Vectors Feature Selection (SVFS) is a feature selection method that performs well on datasets with many redundant and/or irrelevant features, and applying dimensionality reduction (DR) methods together with feature selection has been shown to improve performance. This paper proposes a framework in which SVFS is cascaded with several DR methods, such as Principal Component Analysis (PCA), Uniform Manifold Approximation and Projection (UMAP), and Neighborhood Components Analysis (NCA), to improve the effectiveness of the pre-selection process in ligand-based virtual screening. The results show that SVFS combined with a DR method outperforms SVFS alone. Combining SVFS with UMAP gave a large improvement in accuracy, precision, recall, Matthews correlation coefficient (MCC), and F1 score for all chosen classifiers, especially the support vector machine (SVM). Combining SVFS with NCA gave a moderate improvement in all of these metrics, while combining SVFS with PCA gave the smallest improvement in accuracy and the other measured metrics.
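The cascade described in the abstract (feature selection, then dimensionality reduction, then a classifier scored on five metrics) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: SVFS is not part of scikit-learn, so `SelectKBest` with an ANOVA F-test is swapped in as a stand-in selector; the data are synthetic rather than a ligand database; and the feature counts, component counts, and helper names (`build_pipeline`, `evaluate`) are arbitrary choices made for the example. UMAP is omitted only to keep the dependencies to scikit-learn; a UMAP transformer from the separate `umap-learn` package could presumably be slotted in as another reducer.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import (accuracy_score, f1_score, matthews_corrcoef,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NeighborhoodComponentsAnalysis
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a descriptor table with many redundant/noisy features.
X, y = make_classification(n_samples=400, n_features=100, n_informative=10,
                           n_redundant=30, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)


def build_pipeline(reducer=None):
    """Scale -> select features -> (optional) reduce dimensions -> SVM."""
    steps = [
        ("scale", StandardScaler()),
        # ASSUMPTION: SelectKBest is a placeholder for SVFS, which is not
        # available in scikit-learn. k=30 is an arbitrary illustrative value.
        ("select", SelectKBest(f_classif, k=30)),
    ]
    if reducer is not None:
        steps.append(("reduce", reducer))
    steps.append(("clf", SVC(kernel="rbf")))
    return Pipeline(steps)


def evaluate(reducer=None):
    """Fit one variant and return the five metrics named in the abstract."""
    model = build_pipeline(reducer).fit(X_train, y_train)
    pred = model.predict(X_test)
    return {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
        "mcc": matthews_corrcoef(y_test, pred),
        "f1": f1_score(y_test, pred, zero_division=0),
    }


# Selection alone versus selection cascaded with a DR method.
variants = {
    "selection only": None,
    "selection + PCA": PCA(n_components=5, random_state=0),
    "selection + NCA": NeighborhoodComponentsAnalysis(n_components=5,
                                                      random_state=0),
}

results = {name: evaluate(reducer) for name, reducer in variants.items()}
for name, scores in results.items():
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

Wrapping every stage in a single `Pipeline` means the selector and the reducer are fitted on the training split only, so the test scores are not contaminated by information from the held-out samples. The numbers printed on synthetic data say nothing about the ranking of methods reported in the paper.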