The Effect of Singular-Vectors Feature Selection (SVFS) Based Hyper Dimentionality Framework in Ligand-Based Virtual Screening

A. A. Mostafa, Sameh A. Salem, Amr E. Mohamed
Published in: 2022 IEEE 12th Annual Computing and Communication Workshop and Conference (CCWC), 26 January 2022. DOI: 10.1109/CCWC54503.2022.9720796

Abstract

The enormous databases used in ligand-based virtual screening contain numerous redundant and/or irrelevant features, which calls for a rapid and accurate pre-selection procedure to filter them. The Singular-Vectors Feature Selection (SVFS) method shows superior performance on datasets with extensive redundant and/or irrelevant features, and applying dimensionality reduction (DR) methods together with various feature selection methods has been shown to enhance their performance. This paper proposes a framework in which SVFS is cascaded with several DR methods, including Principal Component Analysis (PCA), Uniform Manifold Approximation and Projection (UMAP), and Neighborhood Components Analysis (NCA), to improve the effectiveness of the pre-selection process in ligand-based virtual screening. The results show that models integrating SVFS with DR methods outperformed SVFS alone. Furthermore, combining SVFS with UMAP produced a marked improvement in accuracy, precision, recall, Matthews correlation coefficient (MCC), and F1 score across all chosen classifiers, especially the support vector machine (SVM). Combining SVFS with NCA gave a moderate improvement in all of these metrics, whereas combining PCA with SVFS yielded the smallest improvement in accuracy and the other measured metrics.
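The cascade described above (feature selection, then a DR step, then a classifier, scored with accuracy, precision, recall, F1, and MCC) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: SVFS is not part of scikit-learn, so `SelectKBest` with an ANOVA F-test is swapped in as a generic filter-style feature selector; the synthetic data, `k=50`, `n_components=10`, and the helper names `build_cascade` and `evaluate` are hypothetical choices for illustration only. UMAP is left out to avoid the extra `umap-learn` dependency; its `umap.UMAP` transformer could presumably be passed as another `reducer` in the same way.

```python
# Minimal sketch of a "feature selection -> dimensionality reduction -> classifier"
# cascade. ASSUMPTIONS: SelectKBest(f_classif) is a stand-in for SVFS (not in
# scikit-learn); the synthetic data, k, and n_components are illustrative only.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import (accuracy_score, f1_score, matthews_corrcoef,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NeighborhoodComponentsAnalysis
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_cascade(reducer, k=50):
    """Hypothetical helper: scale, select k features (SVFS stand-in),
    optionally reduce dimensionality, then classify with an RBF SVM."""
    steps = [("scale", StandardScaler()),
             ("select", SelectKBest(f_classif, k=k))]
    if reducer is not None:
        steps.append(("reduce", reducer))
    steps.append(("clf", SVC(kernel="rbf")))
    return Pipeline(steps)


def evaluate(model, X_train, X_test, y_train, y_test):
    """Hypothetical helper: fit the cascade and report the five metrics."""
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred, zero_division=0),
        "f1": f1_score(y_test, y_pred, zero_division=0),
        "mcc": matthews_corrcoef(y_test, y_pred),
    }


# Synthetic stand-in for a descriptor matrix with many redundant and irrelevant
# columns and an imbalanced active (1) / inactive (0) label.
X, y = make_classification(n_samples=600, n_features=300, n_informative=15,
                           n_redundant=60, weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Selection alone versus selection cascaded with a DR step.
variants = {
    "selection only": None,
    "selection + PCA": PCA(n_components=10, random_state=0),
    "selection + NCA": NeighborhoodComponentsAnalysis(n_components=10,
                                                      random_state=0),
}
results = {name: evaluate(build_cascade(reducer), X_train, X_test, y_train, y_test)
           for name, reducer in variants.items()}
for name, scores in results.items():
    print(name, {metric: round(float(v), 3) for metric, v in scores.items()})
```

Comparing the printed dictionaries mirrors the kind of comparison the abstract reports (selection alone versus selection plus a DR step); numbers from this toy data say nothing about the paper's actual results.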