Features Reweighting and Selection in ligand-based Virtual Screening for Molecular Similarity Searching Based on Deep Belief Networks
Maged Nasser, N. Salim, Hamza Hentabli, Faisal Saeed, I. Rabiu
Advances in Data Science and Adaptive Analysis, volume 26 1, pages 2050009:1-2050009:28. Published 2020-11-30.
DOI: 10.1142/s2424922x20500096 (https://doi.org/10.1142/s2424922x20500096)
Citation count: 3
Abstract
Virtual screening (VS) is the use of a collection of computational procedures to rank, score and/or sort chemical structures. The purpose of VS is to identify the molecules with the highest prior probability of activity. Many conventional similarity methods assume that molecular features unrelated to biological activity carry the same weight as the important ones. For this reason, the authors investigated the idea that, in chemical structure diagrams, some features are more important than others, and that each fragment should be weighted accordingly, with more weight given to the more important fragments. In this paper, a deep learning method, the Deep Belief Network (DBN), is used to reweight the molecular features, and the reconstruction error of every feature is calculated from these new weights. Based on the reconstruction error values, Principal Component Analysis (PCA) is used for dimensionality reduction, and only a few hundred features, those with the lowest error, are selected. The main aim of this research is to show that similarity searching performance improves when it is based on the selected low-error features. The results obtained with the DBN were compared with those of other similarity methods, such as the Tanimoto coefficient and quantum-based methods. This comparison showed that, on the structurally heterogeneous data sets (DS1 and DS3), the DBN outperformed all the other techniques.
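The selection and search steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the DBN and PCA stages are omitted, the per-feature reconstruction errors are assumed to have been computed already by some model, and every name and number below (`tanimoto`, `select_low_error_features`, `rank_by_similarity`, the toy fingerprints and error values) is hypothetical. The sketch only shows how keeping the lowest-error features changes a Tanimoto-based ranking over binary fingerprints.

```python
from typing import Dict, List, Sequence, Tuple


def tanimoto(a: Sequence[int], b: Sequence[int]) -> float:
    """Tanimoto coefficient between two equal-length binary fingerprints."""
    both = sum(1 for x, y in zip(a, b) if x and y)    # bits set in both
    either = sum(1 for x, y in zip(a, b) if x or y)   # bits set in at least one
    return both / either if either else 0.0


def select_low_error_features(errors: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k features with the smallest error, in index order."""
    ranked = sorted(range(len(errors)), key=lambda i: errors[i])
    return sorted(ranked[:k])


def project(fingerprint: Sequence[int], keep: Sequence[int]) -> List[int]:
    """Keep only the selected feature positions of a fingerprint."""
    return [fingerprint[i] for i in keep]


def rank_by_similarity(
    query: Sequence[int],
    database: Dict[str, Sequence[int]],
    keep: Sequence[int],
) -> List[Tuple[str, float]]:
    """Rank database molecules by Tanimoto similarity on the selected features."""
    q = project(query, keep)
    scored = [(name, tanimoto(q, project(fp, keep))) for name, fp in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical per-feature reconstruction errors (assumed given, not computed here).
errors = [0.05, 0.40, 0.10, 0.02, 0.35, 0.08]
keep = select_low_error_features(errors, k=4)

# Toy 6-bit fingerprints, invented for illustration only.
query = [1, 0, 1, 1, 1, 0]
database = {
    "mol_a": [1, 1, 1, 1, 0, 0],
    "mol_b": [0, 1, 0, 1, 1, 1],
    "mol_c": [1, 0, 1, 0, 1, 1],
}

ranking = rank_by_similarity(query, database, keep)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

On this toy data, the similarity between the query and `mol_a` is 0.6 on the full fingerprint and 1.0 once the two highest-error features are dropped. That shows only that feature selection can reshape the ranking, not that it improves retrieval.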