Biclustering as Strategy for Improving Feature Selection in Consensus QSAR Modeling
María Jimena Martínez, Julieta Sol Dussaut, Ignacio Ponzoni
Electronic Notes in Discrete Mathematics. Published 2018-08-01. DOI: 10.1016/j.endm.2018.07.016
https://www.sciencedirect.com/science/article/pii/S1571065318301604
Citations: 15
Abstract
Feature selection applied to QSAR (Quantitative Structure-Activity Relationship) modeling is a challenging combinatorial optimization problem because of the high dimensionality of the chemical space associated with molecules and the complexity of the physicochemical properties usually studied in Cheminformatics. This commonly leads to classification models with a large number of variables, which reduces the generalization ability and interpretability of these classifiers. In this paper, a novel strategy based on biclustering analysis is proposed to address this problem. The new method is applied as a post-processing step to the outputs generated by consensus feature selection methods. The approach was evaluated on datasets for predicting the ready biodegradation of chemical compounds. These preliminary results show that biclustering can help to identify features with low class-discrimination power, which is useful for reducing the complexity of QSAR models without losing prediction accuracy.
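The abstract does not give the algorithmic details, so the following is only a toy sketch of the general idea, not the authors' method. It assumes a majority vote as the consensus step, a mean threshold to binarize descriptors, a brute-force search for all-ones biclusters, and a row-weighted class purity as the discrimination score. All function names, thresholds, and data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def consensus_features(selections, min_votes):
    """Keep features chosen by at least `min_votes` selectors (assumed voting rule)."""
    votes = Counter(f for subset in selections for f in set(subset))
    return sorted(f for f, n in votes.items() if n >= min_votes)


def binarize(samples, features):
    """Map each feature to the set of sample indices whose value exceeds its mean."""
    on = {}
    for f in features:
        mean = sum(s[f] for s in samples) / len(samples)
        on[f] = {i for i, s in enumerate(samples) if s[f] > mean}
    return on


def all_ones_biclusters(on, max_cols=2, min_rows=2):
    """Brute-force enumeration of (rows, cols) blocks in which every cell is 1."""
    found = []
    for k in range(1, max_cols + 1):
        for cols in combinations(sorted(on), k):
            rows = set.intersection(*(on[c] for c in cols))
            if len(rows) >= min_rows:
                found.append((rows, cols))
    return found


def purity(rows, labels):
    """Fraction of a bicluster's samples that belong to its majority class."""
    counts = Counter(labels[i] for i in rows)
    return max(counts.values()) / len(rows)


def low_discrimination(features, biclusters, labels, threshold=0.8):
    """Flag features whose row-weighted mean bicluster purity is below `threshold`."""
    total = {f: 0.0 for f in features}
    weight = {f: 0 for f in features}
    for rows, cols in biclusters:
        p = purity(rows, labels)
        for c in cols:
            total[c] += p * len(rows)
            weight[c] += len(rows)
    scores = {f: (total[f] / weight[f] if weight[f] else 0.0) for f in features}
    return [f for f in features if scores[f] < threshold]


# Hypothetical toy data: six compounds, three descriptors, binary class labels.
samples = [
    {"a": 5, "b": 9, "c": 1},
    {"a": 6, "b": 1, "c": 2},
    {"a": 7, "b": 8, "c": 1},
    {"a": 1, "b": 9, "c": 2},
    {"a": 2, "b": 1, "c": 1},
    {"a": 1, "b": 8, "c": 2},
]
labels = [1, 1, 1, 0, 0, 0]
# Outputs of three hypothetical feature selectors.
selections = [["a", "b"], ["a", "b", "c"], ["a"]]

kept = consensus_features(selections, min_votes=2)
biclusters = all_ones_biclusters(binarize(samples, kept))
flagged = low_discrimination(kept, biclusters, labels)
print("consensus features:", kept)
print("low-discrimination features:", flagged)
```

In this toy run, descriptor `b` takes high values in compounds of both classes, so the biclusters that contain it are mixed and it becomes a candidate for removal. The exhaustive enumeration is exponential in `max_cols`, so it is only practical for a small set of already-selected features; a real study would use a dedicated biclustering algorithm.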
About the journal:
Electronic Notes in Discrete Mathematics is a venue for the rapid electronic publication of the proceedings of conferences, of lecture notes, monographs and other similar material for which quick publication is appropriate. Organizers of conferences whose proceedings appear in Electronic Notes in Discrete Mathematics, and authors of other material appearing as a volume in the series are allowed to make hard copies of the relevant volume for limited distribution. For example, conference proceedings may be distributed to participants at the meeting, and lecture notes can be distributed to those taking a course based on the material in the volume.