Quantitative Structure-Activity Relationships of Estrogen Receptor Alpha Based on Molecular Descriptors Selection and Extreme Gradient Boosting

Shaotong Liu, Zhewei Xu, Dongsheng Ye
DOI: 10.1145/3583788.3583807
URL: https://doi.org/10.1145/3583788.3583807
Published in: Proceedings of the 2023 7th International Conference on Machine Learning and Soft Computing
Publication date: 2023-01-05

Abstract

Quantitative Structure-Activity Relationship (QSAR) modelling, which aims to estimate the bioactivity of compounds against estrogen receptor alpha (ERα) from their chemical features, is a fundamental part of drug discovery for breast cancer treatment. Because the properties of the data vary widely, building a suitable QSAR model is a challenging task. A further challenge lies in the complexity of the molecular descriptors of compounds, which makes it difficult to screen for robust descriptors. Previous studies select molecular descriptors manually, based on expert knowledge and experience. Such choices are highly subjective, which can lead to ineffective descriptors. In this paper, a novel approach is presented that addresses these problems in the context of regression modelling and feature selection. First, two filter-based and two embedded scoring metrics are proposed to jointly rank and select the most relevant and robust molecular descriptors. The selected features are then used to build a supervised, data-driven model based on the eXtreme Gradient Boosting (XGBoost) algorithm. Experimental results show that the selected molecular descriptors give good predictions of the target ERα bioactivity and that the regression approach outperforms formal models.
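The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the four scoring metrics or how they are combined, so everything specific here is an assumption: the two filter scores are absolute Pearson and Spearman correlation, the two embedded scores are ridge coefficient magnitude and the accumulated gain of a small gradient boosting model built from depth-1 trees, and the scores are combined by averaging ranks. The function names and the synthetic data are hypothetical, and the boosted stumps only stand in for XGBoost so that the example depends on NumPy alone.

```python
import numpy as np

def ranks(values):
    """Rank positions (0 = smallest); ties are broken by original order."""
    order = np.argsort(values, kind="stable")
    r = np.empty(len(values), dtype=float)
    r[order] = np.arange(len(values))
    return r

def pearson_abs(X, y):
    """Filter score 1 (assumed): |Pearson correlation| of each column with y."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return np.abs(Xc.T @ yc) / np.where(denom == 0, 1.0, denom)

def spearman_abs(X, y):
    """Filter score 2 (assumed): |Spearman correlation|, i.e. Pearson on ranks."""
    Xr = np.column_stack([ranks(X[:, j]) for j in range(X.shape[1])])
    return pearson_abs(Xr, ranks(y))

def ridge_abs_coef(X, y, alpha=1.0):
    """Embedded score 1 (assumed): |ridge coefficients| on standardized columns."""
    std = X.std(axis=0)
    Xs = (X - X.mean(axis=0)) / np.where(std == 0, 1.0, std)
    A = Xs.T @ Xs + alpha * np.eye(X.shape[1])
    return np.abs(np.linalg.solve(A, Xs.T @ (y - y.mean())))

def stump_boost_gain(X, y, rounds=30, lr=0.3):
    """Embedded score 2 (assumed): squared-error gain per feature, accumulated
    over a gradient boosting run with depth-1 trees (a stand-in for XGBoost)."""
    n, d = X.shape
    pred = np.full(n, y.mean())
    gain = np.zeros(d)
    for _ in range(rounds):
        resid = y - pred
        best_gain, best_split = 0.0, None
        for j in range(d):
            for t in np.quantile(X[:, j], [0.25, 0.5, 0.75]):
                left = X[:, j] <= t
                if left.all() or not left.any():
                    continue
                lm, rm = resid[left].mean(), resid[~left].mean()
                g = left.sum() * lm ** 2 + (~left).sum() * rm ** 2
                if g > best_gain:
                    best_gain, best_split = g, (j, left, lm, rm)
        if best_split is None:
            break
        j, left, lm, rm = best_split
        gain[j] += best_gain
        pred = pred + lr * np.where(left, lm, rm)
    return gain

def select_descriptors(X, y, k):
    """Average the per-metric ranks (0 = best) and keep the k best columns."""
    scores = [pearson_abs(X, y), spearman_abs(X, y),
              ridge_abs_coef(X, y), stump_boost_gain(X, y)]
    mean_rank = np.mean([ranks(-s) for s in scores], axis=0)
    return np.argsort(mean_rank, kind="stable")[:k]

# Synthetic stand-in for a descriptor matrix: only columns 0 and 3 drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=200)
selected = select_descriptors(X, y, k=2)
print(sorted(selected.tolist()))
```

In a real pipeline the columns returned by `select_descriptors` would be passed on to the regression stage, for example a gradient boosting regressor such as XGBoost's `XGBRegressor`. Averaging ranks rather than raw scores keeps metrics with different scales comparable, which is one plausible reading of "jointly rank and select".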