FL-Lightgbm prediction method of unbalanced small sample anti-breast cancer drugs

Chenxiao Zhou, Lianying Zou, Chuang Liu, Ziwei Song
Fifth International Conference on Computer Information Science and Artificial Intelligence
Published: 2023-03-28
DOI: 10.1117/12.2667385 (https://doi.org/10.1117/12.2667385)
Citations: 0

Abstract

Machine learning prediction of the molecular properties of anti-breast cancer drug candidates suffers from small datasets and class imbalance. To address this, this paper proposes an FL-Lightgbm prediction model built on a WGAN-GP data augmentation model. First, the WGAN-GP model augments the data to increase the sample size of the training set. Because positive and negative samples differ only slightly, augmented data are generated separately for the positive and the negative samples and then combined in the original order, so that the generated data keep the same distribution as the original data. Next, the Focal Loss function is introduced into the Lightgbm model to improve learning on imbalanced samples; the resulting model is called the FL-Lightgbm prediction model. After training on the augmented dataset, the proposed model reaches a highest accuracy, AUC, and F1 of 0.882, 0.851, and 0.7272, respectively, on 178 randomly selected validation samples. On all three metrics it outperforms the original Lightgbm model combined with oversampling algorithms such as BorderlineSMOTE and ADASYN.
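The abstract does not give the exact form of the Focal Loss or its hyperparameters, so the following is only a minimal sketch of how a binary focal loss could be written as a custom objective for a gradient boosting model. It assumes the standard formulation L = -alpha_t * (1 - p_t)^gamma * log(p_t); the values alpha=0.25 and gamma=2.0, the finite-difference Hessian, and all function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, written with tanh for numerical stability."""
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def _true_class_terms(z, y, alpha):
    """Return p_t (probability of the true class), alpha_t, and the sign of dp_t/dz."""
    p = np.clip(sigmoid(z), 1e-12, 1.0 - 1e-12)
    p_t = np.where(y == 1, p, 1.0 - p)
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    sign = np.where(y == 1, 1.0, -1.0)
    return p_t, alpha_t, sign


def focal_loss(z, y, alpha=0.25, gamma=2.0):
    """Per-sample binary focal loss for raw scores z and labels y in {0, 1}."""
    p_t, alpha_t, _ = _true_class_terms(z, y, alpha)
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)


def focal_grad(z, y, alpha=0.25, gamma=2.0):
    """Analytic derivative of the focal loss with respect to the raw score z."""
    p_t, alpha_t, sign = _true_class_terms(z, y, alpha)
    return sign * alpha_t * (1.0 - p_t) ** gamma * (
        gamma * p_t * np.log(p_t) - (1.0 - p_t)
    )


def focal_hess(z, y, alpha=0.25, gamma=2.0, eps=1e-4):
    """Second derivative, approximated by a central difference of the gradient."""
    g_plus = focal_grad(z + eps, y, alpha, gamma)
    g_minus = focal_grad(z - eps, y, alpha, gamma)
    return (g_plus - g_minus) / (2.0 * eps)


def make_focal_objective(alpha=0.25, gamma=2.0):
    """Build a callable with the (y_true, raw_score) -> (grad, hess) shape
    that custom objectives for boosting libraries typically use."""
    def objective(y_true, raw_score):
        y = np.asarray(y_true, dtype=float)
        z = np.asarray(raw_score, dtype=float)
        return focal_grad(z, y, alpha, gamma), focal_hess(z, y, alpha, gamma)
    return objective


# Small demonstration: one well-classified and one badly classified positive sample.
labels = np.array([1.0, 1.0])
scores = np.array([3.0, -3.0])  # raw scores before the sigmoid
print("focal loss:", focal_loss(scores, labels))
print("grad, hess:", make_focal_objective()(labels, scores))
```

With gamma = 0 and alpha = 0.5 the expression reduces to half the ordinary cross-entropy, which is a convenient sanity check. A callable of this shape could be passed as the objective of a LightGBM classifier through its scikit-learn wrapper; the model then outputs raw scores, so a sigmoid has to be applied before thresholding or computing AUC.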