Chenxiao Zhou, Lianying Zou, Chuang Liu, Ziwei Song
Fifth International Conference on Computer Information Science and Artificial Intelligence. Published 2023-03-28. DOI: 10.1117/12.2667385 (https://doi.org/10.1117/12.2667385)
FL-Lightgbm prediction method of unbalanced small sample anti-breast cancer drugs
Machine-learning prediction of the molecular properties of candidate anti-breast-cancer drugs suffers from two problems: small data sets and class imbalance. To address both, this paper proposes the FL-Lightgbm prediction model, which builds on WGAN-GP data augmentation. First, a WGAN-GP model augments the data to increase the size of the training set. Because positive and negative samples differ only slightly, augmented data are generated separately for the positive and the negative samples and then combined in the original order, so that the generated data keep the same distribution as the original data. Second, the Focal Loss function is introduced into the LightGBM model to improve learning on imbalanced samples; the resulting model is called the FL-Lightgbm prediction model. After training on the augmented data set, the proposed model is evaluated on 178 randomly selected validation samples, where its best accuracy, AUC, and F1 values reach 0.882, 0.851, and 0.7272, respectively. On all three metrics, it outperforms the original LightGBM model combined with oversampling algorithms such as BorderlineSMOTE and ADASYN.
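The abstract does not give the exact form of the Focal Loss used, so the following is only a minimal sketch of how such a loss could be prepared for a gradient-boosting model. It assumes the standard binary focal loss, FL = -alpha_t * (1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class. A gradient-boosting library such as LightGBM can be given a user-supplied objective that returns the per-sample gradient and Hessian with respect to the raw score, and those two quantities are what the sketch computes. The function names and the values alpha = 0.25 and gamma = 2.0 are illustrative assumptions, not settings taken from the paper.

```python
import math


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def _p_true(score, label):
    """Probability assigned to the true class, clamped away from 0 and 1."""
    sign = 1.0 if label == 1 else -1.0
    return min(max(_sigmoid(sign * score), 1e-12), 1.0 - 1e-12)


def focal_loss(score, label, alpha=0.25, gamma=2.0):
    """Binary focal loss for one sample, given the raw (pre-sigmoid) score."""
    p_t = _p_true(score, label)
    alpha_t = alpha if label == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def focal_grad_hess(score, label, alpha=0.25, gamma=2.0):
    """First and second derivative of focal_loss with respect to the raw score."""
    sign = 1.0 if label == 1 else -1.0
    p_t = _p_true(score, label)
    alpha_t = alpha if label == 1 else 1.0 - alpha
    log_p = math.log(p_t)
    mod = (1.0 - p_t) ** gamma            # down-weights well-classified samples
    inner = gamma * p_t * log_p + p_t - 1.0
    grad = alpha_t * sign * mod * inner
    hess = alpha_t * p_t * mod * (
        (1.0 - p_t) * (gamma * log_p + gamma + 1.0) - gamma * inner
    )
    return grad, hess


if __name__ == "__main__":
    # A confidently correct positive contributes far less than a misclassified one.
    print(focal_loss(3.0, 1), focal_loss(-3.0, 1))
    print(focal_grad_hess(0.0, 1))
```

With gamma = 0 the modulating factor disappears and the expressions reduce to the class-weighted cross-entropy gradient and Hessian, which gives a convenient sanity check. To use these in a boosting library, the same formulas would be applied to every sample and returned as two arrays.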