Toxicity Prediction for Immune Thrombocytopenia Caused by Drugs Based on Logistic Regression with Feature Importance

IF 2.9 · Zone 3 (Biology) · Q3 (Biochemical Research Methods)

Current Bioinformatics Pub Date : 2023-10-18 DOI:10.2174/0115748936269606231001140647

Osphanie Mentari, Muhammad Shujaat, Hilal Tayara, Kil To Chong


Abstract

Background: Toxicity prediction is one of the problems in drug discovery that artificial intelligence can help solve. In drug-induced immune thrombocytopenia, toxicity can appear in patients after five to ten days as significant bleeding caused by drug-dependent antibodies. When this condition occurs in clinical trials, all drugs the patient is taking should be stopped, although this is sometimes not possible, especially for older patients who depend on their medication. Being able to predict toxicity in drug-induced immune thrombocytopenia is therefore very important. Computational technologies such as machine learning can help predict toxicity more efficiently than empirical techniques, owing to their lower cost and faster processing.

Objective: Previous studies used the k-nearest neighbors (KNN) method, but the performance of that approach needs to be improved. This study proposes a logistic regression model to improve accuracy.

Methods: We present a new machine learning model for drug-induced immune thrombocytopenia. The model extracts several features from Simplified Molecular Input Line Entry System (SMILES) representations. These features were fused and cleaned, and the important features were selected using the SelectKBest method. The classifier is a logistic regression model that was optimized and tuned by grid search with cross-validation.

Results: The highest accuracy, 80%, was obtained with the combination of features from PaDEL, CDK, RDKit, Mordred, and BlueDesc.

Conclusion: The proposed model outperforms previous studies in terms of accuracy. The information and source code are accessible online at GitHub: https://github.com/Osphanie/Thrombocytopenia.
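The selection and tuning steps named in the abstract (SelectKBest, logistic regression, grid search with cross-validation) can be sketched with scikit-learn as follows. This is a minimal illustration, not the authors' code: the synthetic matrix stands in for a fused descriptor table, and the scoring function (`f_classif`), the standardization step, the parameter grid, the train/test split, and the use of absolute coefficients as a rough importance ranking are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in (assumption) for a fused molecular descriptor matrix:
# rows are compounds, columns are descriptors, y is the toxic / non-toxic label.
X, y = make_classification(
    n_samples=300, n_features=60, n_informative=10, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Scale, keep the k best-scoring features, then fit logistic regression.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif)),
    ("clf", LogisticRegression(max_iter=2000)),
])

# Hypothetical grid: number of retained features and regularization strength.
param_grid = {
    "select__k": [10, 20, 40],
    "clf__C": [0.01, 0.1, 1.0, 10.0],
}

# 5-fold cross-validated grid search, scored by accuracy.
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)

# Rank the selected features by absolute coefficient (one possible
# reading of "feature importance" for a linear model).
best = search.best_estimator_
selected_idx = np.flatnonzero(best.named_steps["select"].get_support())
coefs = best.named_steps["clf"].coef_.ravel()
top_features = sorted(
    zip(selected_idx, coefs), key=lambda pair: abs(pair[1]), reverse=True
)[:5]

print("best parameters:", best_params)
print("held-out accuracy:", round(test_accuracy, 3))
print("top features (column index, coefficient):", top_features)
```

Placing the selector inside the pipeline means feature selection is refit on each cross-validation fold, so the reported cross-validated accuracy is not inflated by selecting features on held-out data.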


Source journal

Current Bioinformatics (Biology: Biochemical Research Methods)

CiteScore: 6.60

Self-citation rate: 2.50%

Review time: >12 weeks

Journal description: Current Bioinformatics aims to publish all the latest and outstanding developments in bioinformatics. Each issue contains a series of timely, in-depth/mini-reviews, research papers and guest edited thematic issues written by leaders in the field, covering a wide range of the integration of biology with computer and information science. The journal focuses on advances in computational molecular/structural biology, encompassing areas such as computing in biomedicine and genomics, computational proteomics and systems biology, and metabolic pathway engineering. Developments in these fields have direct implications for key issues related to health care, medicine, genetic disorders, development of agricultural products, renewable energy, environmental protection, etc.