DeepD_DrugC: Deep and distributed workflow to predict drug-candidates

2022 4th International Conference on Pattern Analysis and Intelligent Systems (PAIS). Pub Date: 2022-10-12. DOI: 10.1109/PAIS56586.2022.9946898

Karima Sid, Soumia Zertal, Chaker Mezioud

Citations: 1

Abstract

The application of computational tools at various stages of drug discovery is one of the most active areas of research. Virtual screening (VS) is a very common application that aims to screen and analyze large chemical libraries, using algorithms and models, to extract drug-candidates that can bind to therapeutic targets. Machine learning (ML) techniques are widely applied to analyze chemical libraries in ligand-based virtual screening (LBVS). Deep learning (DL) is a novel form of machine learning that provides several new architectures, primarily based on classical artificial neural network algorithms but with many hidden layers, to learn features at multiple levels of abstraction. Chemical libraries have recently come to be regarded as Big Data because of their huge size, the variety of their data, and the speed at which they are created, streamed, and aggregated. Advanced tools are therefore needed to handle and process this type of data. Apache Spark is the most widely used engine for big data processing, with many improvements that make it more suitable for virtual screening analysis. In this work, we propose a novel workflow named DeepD_DrugC, based on Spark and a deep neural network model implemented with Deeplearning4j (DL4J), to improve prediction results in LBVS. To evaluate the workflow, we suggest a process for creating training datasets from the PubChem BioAssay database for cancer. The evaluation results show a precision of more than 93%, with acceptable scaling behavior.
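For readers unfamiliar with the reported metric, precision is the fraction of compounds predicted active that are truly active, TP / (TP + FP). The sketch below only illustrates that definition; the function name `precision` and the label lists are hypothetical and are not taken from the paper, its datasets, or its DL4J/Spark implementation.

```python
def precision(y_true, y_pred, positive=1):
    """Return TP / (TP + FP) for the given positive class label."""
    pairs = list(zip(y_true, y_pred))
    # True positives: predicted active and actually active.
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    # False positives: predicted active but actually inactive.
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    if tp + fp == 0:
        # No positive predictions at all; report 0.0 by convention here.
        return 0.0
    return tp / (tp + fp)


# Hypothetical labels for eight compounds: 1 = active, 0 = inactive.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print(precision(y_true, y_pred))
```

In this toy case three of the four compounds predicted active are truly active, so the function returns 0.75. In a virtual screening setting a high precision means few inactive compounds are passed on as drug-candidates.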

