Preparation of Simplified Molecular Input Line Entry System Notation Datasets for use in Convolutional Neural Networks

Sandi Baressi Segota, N. Anđelić, I. Lorencin, J. Musulin, D. Štifanić, Z. Car
Published in: 2021 IEEE 21st International Conference on Bioinformatics and Bioengineering (BIBE)
Publication date: 2021-10-25
DOI: 10.1109/BIBE52308.2021.9635320
Citations: 3

Abstract

Simplified Molecular Input Line Entry System (SMILES) is a type of chemical notation. The SMILES format represents chemical structures in a form that computer programs can read easily. This allows many techniques, such as Artificial Neural Networks (ANNs), to be applied to SMILES-formatted data. One of the highest-performing ANN types is the Convolutional Neural Network (CNN), which is designed to work on images or matrix-shaped data. In this paper, the authors present the preparation of a SMILES dataset for use by CNNs. The paper starts with a brief description of the SMILES format, followed by an explanation of the dataset transformation into an NPY matrix-based format, and an example of its use through the application of popular CNN architectures to the transformed dataset. The proposed architecture achieves satisfactory results (AUC = 0.92), and the speed of the transformation algorithm also proves satisfactory (0.08 seconds per data point).
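
The abstract does not specify how the SMILES strings are turned into matrices, so the following is only a minimal sketch of one common approach, character-level one-hot encoding, and should not be read as the authors' algorithm. The example molecules, the function names, and the choice of zero-padding to the longest string are all assumptions made for illustration. Character-level encoding is also a simplification: it splits two-letter atoms such as Cl or Br into separate symbols.

```python
import io

import numpy as np

# Hypothetical example molecules (ethanol, benzene, acetic acid);
# they are not taken from the paper's dataset.
SMILES = ["CCO", "c1ccccc1", "CC(=O)O"]


def build_vocabulary(smiles_list):
    """Map every distinct character to a row index (character-level: an assumption)."""
    chars = sorted(set("".join(smiles_list)))
    return {ch: i for i, ch in enumerate(chars)}


def one_hot_encode(smiles, vocab, max_len):
    """Return a (len(vocab), max_len) matrix; columns past the string end stay zero."""
    matrix = np.zeros((len(vocab), max_len), dtype=np.uint8)
    for pos, ch in enumerate(smiles[:max_len]):
        matrix[vocab[ch], pos] = 1
    return matrix


def encode_dataset(smiles_list):
    """Encode all strings into one array of shape (n_samples, n_symbols, max_len)."""
    vocab = build_vocabulary(smiles_list)
    max_len = max(len(s) for s in smiles_list)
    data = np.stack([one_hot_encode(s, vocab, max_len) for s in smiles_list])
    return data, vocab


dataset, vocab = encode_dataset(SMILES)

# Write the array in NPY format. An in-memory buffer is used so the sketch
# leaves no file behind; np.save also accepts a file path.
buffer = io.BytesIO()
np.save(buffer, dataset)
buffer.seek(0)
restored = np.load(buffer)

print(dataset.shape)
```

Each molecule becomes a fixed-size binary matrix, which is the kind of image-like input a CNN expects. A real pipeline would likely need a proper SMILES tokenizer and a fixed maximum length chosen from the whole dataset.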