Making the InChI FAIR and sustainable while moving to Inorganics

IF 3.3 · Zone 3, Chemistry · JCR Q2, CHEMISTRY, PHYSICAL
Gerd Blanke, Jan Brammer, Djordje Baljozovic, Nauman Khan, Frank Lange, Felix Bänsch, Clare A. Tovee, Ulrich Schatzschneider, Richard M Hartshorn, Sonja Herres-Pawlis
Journal: Faraday Discussions
DOI: 10.1039/d4fd00145a (https://doi.org/10.1039/d4fd00145a)
Publication date: 2024-09-04
Publication type: Journal Article
Citation count: 0

Abstract

The InChI (International Chemical Identifier) standard is a cornerstone of chemical informatics, enabling the structure-based identification and exchange of chemical compounds across platforms and databases. As a unique canonical line notation, the InChI has made chemical structures searchable on the internet at a broad scale; the largest repositories working with InChIs contain more than 1 billion structures. Central to the functionality of the InChI is its codebase, which carries out a series of intricate steps to generate unique identifiers for chemical compounds. Until now, these steps have been sparsely documented, and the InChI algorithm had to be treated as a black box. For the new v1.07 release, the code has been analyzed and the major steps documented. More than 3000 bugs and security issues, as well as nearly 60 Google OSS-Fuzz issues, have been fixed. New test systems have been implemented that allow users to test code developments directly. The move to GitHub has not only made development more transparent but will also enable external contributors to join the further development of the InChI code. The motivation for this modernisation was the urgent need to treat molecular inorganic compounds in the InChI in a meaningful way. Until now, no classic string representation has fulfilled this need of molecular inorganic chemistry. In the InChI, bonds to metals are by definition disconnected, which makes most inorganic InChIs meaningless at the moment. Herein, we propose new routines to remedy this problem in the representation of molecular inorganic compounds by the InChI.
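The effect of disconnecting metal bonds can be illustrated with a toy model. The sketch below is not the InChI algorithm and uses no InChI library; the names `METALS`, `disconnect_metals`, and `signature` are hypothetical, and the signature is a crude neighbour-based invariant rather than a real canonicalisation. It assumes a simplified Co/NO2 fragment to mimic nitro (Co–N bonded) and nitrito (Co–O bonded) linkage isomers, and shows that the two structures become indistinguishable once every bond to the metal is removed.

```python
from collections import defaultdict

# Hypothetical, deliberately incomplete set of metal symbols for this toy example.
METALS = {"Co", "Pt", "Fe", "Cu", "Ni"}


def disconnect_metals(atoms, bonds):
    """Return only the bonds in which neither atom is a metal."""
    return [(i, j) for i, j in bonds
            if atoms[i] not in METALS and atoms[j] not in METALS]


def signature(atoms, bonds):
    """Describe each connected fragment by (element, neighbour elements) pairs.

    This is a simple invariant for illustration, not a canonical labelling.
    """
    neighbours = defaultdict(set)
    for i, j in bonds:
        neighbours[i].add(j)
        neighbours[j].add(i)

    seen, fragments = set(), []
    for start in atoms:
        if start in seen:
            continue
        # Depth-first search collects one connected fragment.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        fragments.append(tuple(sorted(
            (atoms[n], tuple(sorted(atoms[m] for m in neighbours[n])))
            for n in component
        )))
    return sorted(fragments)


# Assumed toy fragment: one metal centre and one NO2 ligand.
ATOMS = {0: "Co", 1: "N", 2: "O", 3: "O"}
NITRO = [(0, 1), (1, 2), (1, 3)]    # metal bonded to nitrogen
NITRITO = [(0, 2), (1, 2), (1, 3)]  # metal bonded to oxygen

connected_differ = signature(ATOMS, NITRO) != signature(ATOMS, NITRITO)
disconnected_equal = (
    signature(ATOMS, disconnect_metals(ATOMS, NITRO))
    == signature(ATOMS, disconnect_metals(ATOMS, NITRITO))
)
print(connected_differ, disconnected_equal)
```

In this toy model the connected graphs differ, while the disconnected ones reduce to the same two fragments (a bare metal atom and an NO2 unit), which is the kind of information loss the abstract describes for molecular inorganic compounds.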
Source journal
Faraday Discussions (Chemistry, Physical Chemistry)
Self-citation rate: 0.00%
Articles published: 259
Journal description: Discussion summary and research papers from discussion meetings that focus on rapidly developing areas of physical chemistry and its interfaces.