{"title":"四十个小分子数据库中的同分异构体冲突。","authors":"Devendra K Dhaked, Marc C Nicklaus","doi":"10.1021/acs.jcim.4c00700","DOIUrl":null,"url":null,"abstract":"<p><p>We have analyzed 40 different databases ranging in size from a few thousand to nearly 100 million molecules, comprising a total of over 210 million structures, for their tautomeric conflicts. A tautomeric conflict is defined as an occurrence of two or more structures within a data set identified by the tautomeric rules applied as being tautomers of each other. We tested a total of 119 detailed tautomeric transform rules expressed as SMIRKS, out of which 79 yielded at least one conflict. These transformations include three types of tautomerism: prototropic, ring-chain, and valence tautomerism. The databases analyzed spanned a wide variety of types including large aggregating databases, drug collections, and structure collections based on experimental data. All databases analyzed showed intra-database tautomeric conflicts. The conflict rates as percentage of the database were typically in the few tenths of a percent range, which for the largest databases amounts to >100,000 cases per database.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":null,"pages":null},"PeriodicalIF":5.6000,"publicationDate":"2024-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Tautomeric Conflicts in Forty Small-Molecule Databases.\",\"authors\":\"Devendra K Dhaked, Marc C Nicklaus\",\"doi\":\"10.1021/acs.jcim.4c00700\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>We have analyzed 40 different databases ranging in size from a few thousand to nearly 100 million molecules, comprising a total of over 210 million structures, for their tautomeric conflicts. A tautomeric conflict is defined as an occurrence of two or more structures within a data set identified by the tautomeric rules applied as being tautomers of each other. We tested a total of 119 detailed tautomeric transform rules expressed as SMIRKS, out of which 79 yielded at least one conflict. These transformations include three types of tautomerism: prototropic, ring-chain, and valence tautomerism. The databases analyzed spanned a wide variety of types including large aggregating databases, drug collections, and structure collections based on experimental data. All databases analyzed showed intra-database tautomeric conflicts. The conflict rates as percentage of the database were typically in the few tenths of a percent range, which for the largest databases amounts to >100,000 cases per database.</p>\",\"PeriodicalId\":44,\"journal\":{\"name\":\"Journal of Chemical Information and Modeling \",\"volume\":null,\"pages\":null},\"PeriodicalIF\":5.6000,\"publicationDate\":\"2024-10-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Chemical Information and Modeling \",\"FirstCategoryId\":\"92\",\"ListUrlMain\":\"https://doi.org/10.1021/acs.jcim.4c00700\",\"RegionNum\":2,\"RegionCategory\":\"化学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2024/9/24 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q1\",\"JCRName\":\"CHEMISTRY, MEDICINAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Chemical Information and Modeling ","FirstCategoryId":"92","ListUrlMain":"https://doi.org/10.1021/acs.jcim.4c00700","RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/9/24 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"CHEMISTRY, MEDICINAL","Score":null,"Total":0}
Tautomeric Conflicts in Forty Small-Molecule Databases.
We have analyzed 40 different databases ranging in size from a few thousand to nearly 100 million molecules, comprising a total of over 210 million structures, for their tautomeric conflicts. A tautomeric conflict is defined as an occurrence of two or more structures within a data set identified by the tautomeric rules applied as being tautomers of each other. We tested a total of 119 detailed tautomeric transform rules expressed as SMIRKS, out of which 79 yielded at least one conflict. These transformations include three types of tautomerism: prototropic, ring-chain, and valence tautomerism. The databases analyzed spanned a wide variety of types including large aggregating databases, drug collections, and structure collections based on experimental data. All databases analyzed showed intra-database tautomeric conflicts. The conflict rates as percentage of the database were typically in the few tenths of a percent range, which for the largest databases amounts to >100,000 cases per database.
期刊介绍:
The Journal of Chemical Information and Modeling publishes papers reporting new methodology and/or important applications in the fields of chemical informatics and molecular modeling. Specific topics include the representation and computer-based searching of chemical databases, molecular modeling, computer-aided molecular design of new materials, catalysts, or ligands, development of new computational methods or efficient algorithms for chemical software, and biopharmaceutical chemistry including analyses of biological activity and other issues related to drug discovery.
Astute chemists, computer scientists, and information specialists look to this monthly’s insightful research studies, programming innovations, and software reviews to keep current with advances in this integral, multidisciplinary field.
As a subscriber you’ll stay abreast of database search systems, use of graph theory in chemical problems, substructure search systems, pattern recognition and clustering, analysis of chemical and physical data, molecular modeling, graphics and natural language interfaces, bibliometric and citation analysis, and synthesis design and reactions databases.