On the Integrity of Accurate Mass Measurement Data in Compound Characterization

IF 5 | Region 1, Chemistry | JCR Q1, CHEMISTRY, ORGANIC
Marisa C. Kozlowski, Guillermo Correa Otero, Sarah Zhang
Journal: Organic Letters. Publication date: 2025-01-10. DOI: 10.1021/acs.orglett.4c04730 (https://doi.org/10.1021/acs.orglett.4c04730)

On the Integrity of Accurate Mass Measurement Data in Compound Characterization

Publications in chemistry contain vast amounts of data. Research relies heavily on reproducibility; therefore, inconsistent or invalid data can hinder scientific progress. For example, conclusions drawn from erroneous data can be misleading and may propagate errors through subsequent research. These issues underscore the importance of ensuring high data quality, particularly when sharing work among the scientific community. (1)

Accurate mass measurements (AMM), previously known as high resolution mass spectrometry (HRMS), are used to assign a molecular formula to a given structure or to verify it. The exact mass of a given molecule is characteristic because the exact masses of different molecular formulas with the same nominal (integer) mass differ slightly (typically in the third or fourth decimal place). At Organic Letters, AMM is one of the methods that can be used to establish the identity of a given compound but not its purity. (2)

In the past, most mass spectrometry centers would require users to submit a paper form with their structure and proposed formula. The center staff would then provide a paper report with the found exact mass and the corresponding calculated exact mass (see Figure 2) while attaching the report generated by the instrument (Figure 3). Most centers have now moved to electronic submission of requests and return the resultant output in electronic form (Figure 3). With these results, researchers either manually input the data into an experimental description or copy and paste it from the electronic report. The former is subject to transcription errors, and even the latter can result in errors if the wrong data are copied.

Figure 2. Paper data report of AMM measurement.

Figure 3. Electronic data report of an AMM.
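To make the distinction between exact mass and molecular weight concrete, the following is a minimal sketch, not taken from the article or from any instrument software. The function names, the two example formulas, and the abridged mass tables (rounded values, four elements only) are illustrative assumptions.

```python
import re

# Monoisotopic masses (u) of the most abundant isotope, rounded to six decimals.
MONOISOTOPIC = {"H": 1.007825, "C": 12.000000, "N": 14.003074, "O": 15.994915}
# Abundance-weighted standard atomic weights, abridged values.
AVERAGE = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def parse_formula(formula):
    """Turn a simple formula such as 'C10H12O2' into {'C': 10, 'H': 12, 'O': 2}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def mass(formula, table):
    """Sum the per-element masses taken from the chosen table."""
    return sum(table[element] * n for element, n in parse_formula(formula).items())


def exact_mass(formula):
    """Monoisotopic (exact) mass of the neutral molecule."""
    return mass(formula, MONOISOTOPIC)


def molecular_weight(formula):
    """Average molecular weight, which is NOT what an AMM report should quote."""
    return mass(formula, AVERAGE)


if __name__ == "__main__":
    # Two hypothetical formulas that share the nominal mass 164.
    for formula in ("C10H12O2", "C9H12N2O"):
        print(f"{formula}: exact {exact_mass(formula):.4f}, "
              f"MW {molecular_weight(formula):.3f}")
```

With these rounded tables the two formulas share the nominal mass 164 but differ in exact mass by about 0.011 u, while the molecular weight of either one is more than 0.1 u away from its exact mass.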
Due to the size of the Supporting Information documents accompanying reports on organic chemistry, which contain the experimental protocols and compound characterization data, manual identification of all inaccuracies is nearly impossible for human reviewers. Moreover, there is a significant lack of tools for automated and standardized data quality assessments. (3)

In an accompanying publication in this issue, (4) Prof. Mathias Christmann from the Freie Universität Berlin discloses an analysis of over 3000 Supporting Information files from Organic Letters to gain an understanding of AMM error rates in these data. Notably, a significant number of errors were identified. Most of the errors arose from not taking into account the mass of an electron. This very minor error typically does not impact the measurement and occurs because the instrument manufacturers do not account for it in the software that they provide. Errors of concern, however, were found in ~10% of the 101,883 compounds with AMM reports. The sources of these errors varied and included typographical errors (transposition of digits or an incorrect digit), use of incorrect data in calculating masses (e.g., 1.0 for a proton, or molecular weights in place of exact masses; see Figure 4), and use of incorrect molecular formulas (e.g., not including an H or Na when needed). Only 0.3% of the errors could not readily be attributed to input errors.

Figure 4. Molecular weight vs exact mass from structural drawing software.

To improve data quality in published work, it is recommended that researchers check their AMM data in a manner similar to how single crystal structural data are now checked with the checkCIF program. (5) Users can either download the software published by Christmann (4) or use a web application (Figure 5) that we have created using this code (Check AMM). (6,7) This application requires a PDF file containing the AMM reports in the standard format. (2) The complete Supporting Information file can be used, which simplifies the process because the program is able to identify the AMM reports within this larger file. Users set the accuracy threshold to the value they desire (at Organic Letters, the threshold is 5 ppm (2)). The application does not store any user data and provides a report with page numbers, the accurate mass recalculated from the formula given, and the nature of the error found. A help page shows a sample report (Figure 6) and describes how errors are classified from most serious (A Level Alert) to least serious (G Level Alert). We hope that this tool will give researchers an opportunity to identify and correct any inaccuracies as they prepare to publish their work.

Figure 5. Landing page for Check AMM web application.

Figure 6. Sample report from Check AMM.

We thank Dr. Nathan L. Loud and Alice Wu (UPenn) for conceiving and rendering the TOC graphic (Figure 1). G.C.O. thanks the NSF for a fellowship (DGE-2236662). We are grateful for the support of the NSF (CHE 2400215) for this work.

This article references 7 other publications. The references include the Organic Letters author guidelines, a related tool that requires compatibility with a Java 1.1 applet and pasting in the text section to be analyzed, and a link to code.
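As an illustration of the kind of recalculation described in the text, the sketch below compares a reported value with a recalculated [M+H]+ value in ppm. It is not the Check AMM application or Christmann's software; the function names, the example neutral mass (164.083730 u, a hypothetical C10H12O2 compound), and the rounded constants are assumptions made for this example.

```python
ELECTRON_MASS = 0.000549  # u, rounded
HYDROGEN_MASS = 1.007825  # u, monoisotopic 1H atom, rounded


def mh_plus(neutral_exact_mass, include_electron=True):
    """Calculated m/z of a singly charged [M+H]+ ion.

    Adding a hydrogen atom and removing one electron gives the cation;
    include_electron=False reproduces the common omission of the electron mass.
    """
    mz = neutral_exact_mass + HYDROGEN_MASS
    return mz - ELECTRON_MASS if include_electron else mz


def ppm_error(found, calculated):
    """Signed deviation of a reported value from the calculated one, in ppm."""
    return (found - calculated) / calculated * 1e6


def within_threshold(found, calculated, threshold_ppm=5.0):
    """True if the deviation is inside the chosen threshold (default 5 ppm)."""
    return abs(ppm_error(found, calculated)) <= threshold_ppm


if __name__ == "__main__":
    neutral = 164.083730                 # hypothetical example compound
    reference = mh_plus(neutral)         # electron mass accounted for
    cases = {
        "electron mass ignored": mh_plus(neutral, include_electron=False),
        "1.0 used for the proton": neutral + 1.0,
    }
    for label, value in cases.items():
        print(f"{label}: {ppm_error(value, reference):+.1f} ppm, "
              f"within 5 ppm: {within_threshold(value, reference)}")
```

In this example, omitting the electron mass shifts the calculated value by roughly 3 ppm at m/z 165, which stays inside a 5 ppm threshold, whereas using 1.0 for the proton produces a deviation of more than 40 ppm.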
Source journal: Organic Letters (Chemistry, Organic Chemistry)
CiteScore: 9.30
Self-citation rate: 11.50%
Articles published: 1607
Review time: 1.5 months
Journal description: Organic Letters invites original reports of fundamental research in all branches of the theory and practice of organic, physical organic, organometallic, medicinal, and bioorganic chemistry. Organic Letters provides rapid disclosure of the key elements of significant studies that are of interest to a large portion of the organic community. In selecting manuscripts for publication, the Editors place emphasis on the originality, quality, and wide interest of the work. Authors should provide enough background information to place the new disclosure in context and to justify the rapid publication format. Back-to-back Letters will be considered. Full details should be reserved for an Article, which should appear in due course.