On the Integrity of Accurate Mass Measurement Data in Compound Characterization
Marisa C. Kozlowski, Guillermo Correa Otero, Sarah Zhang
Organic Letters, published 2025-01-10
DOI: 10.1021/acs.orglett.4c04730 (https://doi.org/10.1021/acs.orglett.4c04730)
Abstract
Publications in chemistry contain vast amounts of data. Research relies heavily on reproducibility; therefore, inconsistent or invalid data can hinder scientific progress. For example, conclusions drawn from erroneous data can be misleading and may propagate errors through subsequent research. These issues underscore the importance of ensuring high data quality, particularly when sharing work among the scientific community. (1)

Accurate mass measurements (AMM), previously known as high resolution mass spectrometry (HRMS), are used to assign a molecular formula to a given structure or to verify one. The exact mass of a given molecule is characteristic because the exact masses of different molecular formulas with the same nominal mass differ slightly (typically in the third or fourth decimal place). At Organic Letters, AMM is one of the methods that can be used to establish the identity of a given compound but not its purity. (2)

In the past, most mass spectrometry centers would require users to submit a paper form with their structure and proposed formula. The center staff would then provide a paper report with the found exact mass and the corresponding calculated exact mass (see Figure 2) while attaching the report generated by the instrument (Figure 3). Most centers have now moved to electronic submission of requests and return the resulting output in electronic form (Figure 3). With these results, researchers either manually enter the data into an experimental description or copy and paste it from the electronic report. The former is subject to transcription errors, and even the latter can introduce errors if the wrong data are copied and pasted.

Figure 2. Paper data report of AMM measurement.
Figure 3. Electronic data report of an AMM.
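To make the distinction between exact mass and molecular weight concrete, here is a minimal Python sketch. It is an illustration only, not code from the article or from the Check AMM tool. The function names (`parse_formula`, `formula_mass`), the simple formula parser (no parentheses, charges, or isotope labels), and the choice of example formulas are assumptions made for this example; the atomic masses are standard reference values rounded to the digits shown.

```python
import re

# Monoisotopic masses (u) of the most abundant isotope of each element.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740,
                "O": 15.9949146, "Na": 22.9897693}

# Average atomic weights (u), as used for molecular weights.
AVERAGE = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "Na": 22.990}


def parse_formula(formula):
    """Turn a flat formula such as 'C10H12O2' into {'C': 10, 'H': 12, 'O': 2}."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def formula_mass(formula, table):
    """Sum the per-element masses from `table` over the atoms in `formula`."""
    return sum(table[element] * n for element, n in parse_formula(formula).items())


if __name__ == "__main__":
    # Two hypothetical formulas that share the nominal mass 164.
    for f in ("C10H12O2", "C9H8O3"):
        exact = formula_mass(f, MONOISOTOPIC)
        weight = formula_mass(f, AVERAGE)
        print(f"{f}: exact mass {exact:.4f}, molecular weight {weight:.3f}")
```

With these values, the two formulas give exact masses of about 164.0837 and 164.0473, so an accurate mass measurement can tell them apart even though both have a nominal mass of 164. The molecular weight of C10H12O2 comes out near 164.204, which shows why pasting a molecular weight in place of an exact mass produces a large discrepancy.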
Because the Supporting Information documents that accompany reports on organic chemistry are so large (they contain the experimental protocols and compound characterization data), it is nearly impossible for human reviewers to identify all inaccuracies manually. Moreover, there is a significant lack of tools for automated and standardized data quality assessments. (3) In an accompanying publication in this issue, (4) Prof. Mathias Christmann from the Freie Universität Berlin discloses an analysis of over 3000 Supporting Information files from Organic Letters to gain an understanding of AMM error rates in these data.

Notably, a significant number of errors were identified. Most of the errors arose from not taking into account the mass of an electron. This very minor error typically does not impact the measurement and occurs because the instrument manufacturers do not account for it in the software that they provide. Errors of concern, however, were found in ∼10% of the 101,883 compounds with AMM reports. The sources of these errors varied and included typographical errors (transposition of digits or an incorrect digit), use of incorrect data in calculating masses (e.g., 1.0 for a proton, or molecular weights in place of exact masses; see Figure 4), and use of incorrect molecular formulas (e.g., not including an H or Na when needed). Only 0.3% of the errors could not readily be attributed to input errors.

Figure 4. Molecular weight vs exact mass from structural drawing software.

To improve data quality in published work, it is recommended that researchers check their AMM data in a manner similar to how single crystal structural data is now checked with the checkCIF program. (5) Users can either download the software published by Christmann (4) or use a web application (Figure 5) that we have created using this code (Check AMM). (6,7) This application requires a PDF file containing the AMM reports in the standard format.
(2) The complete Supporting Information file can be used, which simplifies the process because the program is able to identify the AMM reports within this larger file. Users set the accuracy threshold to the value they want (at Organic Letters, the threshold is 5 ppm (2)). The application does not store any user data and provides a report with page numbers, the recalculated accurate mass from the formula given, and the nature of the error found. A help page shows a sample report (Figure 6) and describes how errors are classified from most serious (A Level Alert) to least serious (G Level Alert). We hope that this tool will provide researchers with an opportunity to identify and correct any inaccuracies as they prepare to publish their work.

Figure 5. Landing page for Check AMM web application.
Figure 6. Sample report from Check AMM.

We thank Dr. Nathan L. Loud and Alice Wu (UPenn) for conceiving and rendering the TOC graphic (Figure 1). G.C.O. thanks the NSF for a fellowship (DGE-2236662). We are grateful for the support of the NSF (CHE 2400215) for this work.

This article references 7 other publications.
Organic Letters author guidelines:
For a related tool which requires compatibility with a Java 1.1 applet and pasting in the text section to be analyzed:
For code, see
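The ppm threshold and the electron-mass error discussed in the abstract can be illustrated with a short Python sketch. This is an illustration under stated assumptions, not the Check AMM code or Christmann's software: the function names (`cation_mz`, `ppm_error`, `within_threshold`) and the example species C10H13O2 (a hypothetical [M+H]+ ion) are invented here, and only singly or multiply charged cations formed by electron loss are handled.

```python
ELECTRON_MASS = 0.00054858  # mass of an electron in u (standard value, rounded)


def cation_mz(atom_mass_sum, charge=1):
    """m/z of a cation: the summed atomic masses of the ion's formula
    (adduct atoms such as H or Na already included), minus the mass of
    the electrons removed, divided by the charge."""
    return (atom_mass_sum - charge * ELECTRON_MASS) / charge


def ppm_error(found, calculated):
    """Signed deviation of the found value from the calculated one, in ppm."""
    return (found - calculated) / calculated * 1e6


def within_threshold(found, calculated, threshold_ppm=5.0):
    """True if the absolute ppm error does not exceed the threshold
    (5 ppm is the Organic Letters threshold cited in the text)."""
    return abs(ppm_error(found, calculated)) <= threshold_ppm


if __name__ == "__main__":
    # Hypothetical ion formula C10H13O2: summed monoisotopic atomic masses.
    atoms = 165.09155459
    correct = cation_mz(atoms)

    # Case 1: the electron mass was ignored in the calculated value.
    print(f"electron omitted: {ppm_error(atoms, correct):.2f} ppm")

    # Case 2: 1.0 was used for the added proton instead of the H atom mass.
    wrong_proton = 164.08372956 + 1.0
    print(f"1.0 for proton: {ppm_error(wrong_proton, correct):.1f} ppm")
```

In this example, omitting the electron shifts the value by roughly 3 ppm, which stays inside a 5 ppm window and is consistent with the text calling it a very minor error. Using 1.0 for the proton shifts it by roughly 44 ppm, which falls well outside the window.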
About the journal:
Organic Letters invites original reports of fundamental research in all branches of the theory and practice of organic, physical organic, organometallic, medicinal, and bioorganic chemistry. Organic Letters provides rapid disclosure of the key elements of significant studies that are of interest to a large portion of the organic community. In selecting manuscripts for publication, the Editors place emphasis on the originality, quality, and wide interest of the work. Authors should provide enough background information to place the new disclosure in context and to justify the rapid publication format. Back-to-back Letters will be considered. Full details should be reserved for an Article, which should appear in due course.