The Dark Metabolome/Lipidome and In-Source Fragmentation
Winnie Uritboonthai, Linh Hoang, Aries Aisporna, Martin Giera, Gary Siuzdak
Analytical Science Advances, 6(1). Published 2025-05-14. DOI: 10.1002/ansa.70012
https://onlinelibrary.wiley.com/doi/10.1002/ansa.70012
Abstract
To the editor,
Tandem mass spectrometry (MS/MS) is valued for its ability to facilitate molecular identification and deliver highly consistent data across a wide range of mass spectrometry platforms. Distinct from MS/MS is the fragmentation that occurs during electrospray ionization (ESI), commonly referred to as in-source fragmentation (ISF) (Figure 1). ISF was first observed in the 1950s with electron ionization and has been recognized as an inherent yet often overlooked feature of the ESI process, albeit less prevalent than with electron ionization. Recently, ISF has been associated with the overrepresentation of peaks in liquid chromatography mass spectrometry (LC/MS) data, where it accounts for the majority of observed unfiltered peaks [1]. Because these peaks are so abundant, and because the molecules associated with them cannot be identified using MS/MS data, ISF has been linked to the so-called “dark metabolome” [2, 3] (also encompassing the lipidome), a term used to describe uncharacterized molecular species in metabolomics and lipidomics. This association [1] was determined by examining MS/MS data acquired at 0 eV collision energy from METLIN's extensive library of over 931,000 molecular standards. However, while the similarity of ISF and MS/MS (0 eV) data has been described in previous studies [1, 4–6], a direct correlation between the two has yet to be established. We therefore explored the consistency between MS/MS (0 eV) data and ISF across various molecular species to assess whether mining METLIN's MS/MS (0 eV) data can effectively link ISF to the dark metabolome and lipidome.
Liquid chromatography-tandem mass spectrometry (LC-MS/MS) with ESI has become a cornerstone in metabolomics, lipidomics, and clinical analysis due to its accuracy in identifying small molecules within complex biological matrices. With LC-MS/MS, after ionization occurs in the ESI source, charged molecules are directed into a collision cell where they undergo fragmentation for structural analysis. This procedure is typically repeated for all charged analytes present in a sample. Despite its utility, this method has revealed an unexpectedly vast array of spectral features associated with the “dark metabolome.” Given the limited number of protein-coding genes [7, 8], only a fraction of which produce enzymes, the chemical diversity [3, 9, 10] detected through LC-MS/MS (potentially hundreds of thousands or even millions of metabolites) far exceeds biological expectations. Current estimates suggest that less than 2% of observed LC-MS/MS spectra can be annotated, leaving a potentially broad spectrum of unknown compounds [3]. Recent research [1] using the METLIN database and its data at 0 eV has shed light on this discrepancy, suggesting that much of the perceived complexity may stem from technological factors, particularly ISF, rather than from biological diversity itself.
Our laboratory, along with several others [11], has observed the widespread occurrence of ISF [12, 13]. This process involves the fragmentation of analytes during the initial ionization stage within the ESI source, occurring before they reach the collision cell. Essentially, ISF can transform a single analyte into multiple molecular ions and fragments, creating a complex array of ions from what was initially a single entity. Consequently, the mass analyzer indiscriminately isolates and further fragments whatever enters the collision cell. Given this understanding, we suspect that ISF may play a significant role in contributing to the so-called dark metabolome.
To correlate the observed peaks with ISF, we examined the METLIN MS/MS database [14], which consists of over 931,000 molecular standards representing over 350 chemical classes. We mined METLIN's MS/MS data at 0 eV, an energy designed to simulate the absence of collision-induced dissociation (CID), to assess whether these spectra could reflect ISF-related fragments. The analysis revealed that ISF could account for over 70% of the peaks observed in typical LC-MS/MS metabolomic datasets when using a 5% cutoff threshold. This number rises when the threshold is reduced to less than 3%. The 5% and 3% thresholds represent a conservative range of peak intensities across LC/MS experiments, where typical intensity counts range from 10,000 to millions, well over two orders of magnitude.
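The effect of the cutoff can be illustrated with a minimal sketch. It assumes the threshold is applied relative to the most intense peak in a spectrum, which is an assumption made here for illustration and not a description of the procedure in [1]. The function name `fragments_above_cutoff`, the m/z tolerance, and the example spectrum are all hypothetical.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity in counts)


def fragments_above_cutoff(
    peaks: List[Peak],
    precursor_mz: float,
    cutoff: float = 0.05,
    tol: float = 0.01,
) -> List[Peak]:
    """Return non-precursor peaks with intensity >= cutoff * base peak.

    Assumption: the cutoff is a fraction of the most intense peak.
    """
    if not peaks:
        return []
    base = max(intensity for _, intensity in peaks)
    return [
        (mz, intensity)
        for mz, intensity in peaks
        # Exclude the intact precursor ion; keep only candidate fragments.
        if abs(mz - precursor_mz) > tol and intensity >= cutoff * base
    ]


# Hypothetical 0 eV spectrum; the values are invented for illustration only.
spectrum = [
    (181.07, 100000.0),  # precursor
    (163.06, 12000.0),   # 12% of base peak
    (145.05, 4000.0),    # 4% of base peak
    (127.04, 900.0),     # 0.9% of base peak
]

at_5 = fragments_above_cutoff(spectrum, precursor_mz=181.07, cutoff=0.05)
at_3 = fragments_above_cutoff(spectrum, precursor_mz=181.07, cutoff=0.03)
print(len(at_5), len(at_3))
```

In this toy spectrum, lowering the cutoff from 5% to 3% admits one additional fragment, which mirrors the qualitative trend described above: a lower threshold counts more ISF-related peaks.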
While the METLIN study provides a large statistical snapshot of the number of ISF peaks in a typical LC/MS experiment, it lacked example data directly comparing ISF and MS/MS (0 eV) data. Here, we examined both types of data (METLIN MS/MS 0 eV and ISF) from 10 molecules (Figures 2 and 3). ISF data were acquired using both an Agilent QTOF (collision cell off) mass spectrometer and an Agilent TOF mass spectrometer. The data revealed a high level of consistency between METLIN MS/MS 0 eV and ISF-produced fragment ions, although the intensities were generally higher for the ISF-generated fragment ion peaks. These examples suggest that (1) the original comparison between ISF and MS/MS (0 eV) is valid, and (2) the higher intensities observed for ISF fragments indicate that the ISF process is slightly more energetic than MS/MS (0 eV), at least with these two instrument platforms (Agilent QTOF and Agilent TOF).
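A comparison of this kind can be sketched in a few lines. The example below is not the procedure used for Figures 2 and 3. It assumes that each spectrum is normalized to its own base peak and that fragments are paired when their m/z values agree within a fixed tolerance. The function names, the tolerance, and both spectra are hypothetical.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def normalize(peaks: List[Peak]) -> List[Peak]:
    """Scale intensities so that the most intense peak equals 100."""
    base = max(intensity for _, intensity in peaks)
    return [(mz, 100.0 * intensity / base) for mz, intensity in peaks]


def match_fragments(
    isf: List[Peak], msms: List[Peak], tol: float = 0.01
) -> List[Tuple[float, float, float]]:
    """Pair each MS/MS peak with the closest ISF peak within tol (Da).

    Returns (reference m/z, MS/MS relative intensity, ISF relative intensity).
    """
    isf_norm = normalize(isf)
    matches = []
    for mz_ref, int_ref in normalize(msms):
        candidates = [
            (abs(mz - mz_ref), inten)
            for mz, inten in isf_norm
            if abs(mz - mz_ref) <= tol
        ]
        if candidates:
            _, inten = min(candidates)  # closest ISF peak in m/z
            matches.append((mz_ref, int_ref, inten))
    return matches


# Invented spectra for illustration only; not data from this study.
msms_0ev = [(181.070, 1000.0), (163.060, 50.0), (145.050, 20.0)]
isf_spec = [(181.071, 800.0), (163.061, 120.0), (145.049, 40.0), (99.010, 10.0)]

matches = match_fragments(isf_spec, msms_0ev)
shared_fraction = len(matches) / len(msms_0ev)
for mz, ref_int, isf_int in matches:
    print(f"{mz:.3f}  MS/MS(0 eV)={ref_int:.1f}  ISF={isf_int:.1f}")
```

In this invented case every MS/MS (0 eV) peak has an ISF counterpart, and the matched fragments carry a higher relative intensity in the ISF spectrum, which is the pattern that would be expected if ISF is the slightly more energetic process.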
Overall, these comparative examples between METLIN MS/MS (0 eV) data and ISF provide another level of evidence that the peaks observed in LC/MS experiments are predominantly associated with ISF. Figure 4 illustrates the conceptual reasoning behind this conclusion, where MS/MS data are generated on all the unfiltered observable LC/MS peaks. Given the prevalence of ISF, most of the MS/MS data represent in-source fragment ions rather than real molecules. This would explain why so many peaks are not identifiable in current tandem mass spectrometry databases.
Winnie Uritboonthai: Data curation, formal analysis. Linh Hoang: Data acquisition. Aries Aisporna: Software. Martin Giera: Writing–review & editing. Gary Siuzdak: Experimental design, formal analysis, writing.
The authors declare no conflicts of interest.