[A review and research prospects on the application of the XCMS mass-spectrometry data-processing software in the environmental science field].

Cheng Yang, Ao Zhang, Zhan-Qi Gao, Guan-Yong Su
Se pu = Chinese journal of chromatography, vol. 43, no. 6, pp. 585-593. Journal Article. Published 2025-06-01. DOI: 10.3724/SP.J.1123.2025.01019 (https://doi.org/10.3724/SP.J.1123.2025.01019). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12093214/pdf/

Abstract

Biological and environmental samples are complex and contain a highly diverse range of compounds. Analyzing these samples by chromatography coupled with high-resolution mass spectrometry generates a substantial volume of data composed of mass-to-charge-ratio (m/z), retention-time (RT), and peak-intensity information, which takes considerable time and effort to process. Consequently, software is needed to process mass-spectrometry data for identification and analysis. Among the many mass-spectrometry data-processing options, XCMS (various forms (X) of chromatography mass spectrometry) is highly efficient, precise, and freely accessible, and it is broadly used in the environmental science field. This study explores the use of XCMS in environmental science applications by comprehensively reviewing its workflow, underlying principles, and parameter-optimization measures. The workflow mainly comprises importing, processing, and exporting data. Importing data requires a format-conversion tool, such as MSConvert, which converts data generated by various instruments into a format that XCMS accepts, while data processing comprises peak detection, alignment, and filling. The various XCMS functions are mainly realized through built-in algorithms, of which Matched Filter, CentWave, Obiwarp, and Peak Density are the most commonly used. The first two implement peak detection, while the latter two implement peak alignment. During peak detection, XCMS identifies compound peaks in the mass-spectrometry data: it first filters noise and corrects the baseline, and an algorithm then detects peaks based on their shapes and intensities. XCMS can also de-emphasize and de-distort to filter out interfering information in each peak signal.
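As a rough illustration of the matched-filter idea mentioned above, the following Python sketch filters a single intensity trace with a second-derivative Gaussian kernel and reports local maxima above a threshold. It is a simplified teaching example, not the XCMS implementation (XCMS itself is an R package); the function names, the default `sigma`, the `threshold`, and the synthetic trace are all assumptions made for this example.

```python
import math


def second_derivative_gaussian(sigma, half_width):
    """Negated second derivative of a Gaussian, sampled at integer offsets.

    The kernel sums to roughly zero, so a flat baseline is suppressed,
    while a peak of matching width produces a strong positive response.
    """
    kernel = []
    for t in range(-half_width, half_width + 1):
        g = math.exp(-(t * t) / (2.0 * sigma * sigma))
        kernel.append((1.0 - (t * t) / (sigma * sigma)) * g)
    return kernel


def filter_trace(intensities, kernel):
    """Apply the kernel at every index, repeating edge values at the boundaries."""
    half = len(kernel) // 2
    n = len(intensities)
    filtered = []
    for i in range(n):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = min(max(i + k - half, 0), n - 1)  # clamp index to the trace
            acc += weight * intensities[j]
        filtered.append(acc)
    return filtered


def detect_peaks(intensities, sigma=2.0, threshold=1.0):
    """Return indices where the filtered trace has a local maximum above threshold."""
    kernel = second_derivative_gaussian(sigma, half_width=int(4 * sigma))
    filtered = filter_trace(intensities, kernel)
    peaks = []
    for i in range(1, len(filtered) - 1):
        is_local_max = filtered[i] > filtered[i - 1] and filtered[i] >= filtered[i + 1]
        if is_local_max and filtered[i] > threshold:
            peaks.append(i)
    return peaks


def synthetic_trace(n=100, baseline=5.0):
    """Hypothetical test data: flat baseline plus Gaussian peaks at indices 30 and 70."""
    return [
        baseline
        + 100.0 * math.exp(-((i - 30) ** 2) / 8.0)
        + 60.0 * math.exp(-((i - 70) ** 2) / 8.0)
        for i in range(n)
    ]
```

Calling `detect_peaks(synthetic_trace(), sigma=2.0, threshold=10.0)` runs the filter on the synthetic data. The kernel width and threshold play the role of tunable parameters here, which is one reason parameter optimization matters in practice.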
The CentWave algorithm is particularly effective for high-resolution mass-spectrometry data, where it improves detection accuracy and recall. Peak detection is followed by alignment. Here, XCMS uses kernel density estimation to match peaks between samples by estimating the retention-time distribution of matched peaks, which corrects for nonlinear deviations in retention time. This step is critical for accurately comparing samples. The peak-filling step addresses peaks that are missing from the data: XCMS uses information from other samples to fill these gaps, which enhances the integrity of the dataset and improves analysis accuracy. In terms of applications, XCMS has contributed to significant progress in the non-targeted screening of environmental pollutants, the identification of metabolic transformations of exogenous pollutants, and the exploration of the endogenous metabolism of biomolecules. For example, during the non-targeted screening of environmental pollutants, XCMS efficiently extracts mass-spectrometry information from complex samples, thereby providing a reliable data basis for subsequent identification. Although the use of XCMS in the environmental science field has yielded results, some limitations remain, including high memory consumption, software crashes when dealing with large-scale data, and the misclassification of noise as valid signals during feature detection, which leads to a large number of false positives, errors, and missed detections when processing data for compounds with complex chemical compositions and structural types. In addition, the degree of user interaction and automation requires further improvement. Nevertheless, XCMS offers significant developmental potential in the environmental science field.
Continued algorithmic optimization and database expansion, through improvements in algorithmic robustness, data compatibility, and user experience, are expected to see XCMS develop broadly and provide more powerful support for the environmental science field in the future.
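The kernel-density matching of peaks across samples described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the XCMS code: it assumes the peaks have already been restricted to one narrow m/z window, and the function names, the `bandwidth` and `step` defaults, and the input layout `(sample_id, retention_time)` are hypothetical.

```python
import math
import statistics


def kernel_density(points, grid, bandwidth):
    """Unnormalised Gaussian kernel density of `points`, evaluated on `grid`."""
    return [
        sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)
        for x in grid
    ]


def group_peaks_by_density(peaks, bandwidth=5.0, step=0.5):
    """Group (sample_id, rt) peaks around local maxima of the retention-time density."""
    rts = [rt for _, rt in peaks]
    lo = min(rts) - 3 * bandwidth
    hi = max(rts) + 3 * bandwidth
    n = int((hi - lo) / step) + 1
    grid = [lo + i * step for i in range(n)]
    density = kernel_density(rts, grid, bandwidth)

    # Each local maximum of the density is a candidate group centre.
    centres = [
        grid[i]
        for i in range(1, n - 1)
        if density[i] > density[i - 1] and density[i] >= density[i + 1]
    ]

    # Assign every peak to its nearest centre.
    groups = {c: [] for c in centres}
    for peak in peaks:
        nearest = min(centres, key=lambda c: abs(c - peak[1]))
        groups[nearest].append(peak)
    return [groups[c] for c in centres if groups[c]]


def rt_deviations(groups):
    """For each sample, list (group median rt, offset of that sample's peak)."""
    deviations = {}
    for group in groups:
        centre = statistics.median(rt for _, rt in group)
        for sample_id, rt in group:
            deviations.setdefault(sample_id, []).append((centre, rt - centre))
    return deviations
```

One plausible use of the per-sample offsets returned by `rt_deviations` is to fit a smooth curve of offset against retention time for each sample and subtract it, which is the general idea behind correcting nonlinear retention-time drift. A matched group that lacks a peak from some sample is also the kind of gap that a peak-filling step would then try to fill.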
