MSnLib: efficient generation of open multi-stage fragmentation mass spectral libraries
Corinna Brungs, Robin Schmid, Steffen Heuckeroth, Aninda Mazumdar, Matúš Drexler, Pavel Šácha, Pieter C. Dorrestein, Daniel Petras, Louis-Felix Nothias, Václav Veverka, Radim Nencka, Zdeněk Kameník, Tomáš Pluskal
Nature Methods 22(10), 2028-2031. Published 2025-09-15. DOI: 10.1038/s41592-025-02813-0
https://www.nature.com/articles/s41592-025-02813-0
Abstract
Untargeted high-resolution mass spectrometry is a key tool in clinical metabolomics, natural product discovery and exposomics, with compound identification remaining the major bottleneck. Currently, the standard workflow applies spectral library matching against tandem mass spectrometry (MS2) fragmentation data. Multi-stage fragmentation (MSn) yields more profound insights into substructures, enabling validation of fragmentation pathways; however, the community lacks open MSn reference data of diverse natural products and other chemicals. Here we describe MSnLib, a machine learning-ready open resource of >2 million spectra in MSn trees of 30,008 unique small molecules, built with a high-throughput data acquisition and processing pipeline in the open-source software mzmine. MSnLib is a large-scale, open MSn spectral library featuring >2.3 million MSn and >357,000 MS2 spectra for 30,008 unique small molecules.
About the journal:
Nature Methods is a monthly journal that focuses on publishing innovative methods and substantial enhancements to fundamental life sciences research techniques. Geared towards a diverse, interdisciplinary readership of researchers in academia and industry engaged in laboratory work, the journal offers new tools for research and emphasizes the immediate practical significance of the featured work. It publishes primary research papers and reviews recent technical and methodological advancements, with a particular interest in primary methods papers relevant to the biological and biomedical sciences. This includes methods rooted in chemistry with practical applications for studying biological problems.