Utilization of a natural language processing-based approach to determine the composition of artifact residues.

IF 2.9; Zone 3 (Biology); JCR Q2, Biochemical Research Methods

BMC Bioinformatics; Pub Date: 2024-09-27; DOI: 10.1186/s12859-024-05888-2

Tung Tho Nguyen, Korey J Brownstein

Citation: BMC Bioinformatics, volume 25 (1), article 311 (2024). Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11437931/pdf/

Citations: 0

Abstract

Background: Determining the composition of artifact residues is a central problem in ancient residue metabolomics. The standard method compares the mass spectral features that an experimental artifact and an ancient artifact have in common. While this method is simple and straightforward, we sought to increase the accuracy of predicting which plant species had been used in which artifacts.
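
The shared-feature comparison described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes each sample has already been reduced to a set of features keyed by a rounded (m/z, retention time) pair, and the names `shared_feature_count`, `predict_species`, and the toy data are all hypothetical.

```python
from typing import Dict, Set, Tuple

# Assumption: a feature is identified by a rounded (m/z, retention time) pair.
Feature = Tuple[float, float]


def shared_feature_count(reference: Set[Feature], artifact: Set[Feature]) -> int:
    """Count the features present in both the reference and the artifact."""
    return len(reference & artifact)


def predict_species(references: Dict[str, Set[Feature]], artifact: Set[Feature]) -> str:
    """Return the reference species sharing the most features with the artifact."""
    return max(references, key=lambda s: shared_feature_count(references[s], artifact))


# Toy data, invented for illustration only.
references = {
    "species_A": {(195.09, 3.2), (163.12, 5.1), (303.05, 7.8)},
    "species_B": {(163.12, 5.1), (287.06, 6.4)},
}
artifact = {(195.09, 3.2), (303.05, 7.8), (451.20, 9.9)}

best = predict_species(references, artifact)
```

In this toy case `species_A` shares two features with the artifact and `species_B` shares none, so `species_A` is selected. Real data would also need a tolerance-based matching step for m/z and retention time, which this sketch omits.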

Results: Here, we introduce an algorithm (new method) based on ideas from the field of natural language processing (NLP) to solve this problem. We tested our strategy on a set of modern clay pipes. To limit biases, we were not provided information on which plant species had been smoked in which clay pipes. The results indicate that our new method performed 12.5% better than the standard method in predicting the plant species smoked in each artifact.
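
The abstract does not say which NLP ideas the new method draws on. As one purely illustrative possibility, and explicitly not the authors' algorithm, the sketch below treats each sample as a "document" whose "words" are feature identifiers, weights them with TF-IDF, and ranks reference species by cosine similarity. The function names, the smoothed IDF formula, and the toy data are assumptions made for this example.

```python
import math
from typing import Dict, List


def tf_idf(docs: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """Compute TF-IDF weights, treating each sample as a document of feature IDs."""
    n = len(docs)
    df: Dict[str, int] = {}
    for feats in docs.values():
        for f in set(feats):
            df[f] = df.get(f, 0) + 1
    weights: Dict[str, Dict[str, float]] = {}
    for name, feats in docs.items():
        vec: Dict[str, float] = {}
        for f in set(feats):
            tf = feats.count(f) / len(feats)
            idf = math.log((1 + n) / (1 + df[f])) + 1.0  # smoothed IDF (an assumption)
            vec[f] = tf * idf
        weights[name] = vec
    return weights


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


# Toy data, invented for illustration only.
docs = {
    "species_A": ["f1", "f2", "f3"],
    "species_B": ["f2", "f4"],
    "pipe_1": ["f1", "f2", "f5"],
}
w = tf_idf(docs)
sim_a = cosine(w["pipe_1"], w["species_A"])
sim_b = cosine(w["pipe_1"], w["species_B"])
```

The weighting down-weights features that occur in every sample (here `f2`), so a match on a rarer feature such as `f1` counts for more than it would under a plain shared-feature count.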

Conclusions: Utilizing an NLP-based approach, we developed a robust algorithm for characterizing the composition of artifact residues. This work also discusses other general applications in which our algorithm could be used in the field of metabolomics, such as datasets where there are a limited number of replicates.

Source journal

BMC Bioinformatics (Biology: Biochemical Research Methods)

CiteScore: 5.70
Self-citation rate: 3.30%
Articles published: 506
Review time: 4.3 months

Journal description: BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology. BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.