Identification of environmental microplastics using large language models: DeepSeek-R1-Distill-Llama-8B, GPT-4o, and GPT-4o-mini

Impact factor: 3.1 | CAS journal ranking: Zone 3 (Chemistry) | JCR: Q2, CHEMISTRY, ANALYTICAL
Zijiang Yang, Hisayuki Arakawa
DOI: 10.1016/j.vibspec.2025.103842
Journal: Vibrational Spectroscopy, Volume 140, Article 103842 (journal article)
Published: 2025-07-23
URL: https://www.sciencedirect.com/science/article/pii/S0924203125000761
Citations: 0

Abstract

Microplastic pollution in the environment poses increasing risks to both ecological and human health. Identifying microplastics in environmental samples is important for monitoring and mitigation. However, current methods rely on manual interpretation of infrared (IR) spectra, which is time-consuming and labor-intensive. This study therefore investigates the potential of large language models (LLMs) to identify microplastics from the IR spectra of environmental samples. Three models, DeepSeek-R1-Distill-Llama-8B, GPT-4o-2024-08-06 (GPT-4o), and GPT-4o-mini-2024-07-18 (GPT-4o-mini), were evaluated within a structured workflow that integrates spectral processing and model implementation. A performance evaluation framework was developed to measure identification accuracy. The results indicate that DeepSeek-R1-Distill-Llama-8B outperformed the other two models, achieving an accuracy exceeding 0.93 across all tested polymer types, which makes it the preferred choice. GPT-4o proved a strong alternative, with accuracy above 0.86, particularly when local execution is impractical. GPT-4o-mini underperformed and is not recommended. Despite these promising outcomes, challenges remain, including the need to optimize spectral processing parameters and refine prompt design. As the first study to apply LLMs to microplastic identification, this work offers a foundational reference for LLM-driven spectral analysis in environmental monitoring.
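The abstract names the workflow stages (spectral processing, model implementation, accuracy evaluation) but not their details, so the following is only a hypothetical Python sketch of how such stages could fit together, not the authors' implementation. A simple local-maximum peak picker turns an IR spectrum into a short text description suitable for an LLM prompt, and a per-polymer accuracy function scores the labels the model returns. The function names (`pick_peaks`, `build_prompt`, `accuracy_by_polymer`), the `min_height` and `top_n` parameters, the prompt wording, and the case-insensitive exact-match scoring are all assumptions; no model API call is shown, and the example spectrum is illustrative toy data, not measured data.

```python
# Hypothetical sketch: the paper's actual processing parameters, prompt wording,
# and scoring rules are not stated in the abstract.
from collections import defaultdict


def pick_peaks(wavenumbers, absorbance, min_height=0.1):
    """Return (wavenumber, absorbance) pairs at local maxima at or above min_height.

    Three-point local-maximum test; a real pipeline would normally smooth and
    baseline-correct the spectrum first (e.g. with scipy.signal.find_peaks).
    """
    peaks = []
    for i in range(1, len(absorbance) - 1):
        a = absorbance[i]
        # Strict rise on the left, non-strict fall on the right, so a flat-topped
        # peak is reported once, at its first point.
        if a >= min_height and a > absorbance[i - 1] and a >= absorbance[i + 1]:
            peaks.append((wavenumbers[i], a))
    return peaks


def build_prompt(peaks, top_n=10):
    """Format the strongest peaks as plain text an LLM can read (assumed wording)."""
    strongest = sorted(peaks, key=lambda p: p[1], reverse=True)[:top_n]
    # List peaks from high to low wavenumber, as IR spectra are conventionally read.
    strongest.sort(key=lambda p: p[0], reverse=True)
    lines = [f"{wn:.0f} cm-1 (absorbance {a:.2f})" for wn, a in strongest]
    header = (
        "The following IR absorption peaks were measured on an environmental "
        "particle. Which polymer is it most likely to be? Answer with one name.\n"
    )
    return header + "\n".join(lines)


def accuracy_by_polymer(true_labels, predicted_labels):
    """Fraction of correct identifications, grouped by the true polymer type.

    Assumes the model's answer has already been reduced to a single polymer name.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(true_labels, predicted_labels):
        total[truth] += 1
        if pred.strip().lower() == truth.strip().lower():
            correct[truth] += 1
    return {polymer: correct[polymer] / total[polymer] for polymer in total}


if __name__ == "__main__":
    # Toy values loosely shaped like polyethylene CH2 bands (illustrative only).
    wn = [3000, 2950, 2915, 2880, 2848, 2800, 1470, 1462, 1400, 720, 700]
    ab = [0.05, 0.20, 0.90, 0.30, 0.70, 0.05, 0.40, 0.35, 0.05, 0.25, 0.05]
    peaks = pick_peaks(wn, ab)
    print(peaks)  # [(2915, 0.9), (2848, 0.7), (1470, 0.4), (720, 0.25)]
    print(build_prompt(peaks))
    print(accuracy_by_polymer(["PE", "PE", "PP"], ["PE", "pp", "PP"]))  # {'PE': 0.5, 'PP': 1.0}
```

Sending a short peak list instead of the full absorbance array keeps the prompt compact and readable; whether the study encoded spectra this way, and how its evaluation framework defines accuracy, would need to be checked against the full paper.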
Source journal: Vibrational Spectroscopy (Chemistry: Analytical Chemistry)
CiteScore: 4.70
Self-citation rate: 4.00%
Articles published: 103
Review time: 52 days
About the journal: Vibrational Spectroscopy publishes original research focused on vibrational spectroscopy, covering infrared, near-infrared, and Raman spectroscopies, with papers on developments in applications, theory, techniques, and instrumentation. Topics covered by the journal include sampling techniques, vibrational spectroscopy coupled with separation techniques, instrumentation (Fourier transform, conventional, and laser-based), data manipulation, and spectra-structure correlation and group frequencies. Application areas include analytical chemistry, bio-organic and bio-inorganic chemistry, organic chemistry, inorganic chemistry, catalysis, environmental science, industrial chemistry, materials science, physical chemistry, polymer science, process control, and specialized problem solving.