Identification of environmental microplastics using large language models: DeepSeek-R1-Distill-Llama-8B, GPT-4o, and GPT-4o-mini
Authors: Zijiang Yang, Hisayuki Arakawa
Journal: Vibrational Spectroscopy, Volume 140, Article 103842 (published 2025-07-23; impact factor 3.1; JCR Q2, Chemistry, Analytical)
DOI: 10.1016/j.vibspec.2025.103842
URL: https://www.sciencedirect.com/science/article/pii/S0924203125000761
Microplastic pollution in the environment poses increasing risks to both ecological and human health. Identifying microplastics in environmental samples is important for monitoring and mitigation. However, current methods rely on manual interpretation of infrared (IR) spectra, which is time-consuming and labor-intensive. This study therefore investigates the potential of large language models (LLMs) for identifying microplastics using IR spectra from environmental samples. Three models, DeepSeek-R1-Distill-Llama-8B, GPT-4o-2024-08-06 (GPT-4o), and GPT-4o-mini-2024-07-18 (GPT-4o-mini), were evaluated within a structured workflow that integrates spectral processing and model implementation. A performance evaluation framework was developed to measure identification accuracy. Results indicate that DeepSeek-R1-Distill-Llama-8B outperformed the other two models, achieving an accuracy exceeding 0.93 across all tested polymer types, making it the preferred choice. GPT-4o proved a strong alternative, particularly when local execution is impractical, with accuracy above 0.86. GPT-4o-mini underperformed and is not recommended. Despite these promising outcomes, challenges persist, including the need to optimize spectral processing parameters and refine prompt design. As the first study to apply LLMs to microplastic identification, this work offers a foundational reference for leveraging LLM-driven spectral analysis in environmental monitoring.
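The abstract reports accuracy per polymer type but does not describe how the evaluation framework computes it. The following is a minimal sketch of one plausible reading, in which accuracy for a polymer is the fraction of spectra of that polymer that a model labels correctly. The function name `per_polymer_accuracy`, the case-insensitive matching rule, and the sample labels are all assumptions made for illustration; none of them come from the study.

```python
from collections import defaultdict


def per_polymer_accuracy(true_labels, predicted_labels):
    """Return {polymer: fraction of that polymer's spectra identified correctly}.

    Hypothetical helper for illustration; not the authors' framework.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")

    total = defaultdict(int)    # spectra seen per true polymer type
    correct = defaultdict(int)  # correct identifications per true polymer type

    for truth, guess in zip(true_labels, predicted_labels):
        total[truth] += 1
        # Assumed matching rule: ignore case and surrounding whitespace,
        # since free-text model answers are rarely formatted uniformly.
        if truth.strip().lower() == guess.strip().lower():
            correct[truth] += 1

    return {polymer: correct[polymer] / total[polymer] for polymer in total}


# Made-up labels for illustration only (not data from the study).
truth = ["PE", "PE", "PP", "PS", "PP"]
preds = ["PE", "PP", "PP", "ps ", "PP"]
print(per_polymer_accuracy(truth, preds))  # {'PE': 0.5, 'PP': 1.0, 'PS': 1.0}
```

Under this reading, a claim such as "accuracy exceeding 0.93 across all tested polymer types" would mean that every value in the returned dictionary is above 0.93.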
About the journal:
Vibrational Spectroscopy provides a vehicle for the publication of original research that focuses on vibrational spectroscopy. The journal covers infrared, near-infrared and Raman spectroscopies and publishes papers dealing with developments in applications, theory, techniques and instrumentation.
The topics covered by the journal include:
Sampling techniques,
Vibrational spectroscopy coupled with separation techniques,
Instrumentation (Fourier transform, conventional and laser based),
Data manipulation,
Spectra-structure correlation and group frequencies.
The application areas covered include:
Analytical chemistry,
Bio-organic and bio-inorganic chemistry,
Organic chemistry,
Inorganic chemistry,
Catalysis,
Environmental science,
Industrial chemistry,
Materials science,
Physical chemistry,
Polymer science,
Process control,
Specialized problem solving.