Extracting chemical food safety hazards from the scientific literature automatically using large language models

Applied Food Research | Pub Date: 2024-12-27 | DOI: 10.1016/j.afres.2024.100679

Neris Özen, Wenjuan Mu, Esther D. van Asselt, Leonieke M. van den Bulk


Citations: 0

Abstract

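The number of scientific articles published on food safety has increased steadily over the last few decades. It has therefore become infeasible for food safety experts to read all the literature relevant to food safety and the occurrence of hazards in the food chain. Nevertheless, these experts need to stay aware of the newest findings and to access this information easily and concisely. This study presents an approach that automates the extraction of chemical hazards from the scientific literature using large language models. The large language model was used out-of-the-box and applied to scientific abstracts; neither additional training of the model nor a large computing cluster was required. Three prompting styles were tested to determine which suited the task best. The prompts were optimized on two validation foods (leafy greens and shellfish), and the final performance of the best prompt was evaluated on three test foods (dairy, maize and salmon). The specific wording of the prompt had a considerable effect on the results. A prompt that broke the task down into smaller steps performed best overall. This prompt reached an average accuracy of 93% and retrieved many chemical contaminants that are already included in food monitoring programs, which supports the relevance of the retrieved hazards to the food safety domain. The results show how valuable large language models can be for automatic information extraction from the scientific literature.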
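The abstract does not reproduce the prompts or the scoring procedure, so the following is only a minimal sketch of the general idea: a prompt that breaks the extraction task into smaller steps, applied to one abstract at a time, followed by an average of per-food accuracies. The step wording, the function names, the semicolon-separated reply format, and the definition of accuracy as the fraction of extracted items judged correct are all assumptions made for illustration, not the authors' method. The `llm` argument stands in for whatever model call is available.

```python
from statistics import mean
from typing import Callable


def build_stepwise_prompt(food: str, abstract: str) -> str:
    """Compose a prompt that splits the task into numbered steps.

    The step wording is hypothetical; the paper's actual prompts are
    not given in the abstract.
    """
    steps = [
        f"1. Decide whether the abstract below concerns {food}.",
        "2. List every chemical substance the abstract mentions.",
        f"3. Keep only substances reported as hazards in {food}.",
        "4. Return the kept substances separated by semicolons, or NONE.",
    ]
    return "\n".join(steps) + "\n\nAbstract:\n" + abstract


def extract_hazards(food: str, abstract: str, llm: Callable[[str], str]) -> list[str]:
    """Send the prompt to a caller-supplied model and parse its reply."""
    reply = llm(build_stepwise_prompt(food, abstract)).strip()
    if not reply or reply.upper() == "NONE":
        return []
    # Normalise each item so that duplicates are easy to merge later.
    return [item.strip().lower() for item in reply.split(";") if item.strip()]


def accuracy(judgements: list[bool]) -> float:
    """Fraction of extracted items judged correct (assumed definition)."""
    return sum(judgements) / len(judgements)


def average_accuracy(per_food: dict[str, list[bool]]) -> float:
    """Unweighted mean of the per-food accuracies."""
    return mean(accuracy(j) for j in per_food.values())


if __name__ == "__main__":
    # Stand-in for a real model call; it always gives the same reply.
    def fake_llm(prompt: str) -> str:
        return "Cadmium; Lead"

    print(extract_hazards("maize", "Placeholder abstract text.", fake_llm))
    # Made-up judgements, only to show the averaging step.
    print(average_accuracy({"food_a": [True, True], "food_b": [True, False]}))
```

Passing the model in as a plain callable keeps the sketch independent of any particular provider, which matches the abstract's point that the model was used out-of-the-box without extra training.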




Source journal

Applied Food Research

CiteScore

4.50

Self-citation rate

0.00%

Articles published