{"title":"Exploring New Frontiers in Agricultural NLP: Investigating the Potential of Large Language Models for Food Applications","authors":"Saed Rezayi;Zhengliang Liu;Zihao Wu;Chandra Dhakal;Bao Ge;Haixing Dai;Gengchen Mai;Ninghao Liu;Chen Zhen;Tianming Liu;Sheng Li","doi":"10.1109/TBDATA.2024.3442542","DOIUrl":null,"url":null,"abstract":"This paper explores new frontiers in agricultural natural language processing (NLP) by investigating the effectiveness of food-related text corpora for pretraining transformer-based language models. Specifically, we focus on semantic matching, establishing mappings between food descriptions and nutrition data through fine-tuning AgriBERT with the FoodOn ontology. Our work introduces an expanded comparison with state-of-the-art language models such as GPT-4, Mistral-large, Claude 3 Sonnet, and Gemini 1.0 Ultra. This exploratory investigation, rather than a direct comparison, aims to understand how AgriBERT, a domain-specific, fine-tuned, open-source model, complements the broad knowledge and generative abilities of these advanced LLMs in addressing the unique challenges of the agricultural sector. We also experiment with other applications, such as cuisine prediction from ingredients, expanding our research to include various NLP tasks beyond semantic matching. Overall, this paper underscores the potential of integrating domain-specific models like AgriBERT with advanced LLMs to enhance the performance and applicability of agricultural NLP applications.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 3","pages":"1235-1246"},"PeriodicalIF":7.5000,"publicationDate":"2024-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Big Data","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10637955/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
This paper explores new frontiers in agricultural natural language processing (NLP) by investigating the effectiveness of food-related text corpora for pretraining transformer-based language models. Specifically, we focus on semantic matching, establishing mappings between food descriptions and nutrition data through fine-tuning AgriBERT with the FoodOn ontology. Our work introduces an expanded comparison with state-of-the-art language models such as GPT-4, Mistral-large, Claude 3 Sonnet, and Gemini 1.0 Ultra. This exploratory investigation, rather than a direct comparison, aims to understand how AgriBERT, a domain-specific, fine-tuned, open-source model, complements the broad knowledge and generative abilities of these advanced LLMs in addressing the unique challenges of the agricultural sector. We also experiment with other applications, such as cuisine prediction from ingredients, expanding our research to include various NLP tasks beyond semantic matching. Overall, this paper underscores the potential of integrating domain-specific models like AgriBERT with advanced LLMs to enhance the performance and applicability of agricultural NLP applications.
期刊介绍:
The IEEE Transactions on Big Data publishes peer-reviewed articles focusing on big data. These articles present innovative research ideas and application results across disciplines, including novel theories, algorithms, and applications. Research areas cover a wide range, such as big data analytics, visualization, curation, management, semantics, infrastructure, standards, performance analysis, intelligence extraction, scientific discovery, security, privacy, and legal issues specific to big data. The journal also prioritizes applications of big data in fields generating massive datasets.