Dairy GPT: Empowering dairy farmers to interact with numerical databases through natural language conversations
Danillo Gontijo, Douglas Rolins Santana, Gustavo de Assis Costa, Victor E. Cabrera, Eduardo Noronha de Andrade Freitas
Smart Agricultural Technology, Volume 12, Article 101097. Published 2025-06-16.
DOI: 10.1016/j.atech.2025.101097
https://www.sciencedirect.com/science/article/pii/S2772375525003302
Citations: 0
Abstract
Large language models (LLMs) such as GPT-4 have revolutionized artificial intelligence by enabling intuitive text and voice interactions, simplifying complex tasks, and democratizing access to AI-driven tools. However, one of their primary limitations is their difficulty in handling strictly numerical data. This limitation has motivated approaches such as Retrieval-Augmented Generation (RAG) and Natural Language to SQL (NL2SQL), which extend the applicability of LLMs to data-intensive domains. This study investigated the feasibility of using LLMs to let dairy farmers interact with purely numerical databases in natural language. To support the study, we constructed a dataset of 25,925 daily milk production records from 85 cows, derived from real data collected at the University of Wisconsin-Madison Agricultural Research Station. Three analysis pipelines were proposed to assess how effectively LLMs handle numerical databases: Prompt Engineering (zero-shot), RAG, and NL2SQL with Decomposition. They were evaluated on a set of 10 questions, 5 quantitative and 5 qualitative. NL2SQL with Decomposition achieved 80% accuracy on the quantitative questions, and the zero-shot pipeline achieved 100% on the qualitative questions. These results demonstrate the potential of LLMs to enhance data utilization in dairy farming. Future work will focus on refining the proposed methods and extending them to other livestock applications.
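
As a rough illustration of the NL2SQL-with-decomposition idea named in the abstract, the sketch below answers one quantitative question against a small in-memory SQLite table. It is not the authors' pipeline. The table name `milk_records`, its columns, the four toy rows, and the helper `answer_top_cow` are all hypothetical. No LLM is called: the two SQL strings are hand-written stand-ins for what a model might generate after splitting the question into sub-steps.

```python
import sqlite3

# Hypothetical schema and toy values for illustration only;
# this is NOT the dataset described in the abstract.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE milk_records (cow_id INTEGER, record_date TEXT, milk_kg REAL)"
)
rows = [
    (1, "2024-01-01", 30.0),
    (1, "2024-01-02", 32.0),
    (2, "2024-01-01", 25.0),
    (2, "2024-01-02", 27.0),
]
conn.executemany("INSERT INTO milk_records VALUES (?, ?, ?)", rows)

# Question: "Which cow had the highest average daily milk production?"
# Assumed decomposition into two sub-steps:
#   1. compute the average daily production per cow
#   2. select the cow with the largest average
# Both SQL strings are hand-written stand-ins for model output.
AVG_PER_COW = (
    "SELECT cow_id, AVG(milk_kg) AS avg_kg "
    "FROM milk_records GROUP BY cow_id"
)
TOP_COW = (
    f"SELECT cow_id, avg_kg FROM ({AVG_PER_COW}) "
    "ORDER BY avg_kg DESC LIMIT 1"
)


def answer_top_cow(connection):
    """Run the composed query and return (cow_id, average_kg)."""
    cow_id, avg_kg = connection.execute(TOP_COW).fetchone()
    return cow_id, avg_kg


top_cow = answer_top_cow(conn)
print(top_cow)
```

The point of the sketch is that the arithmetic is done by the database engine, so the language model only has to produce a correct query and never has to compute over the numbers itself.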