A knowledge graph dataset for broiler farming automatically constructed based on a large language model

Impact factor: 1.4; JCR quartile: Q3; Category: Multidisciplinary Sciences

Data in Brief. Pub Date: 2025-09-09. DOI: 10.1016/j.dib.2025.112018

Nan Ma, Fantao Kong, Jifang Liu, Chenyang Zhang, Chenxv Zhao, Shanshan Cao, Wei Sun

Data in Brief, Volume 62, Article 112018. https://www.sciencedirect.com/science/article/pii/S2352340925007401

Citation count: 0

Abstract

With the rapid advancement of artificial intelligence, intelligent farming has become a key trend in modern agriculture. In particular, the application of intelligent systems in broiler farming is essential for enhancing production efficiency and optimizing management practices. Broiler farming is a complex process involving multiple interrelated components. However, existing knowledge graphs focus primarily on diseases and their prevention, so they fail to capture the intricate interdependencies within the farming process. This limits the effectiveness of knowledge-based decision support. To develop a high-quality broiler farming knowledge system, this study uses large language model (LLM) technology to integrate a Chinese corpus and construct a comprehensive knowledge graph dataset covering four core dimensions: broiler breeds, farming environment, feeding management, and disease prevention.

The construction of the dataset involved three key stages. First, text scanning was used to extract information from farming-related literature, while web crawlers collected data from authoritative online sources. The data were then cleaned and manually validated to ensure accuracy and consistency. Second, the DeepKE knowledge extraction framework was used to automatically extract triples related to broiler farming from the text. The extracted triples were then used as prompts to guide large-scale pre-trained language models (LLMs) in completing and optimizing the knowledge, ultimately yielding a relatively complete knowledge graph of broiler farming. Finally, the structured knowledge was stored in a Neo4j graph database to support efficient querying and reasoning.

The dataset not only provides researchers and farms with multidimensional knowledge of the broiler farming domain, but also supports visual management and analysis, enables data-driven inference through large models, and offers new approaches to optimize farming strategies and enhance production efficiency.


Source journal

Data in Brief (Multidisciplinary Sciences)

CiteScore: 3.10

Self-citation rate: 0.00%

Articles published: 996

Review time: 70 days

Journal description: Data in Brief provides a way for researchers to easily share and reuse each other's datasets by publishing data articles that:
- Thoroughly describe your data, facilitating reproducibility.
- Make your data, which is often buried in supplementary material, easier to find.
- Increase traffic towards associated research articles and data, leading to more citations.
- Open up doors for new collaborations.

Because you never know what data will be useful to someone else, Data in Brief welcomes submissions that describe data from all research areas.