Nan Ma, Fantao Kong, Jifang Liu, Chenyang Zhang, Chenxv Zhao, Shanshan Cao, Wei Sun
Data in Brief, Volume 62, Article 112018. Published 2025-09-09. DOI: 10.1016/j.dib.2025.112018. JCR: Q3 (Multidisciplinary Sciences). Impact factor: 1.4. https://www.sciencedirect.com/science/article/pii/S2352340925007401
A knowledge graph dataset for broiler farming automatically constructed based on a large language model
With the rapid advancement of artificial intelligence, intelligent farming has become a key trend in modern agriculture. In particular, the application of intelligent systems in broiler farming is essential for enhancing production efficiency and optimizing management practices. Broiler farming is a complex process involving multiple interrelated components. However, existing knowledge graphs primarily focus on disease and prevention, making it difficult to capture the intricate interdependencies within the farming process. This limits the effectiveness of knowledge-based support in decision-making. To develop a high-quality broiler farming knowledge system, this study adopts large language modeling technology to integrate a Chinese corpus and construct a comprehensive knowledge graph dataset covering four core dimensions: broiler breeds, farming environment, feeding management, and disease prevention.
The construction of the dataset involved three key stages. First, text scanning was used to extract information from farming-related literature, while web crawlers collected data from authoritative online sources. The data were then cleaned and manually validated to ensure accuracy and consistency. Second, the DeepKE knowledge extraction framework was used to automatically extract triples related to broiler farming from the text. These triples were then used as prompts to guide large-scale pre-trained language models (LLMs) to complete and refine the knowledge, ultimately yielding a relatively complete knowledge graph of broiler farming. Finally, the structured knowledge was stored in a Neo4j graph database to support efficient querying and reasoning.
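The second and third stages can be illustrated with a minimal Python sketch. It is not the authors' code: the `Triple` class, the prompt wording, the generic `Entity` label, and the example triple are all hypothetical, and no DeepKE or LLM call is made. It only shows how extracted triples might be formatted into a completion prompt and turned into parameterized Cypher `MERGE` statements for Neo4j.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One (head, relation, tail) fact; a hypothetical container, not a DeepKE type."""
    head: str
    relation: str
    tail: str


def build_completion_prompt(triples, topic):
    """Format extracted triples as a prompt asking an LLM to correct and extend them.

    The wording is an assumption; the abstract does not give the actual prompt.
    """
    lines = [f"({t.head}, {t.relation}, {t.tail})" for t in triples]
    return (
        f"The following triples about {topic} were extracted automatically:\n"
        + "\n".join(lines)
        + "\nCorrect any errors and add missing triples in the same format."
    )


def triple_to_cypher(triple):
    """Return (query, params) that idempotently stores one triple in Neo4j.

    Node names are passed as query parameters. The relationship type is
    interpolated into the query text, so it is first reduced to
    word characters and underscores.
    """
    rel = re.sub(r"\W+", "_", triple.relation.strip()).strip("_").upper()
    if not rel or rel[0].isdigit():
        raise ValueError(f"cannot derive a relationship type from {triple.relation!r}")
    query = (
        "MERGE (h:Entity {name: $head}) "
        "MERGE (t:Entity {name: $tail}) "
        f"MERGE (h)-[:{rel}]->(t)"
    )
    return query, {"head": triple.head, "tail": triple.tail}


# Made-up example triple, for illustration only (not taken from the dataset).
example = Triple("ExampleBreed", "requires feed", "ExampleStarterFeed")
prompt = build_completion_prompt([example], "broiler farming")
query, params = triple_to_cypher(example)
```

With the official Neo4j Python driver, each pair could then be executed as `session.run(query, params)`. Using `MERGE` rather than `CREATE` keeps repeated imports from duplicating nodes or relationships.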
The dataset not only provides researchers and farms with multidimensional knowledge of the broiler farming domain, but also supports visual management and analysis, enables data-driven inference through large models, and offers new approaches to optimize farming strategies and enhance production efficiency.
About the journal:
Data in Brief provides a way for researchers to easily share and reuse each other's datasets by publishing data articles that:
- Thoroughly describe your data, facilitating reproducibility.
- Make your data, which is often buried in supplementary material, easier to find.
- Increase traffic towards associated research articles and data, leading to more citations.
- Open up doors for new collaborations.
Because you never know what data will be useful to someone else, Data in Brief welcomes submissions that describe data from all research areas.