Industry-scale prediction of video-derived pig body weight using efficient convolutional neural networks and vision transformers

IF 5.3, Zone 1 (Agricultural and Forestry Sciences), JCR Q1 (Agricultural Engineering)

Biosystems Engineering. Pub Date: 2025-08-05. DOI: 10.1016/j.biosystemseng.2025.104243

Ye Bi, Yijian Huang, Jianhua Xuan, Gota Morota

Volume 257, Article 104243. Publisher page: https://www.sciencedirect.com/science/article/pii/S1537511025001795

Citations: 0

Abstract

Accurate pig body weight measurement is critical for pig growth, health, and marketing. Although there is a growing trend towards the use of computer vision approaches for pig body weight prediction, their validation with large-scale data collected in commercial environments is still limited. Therefore, the main objective of this study was to predict pig body weight collected at multiple timepoints from a commercial environment using efficient convolutional neural networks and efficient vision transformers. Top-view videos were collected from over 600 pigs at six time points over three months. Scale-based body weight records were simultaneously recorded by a digital weighing system. An automated video conversion pipeline and fine-tuned YOLOv8 were applied to preprocess the raw depth videos. Two families of lightweight deep neural networks, MobileNet and MobileViT, were initialised with the pre-trained weights from ImageNet and customised to predict pig body weight directly from depth images. Two cross-validation strategies were used: single time point random subsampling and time series forecasting with a sparse design considering limited budget scenarios. In single time point random subsampling, the best prediction mean absolute percentage error for each time point was 4.71%, 3.80%, 3.08%, 5.60%, 3.42%, and 3.77%, respectively. On average, the MobileViT-S model produced the best prediction mean absolute percentage error. In time series forecasting, although a sparse design resulted in some performance loss compared to the full design, the use of ViT models mitigated this degradation. These results suggest that efficient deep learning-based supervised learning models are a promising approach for predicting pig body weight from industry-scale depth video data.
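To make the evaluation described in the abstract concrete, here is a minimal sketch of mean absolute percentage error (MAPE) scored under repeated random subsampling at a single time point. It is not the authors' pipeline: the function names, the 20% test fraction, the number of repeats, the weights, and the mean-weight baseline standing in for MobileNet or MobileViT are all illustrative assumptions.

```python
import random
from statistics import mean


def mape(actual, predicted):
    """Mean absolute percentage error, returned in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return 100.0 * mean(abs(a - p) / abs(a) for a, p in zip(actual, predicted))


def random_subsampling_splits(n_samples, n_repeats=5, test_fraction=0.2, seed=0):
    """Yield (train_idx, test_idx) pairs from repeated random splits.

    The repeat count and test fraction are assumptions, not values
    reported in the abstract.
    """
    rng = random.Random(seed)
    n_test = max(1, round(n_samples * test_fraction))
    for _ in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        yield sorted(idx[n_test:]), sorted(idx[:n_test])


# Hypothetical scale weights (kg) for one time point; not data from the study.
weights = [52.0, 55.5, 49.8, 60.2, 57.1, 53.3, 58.4, 51.0, 56.7, 54.9]

scores = []
for train_idx, test_idx in random_subsampling_splits(len(weights)):
    # Placeholder "model": predict the training-set mean for every test pig.
    baseline = mean(weights[i] for i in train_idx)
    actual = [weights[i] for i in test_idx]
    scores.append(mape(actual, [baseline] * len(actual)))

print(f"mean MAPE over {len(scores)} splits: {mean(scores):.2f}%")
```

In a real experiment the baseline would be replaced by a trained image regressor, and the per-split MAPE values would be averaged separately for each of the six time points.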

Source journal

Biosystems Engineering (Agricultural and Forestry Sciences: Agricultural Engineering)

CiteScore: 10.60
Self-citation rate: 7.80%
Articles published: 239
Review time: 53 days

Journal description: Biosystems Engineering publishes research in engineering and the physical sciences that represents advances in understanding or modelling of the performance of biological systems for sustainable developments in land use and the environment, agriculture and amenity, bioproduction processes and the food chain. The subject matter of the journal reflects the wide range and interdisciplinary nature of research in engineering for biological systems.