Authors: Ye Bi, Yijian Huang, Jianhua Xuan, Gota Morota
Journal: Biosystems Engineering, volume 257, Article 104243 (JCR Q1, Agricultural Engineering; impact factor 5.3)
Published: 2025-08-05
DOI: 10.1016/j.biosystemseng.2025.104243
URL: https://www.sciencedirect.com/science/article/pii/S1537511025001795
Industry-scale prediction of video-derived pig body weight using efficient convolutional neural networks and vision transformers
Accurate pig body weight measurement is critical for pig growth, health, and marketing. Although there is a growing trend towards the use of computer vision approaches for pig body weight prediction, their validation with large-scale data collected in commercial environments is still limited. Therefore, the main objective of this study was to predict pig body weight, measured at multiple time points in a commercial environment, using efficient convolutional neural networks and efficient vision transformers. Top-view videos were collected from over 600 pigs at six time points over three months. Scale-based body weights were recorded simultaneously by a digital weighing system. An automated video conversion pipeline and a fine-tuned YOLOv8 model were applied to preprocess the raw depth videos. Two families of lightweight deep neural networks, MobileNet and MobileViT, were initialised with pre-trained ImageNet weights and customised to predict pig body weight directly from depth images. Two cross-validation strategies were used: single time point random subsampling, and time series forecasting with a sparse design that reflects limited-budget scenarios. In single time point random subsampling, the best prediction mean absolute percentage errors for the six time points were 4.71%, 3.80%, 3.08%, 5.60%, 3.42%, and 3.77%, respectively. On average, the MobileViT-S model produced the best prediction mean absolute percentage error. In time series forecasting, although the sparse design resulted in some performance loss compared to the full design, the use of ViT models mitigated this degradation. These results suggest that efficient deep learning-based supervised learning models are a promising approach for predicting pig body weight from industry-scale depth video data.
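The evaluation described above rests on two simple ingredients: a random train/test split within a single time point, and the mean absolute percentage error (MAPE) between scale weights and predicted weights. The following is a minimal sketch of both, not the authors' code. The function names, the split by animal ID, the 20% test fraction, the seed, and the example weights are all illustrative assumptions; the abstract does not state these details.

```python
import random


def mape(actual, predicted):
    """Mean absolute percentage error, returned in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Average of |actual - predicted| / |actual| over all animals.
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def random_subsample_split(ids, test_fraction=0.2, seed=0):
    """Shuffle IDs and split them into (train, test) lists.

    Assumption: the split is made per animal; the test fraction and
    seed are hypothetical defaults, not values from the paper.
    """
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    # Hypothetical scale weights (kg) and model predictions (kg).
    scale_kg = [100.0, 80.0, 60.0]
    predicted_kg = [95.0, 84.0, 60.0]
    print(f"MAPE: {mape(scale_kg, predicted_kg):.2f}%")

    train_ids, test_ids = random_subsample_split(range(10))
    print("train:", train_ids, "test:", test_ids)
```

Repeating the split with different seeds and averaging the resulting MAPE values gives a random-subsampling estimate for one time point. Because MAPE is relative to the true weight, it stays comparable across time points as the pigs grow.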
About the journal:
Biosystems Engineering publishes research in engineering and the physical sciences that represents advances in understanding or modelling of the performance of biological systems for sustainable developments in land use and the environment, agriculture and amenity, bioproduction processes and the food chain. The subject matter of the journal reflects the wide range and interdisciplinary nature of research in engineering for biological systems.