{"title":"CattleDiT: A Distillation-Driven Transformer for Cattle Identification","authors":"Niraj Kumar;Sanjay Kumar Singh","doi":"10.1109/TBIOM.2025.3565516","DOIUrl":null,"url":null,"abstract":"Rising standards for biosecurity, disease prevention, and livestock tracing are driving the need for an efficient identification system within the livestock supply chain. Traditional methods for cattle identification are invasive and unreliable due to issues like fraud, theft, and duplication. While deep learning-based methods, particularly Vision Transformers (ViTs), have demonstrated superior accuracy compared to traditional Convolutional Neural Networks (CNNs), but they require significantly larger datasets for training and have high computational demands. To address the challenges of large data requirements and to achieve faster convergence with fewer parameters, this paper proposes a novel distillation-based transformer approach for cattle identification. In this paper, we extract the muzzle region from a publicly available front-face cattle image dataset containing 300 cattle-face data and perform a distillation process to ensure that the student transformer model effectively learns from the teacher model through a proposed Adaptive Stochastic Depth mechanism. The teacher model, based on a lightweight custom convolutional network, extracts key features, which are then used to train the student Vision Transformer model, named CattleDiT. This approach reduces the data requirements and computational complexity of the ViT while maintaining high accuracy. The proposed model outperforms conventional ViT models and other state-of-the-art methods, achieving 99.81% accuracy on the training set and 96.67% on the test set. Additionally, several Explainable AI methods are employed to enhance interpretability of the prediction results.","PeriodicalId":73307,"journal":{"name":"IEEE transactions on biometrics, behavior, and identity science","volume":"7 4","pages":"824-836"},"PeriodicalIF":5.0000,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on biometrics, behavior, and identity science","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10979917/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Rising standards for biosecurity, disease prevention, and livestock tracing are driving the need for an efficient identification system within the livestock supply chain. Traditional methods for cattle identification are invasive and unreliable due to issues such as fraud, theft, and duplication. While deep learning-based methods, particularly Vision Transformers (ViTs), have demonstrated superior accuracy compared to traditional Convolutional Neural Networks (CNNs), they require significantly larger training datasets and have high computational demands. To address the challenge of large data requirements and to achieve faster convergence with fewer parameters, this paper proposes a novel distillation-based transformer approach for cattle identification. We extract the muzzle region from a publicly available front-face cattle image dataset covering 300 cattle and perform a distillation process in which the student transformer model learns effectively from the teacher model through a proposed Adaptive Stochastic Depth mechanism. The teacher model, based on a lightweight custom convolutional network, extracts key features, which are then used to train the student Vision Transformer model, named CattleDiT. This approach reduces the data requirements and computational complexity of the ViT while maintaining high accuracy. The proposed model outperforms conventional ViT models and other state-of-the-art methods, achieving 99.81% accuracy on the training set and 96.67% on the test set. Additionally, several Explainable AI methods are employed to enhance the interpretability of the prediction results.
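To make the teacher-student setup concrete, below is a minimal PyTorch sketch of a standard Hinton-style distillation objective paired with a stochastic-depth block wrapper, since the abstract describes a CNN teacher guiding a ViT student and an Adaptive Stochastic Depth mechanism but does not specify the exact loss or schedule. The names `teacher`, `student`, `images`, `labels`, and `optimizer`, the temperature `T`, the weighting `alpha`, and the drop-probability handling are all illustrative assumptions, not the paper's actual CattleDiT implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Soft-target KD loss (KL between temperature-softened distributions)
    blended with hard-label cross-entropy. T and alpha are assumed values."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # standard T^2 scaling keeps gradient magnitudes comparable
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

class StochasticDepthBlock(nn.Module):
    """Wraps one transformer block and randomly drops its residual branch
    during training. The paper's 'Adaptive' variant presumably adjusts
    drop_prob over training; here it is simply a settable attribute."""
    def __init__(self, block, drop_prob=0.1):
        super().__init__()
        self.block = block
        self.drop_prob = drop_prob

    def forward(self, x):
        residual = self.block(x)
        if self.training:
            if torch.rand(1).item() < self.drop_prob:
                return x  # skip the whole branch (identity shortcut)
            return x + residual / (1.0 - self.drop_prob)  # inverse scaling
        return x + residual

# One hypothetical training step: frozen CNN teacher, learnable ViT student.
# teacher, student, images, labels, optimizer are placeholders.
def train_step(teacher, student, images, labels, optimizer):
    teacher.eval()
    with torch.no_grad():
        t_logits = teacher(images)  # teacher predictions, no gradients
    s_logits = student(images)
    loss = distillation_loss(s_logits, t_logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The inverse scaling by `1 - drop_prob` during training keeps the expected block output unchanged at inference time, which is the usual "drop path" formulation of stochastic depth; whether CattleDiT uses this exact variant is an assumption.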