{"title":"利用全局洗牌卷积实现轻量级食品图像识别","authors":"Guorui Sheng;Weiqing Min;Tao Yao;Jingru Song;Yancun Yang;Lili Wang;Shuqiang Jiang","doi":"10.1109/TAFE.2024.3386713","DOIUrl":null,"url":null,"abstract":"Consumer behaviors and habits in food choices impact their physical health and have implications for climate change and global warming. Efficient food image recognition can assist individuals in making more environmentally friendly and healthier dietary choices using end devices, such as smartphones. Simultaneously, it can enhance the efficiency of server-side training, thereby reducing carbon emissions. We propose a lightweight deep neural network named Global Shuffle Net (GSNet) that can efficiently recognize food images. In GSNet, we develop a novel convolution method called global shuffle convolution, which captures the dependence between long-range pixels. Merging global shuffle convolution with classic local convolution yields a framework that works as the backbone of GSNet. Through GSNet's ability to capture the dependence between long-range pixels at the start of the network, by restricting the number of layers in the middle and rear, the parameters and floating operation operations (FLOPs) can be minimized without compromising the performance, thus permitting a lightweight goal to be achieved. Experimental results on four popular food recognition datasets demonstrate that our approach achieves state-of-the-art performance with higher accuracy and fewer FLOPs and parameters. For example, in comparison to the current state-of-the-art model of MobileViTv2, GSNet achieved 87.9% accuracy of the top-1 level on the Eidgenössische Technische Hochschule Zürich (ETHZ) Food-101 dataset with 28% reduction in the parameters, 37% reduction in the FLOPs, but a 0.7% more accuracy.","PeriodicalId":100637,"journal":{"name":"IEEE Transactions on AgriFood Electronics","volume":"2 2","pages":"392-402"},"PeriodicalIF":0.0000,"publicationDate":"2024-03-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Lightweight Food Image Recognition With Global Shuffle Convolution\",\"authors\":\"Guorui Sheng;Weiqing Min;Tao Yao;Jingru Song;Yancun Yang;Lili Wang;Shuqiang Jiang\",\"doi\":\"10.1109/TAFE.2024.3386713\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Consumer behaviors and habits in food choices impact their physical health and have implications for climate change and global warming. Efficient food image recognition can assist individuals in making more environmentally friendly and healthier dietary choices using end devices, such as smartphones. Simultaneously, it can enhance the efficiency of server-side training, thereby reducing carbon emissions. We propose a lightweight deep neural network named Global Shuffle Net (GSNet) that can efficiently recognize food images. In GSNet, we develop a novel convolution method called global shuffle convolution, which captures the dependence between long-range pixels. Merging global shuffle convolution with classic local convolution yields a framework that works as the backbone of GSNet. Through GSNet's ability to capture the dependence between long-range pixels at the start of the network, by restricting the number of layers in the middle and rear, the parameters and floating operation operations (FLOPs) can be minimized without compromising the performance, thus permitting a lightweight goal to be achieved. 
Experimental results on four popular food recognition datasets demonstrate that our approach achieves state-of-the-art performance with higher accuracy and fewer FLOPs and parameters. For example, in comparison to the current state-of-the-art model of MobileViTv2, GSNet achieved 87.9% accuracy of the top-1 level on the Eidgenössische Technische Hochschule Zürich (ETHZ) Food-101 dataset with 28% reduction in the parameters, 37% reduction in the FLOPs, but a 0.7% more accuracy.\",\"PeriodicalId\":100637,\"journal\":{\"name\":\"IEEE Transactions on AgriFood Electronics\",\"volume\":\"2 2\",\"pages\":\"392-402\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-03-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on AgriFood Electronics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10517765/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on AgriFood Electronics","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10517765/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Lightweight Food Image Recognition With Global Shuffle Convolution
Consumer behaviors and habits in food choices affect physical health and have implications for climate change and global warming. Efficient food image recognition can help individuals make more environmentally friendly and healthier dietary choices on end devices such as smartphones, and it can also improve the efficiency of server-side training, thereby reducing carbon emissions. We propose a lightweight deep neural network named Global Shuffle Net (GSNet) that can efficiently recognize food images. In GSNet, we develop a novel convolution method called global shuffle convolution, which captures the dependence between long-range pixels. Merging global shuffle convolution with classic local convolution yields a framework that serves as the backbone of GSNet. Because GSNet captures long-range pixel dependence at the start of the network, the number of layers in the middle and rear stages can be restricted, so the parameters and floating-point operations (FLOPs) are minimized without compromising performance, achieving the lightweight goal. Experimental results on four popular food recognition datasets demonstrate that our approach achieves state-of-the-art performance with higher accuracy and fewer FLOPs and parameters. For example, compared with the current state-of-the-art model MobileViTv2, GSNet achieves 87.9% top-1 accuracy on the Eidgenössische Technische Hochschule Zürich (ETHZ) Food-101 dataset with 28% fewer parameters, 37% fewer FLOPs, and 0.7% higher accuracy.
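The abstract does not spell out the mechanics of global shuffle convolution, so the following PyTorch sketch is only one plausible reading: pixel positions are permuted by a fixed global shuffle so that an ordinary local (depthwise) convolution afterwards mixes spatially distant pixels, and the original layout is then restored. The class name GlobalShuffleConv, the use of a random permutation, and all hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only (assumption): one plausible interpretation of a
# "global shuffle convolution" block, not the code from the GSNet paper.
import torch
import torch.nn as nn


class GlobalShuffleConv(nn.Module):
    """Hypothetical global shuffle convolution for a fixed input resolution."""

    def __init__(self, channels: int, height: int, width: int, kernel_size: int = 3):
        super().__init__()
        n = height * width
        perm = torch.randperm(n)        # fixed global shuffle of pixel positions
        inv = torch.argsort(perm)       # inverse permutation to undo the shuffle
        self.register_buffer("perm", perm)
        self.register_buffer("inv", inv)
        self.h, self.w = height, width
        # Depthwise conv applied on the shuffled layout: spatial neighbours
        # there correspond to long-range pixels in the original image.
        self.conv = nn.Conv2d(channels, channels, kernel_size,
                              padding=kernel_size // 2, groups=channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape            # expects h == self.h and w == self.w
        flat = x.flatten(2)             # (B, C, H*W)
        shuffled = flat[:, :, self.perm].view(b, c, self.h, self.w)
        mixed = self.conv(shuffled)     # local conv now mixes distant pixels
        return mixed.flatten(2)[:, :, self.inv].view(b, c, h, w)


if __name__ == "__main__":
    block = GlobalShuffleConv(channels=32, height=56, width=56)
    y = block(torch.randn(2, 32, 56, 56))
    print(y.shape)                      # torch.Size([2, 32, 56, 56])
```

Under this reading, such a block placed at the start of the network would supply the long-range pixel dependence described in the abstract, while classic local convolutions handle the remaining stages with fewer layers, parameters, and FLOPs.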