ConVision Benchmark: A Contemporary Framework to Benchmark CNN and ViT Models

AI, Pub Date: 2024-07-11, DOI: 10.3390/ai5030056

Shreyas Bangalore Vijayakumar, Krishna Teja Chitty-Venkata, Kanishk Arya, Arun Somani


Citations: 0

Abstract



Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have shown remarkable performance in computer vision tasks, including object detection and image recognition. These models have evolved significantly in architecture, efficiency, and versatility. Concurrently, deep-learning frameworks have diversified, with versions that often complicate reproducibility and unified benchmarking. We propose ConVision Benchmark, a comprehensive framework in PyTorch, to standardize the implementation and evaluation of state-of-the-art CNN and ViT models. This framework addresses common challenges such as version mismatches and inconsistent validation metrics. As a proof of concept, we performed an extensive benchmark analysis on a COVID-19 dataset, encompassing nearly 200 CNN and ViT models in which DenseNet-161 and MaxViT-Tiny achieved exceptional accuracy with a peak performance of around 95%. Although we primarily used the COVID-19 dataset for image classification, the framework is adaptable to a variety of datasets, enhancing its applicability across different domains. Our methodology includes rigorous performance evaluations, highlighting metrics such as accuracy, precision, recall, F1 score, and computational efficiency (FLOPs, MACs, CPU, and GPU latency). The ConVision Benchmark facilitates a comprehensive understanding of model efficacy, aiding researchers in deploying high-performance models for diverse applications.
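The classification metrics named in the abstract (accuracy, precision, recall, F1 score) can be illustrated with a minimal Python sketch. It is not taken from the ConVision Benchmark code: the function name `classification_metrics` is hypothetical, and macro-averaging across classes is an assumption, since the abstract does not say how per-class scores are combined.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Hypothetical helper for illustration only; macro-averaging is an
    assumption, not something stated in the abstract.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    def safe_div(num, den):
        # Treat an undefined ratio (zero denominator) as 0.0.
        return num / den if den else 0.0

    precisions = [safe_div(tp[c], tp[c] + fp[c]) for c in labels]
    recalls = [safe_div(tp[c], tp[c] + fn[c]) for c in labels]
    f1_scores = [safe_div(2 * p * r, p + r) for p, r in zip(precisions, recalls)]

    n_classes = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n_classes,
        "recall": sum(recalls) / n_classes,
        "f1": sum(f1_scores) / n_classes,
    }


if __name__ == "__main__":
    # Toy two-class example with made-up labels (not data from the paper).
    print(classification_metrics([0, 0, 1, 1], [0, 1, 1, 1]))
```

Macro-averaging weights every class equally, which matters when a dataset is class-imbalanced; a weighted or micro average would give different numbers for the same predictions.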
