Object detection and activity recognition in video surveillance using neural networks

IF 2.5 Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

International Journal of Web Information Systems. Pub Date: 2023-04-20. DOI: 10.1108/ijwis-01-2023-0006

Vishva Payghode, Ayush Goyal, Anupama Bhan, S. Iyer, Ashwani Kumar Dubey

Citations: 2

Abstract

Purpose

This paper aims to implement and extend the You Only Look Once (YOLO) algorithm for the detection of objects and activities. The advantage of YOLO is that it runs a neural network only once to detect the objects in an image, which is why it is powerful and fast. Cameras are installed at many crossroads and other locations, and processing their feeds with an object detection algorithm makes it possible to determine and track what is captured. Video surveillance has many applications, such as car tracking and the tracking of people for crime prevention. This paper provides an exhaustive comparison between existing methods and the proposed method. The proposed method is found to have the highest object detection accuracy among the methods compared.

Design/methodology/approach

The goal of this research is to develop a deep learning framework that automates the analysis of video footage through object detection in images. The framework processes a video feed or image frames from CCTV, a webcam or DroidCam, which allows the camera in a mobile phone to be used as a webcam for a laptop. The object detection algorithm, with its model trained on a large data set of images, loads each input image, processes it and determines the categories of the matching objects that it finds. As a proof of concept, this research demonstrates the algorithm on images of several different objects. For video surveillance with traffic cameras, the approach has many applications, such as car tracking and person tracking for crime prevention. The implemented algorithm with the proposed methodology is compared against several existing methods from the literature. The proposed method was found to have the highest accuracy for object detection and activity recognition among the methods compared.

Findings

The results indicate that the proposed deep learning-based model can be implemented in real time for object detection and activity recognition. The added features of car crash detection, fall detection and social distancing detection can be used to implement a real-time video surveillance system that can help save lives and protect people. Such a system could be installed at street and traffic cameras and in CCTV systems. When the system detects a car crash, or a fatal or injurious fall of a person or pedestrian, it can be programmed to send automatic messages to the nearest local police, emergency and fire stations. When the system detects a social distancing violation, it can be programmed to inform the local authorities or to sound an alarm with a warning message that alerts the public to maintain their distance and avoid spreading aerosol particles that may transmit viruses, including the COVID-19 virus.

Originality/value

This paper proposes an improved and augmented version of the YOLOv3 model that has been extended to perform activity recognition, such as car crash detection, human fall detection and social distancing detection. The proposed model is based on a deep learning convolutional neural network used to detect objects in images. The model is trained on the widely used and publicly available Common Objects in Context (COCO) data set. Being an extension of YOLO, the proposed model can be implemented for real-time object and activity recognition. The proposed model had higher accuracies for both large-scale and all-scale object detection. It also outperformed all the other methods compared in extending and augmenting object detection to activity recognition, and it achieved the highest accuracy for car crash detection, fall detection and social distancing detection.
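
The abstract does not say how social distancing violations are derived from the detector output. One common approach, assumed here purely for illustration and not taken from the paper, is to compute the centroid of each detected person's bounding box and flag every pair whose centroids are closer than a threshold. In the sketch below, the box format `(x_min, y_min, x_max, y_max)`, the function names and the pixel threshold are all hypothetical; a real system would also need camera calibration to convert pixels into physical distance.

```python
from itertools import combinations
from math import hypot
from typing import List, Tuple

# Assumed box format (not from the paper): (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def centroid(box: Box) -> Tuple[float, float]:
    """Return the centre point of a bounding box."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def distancing_violations(
    person_boxes: List[Box], min_pixel_distance: float
) -> List[Tuple[int, int]]:
    """Return index pairs of people whose centroids are closer than the threshold.

    The threshold is in pixels, a simplifying assumption that ignores perspective.
    """
    centers = [centroid(box) for box in person_boxes]
    violations = []
    for (i, a), (j, b) in combinations(enumerate(centers), 2):
        if hypot(a[0] - b[0], a[1] - b[1]) < min_pixel_distance:
            violations.append((i, j))
    return violations


# Hypothetical detections for one frame: two people close together, one far away.
boxes = [(0, 0, 10, 20), (12, 0, 22, 20), (200, 0, 210, 20)]
print(distancing_violations(boxes, min_pixel_distance=50))  # [(0, 1)]
```

The pairwise check is quadratic in the number of detected people, which is usually acceptable for the handful of people visible in a single surveillance frame.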

Source journal

International Journal of Web Information Systems (Computer Science, Information Systems)

CiteScore

4.60

Self-citation rate

0.00%

Articles published

About the journal: The Global Information Infrastructure is a daily reality. In spite of the many applications in all domains of our societies (e-business, e-commerce, e-learning, e-science and e-government, for instance) and in spite of the tremendous advances by engineers and scientists, the seamless development of Web information systems and services remains a major challenge. The journal examines how the current shared vision for the future is one of semantically rich information and service-oriented architecture for global information systems. This vision is at the convergence of progress in technologies such as XML, Web services, RDF and OWL; in multimedia, multimodal and multilingual information retrieval; and in distributed, mobile and ubiquitous computing.

Topicality: While the International Journal of Web Information Systems covers a broad range of topics, the journal welcomes papers that provide a perspective on all aspects of Web information systems: Web semantics and Web dynamics, Web mining and searching, Web databases and Web data integration, Web-based commerce and e-business, Web collaboration and distributed computing, Internet computing and networks, performance of Web applications, and Web multimedia services and Web-based education.