Task recognition integrating worker actions and machine operations: A video-based sensing approach without physical sensors
Shotaro Kataoka, Masashi Oba, Hirofumi Nonaka
Engineering Applications of Artificial Intelligence, Vol. 147, Article 110232, published 2025-02-21
DOI: 10.1016/j.engappai.2025.110232
https://www.sciencedirect.com/science/article/pii/S0952197625002325
Abstract
Automating work process analysis is crucial in manufacturing for improving efficiency and productivity. However, traditional deep learning methods often fail to capture subtle temporal changes in machine operations, such as variations in speed. We propose a cost-effective approach called pseudo-sensing, which simulates sensor data by measuring machine speeds directly from video using the wavelet transform, a mathematical tool for time-frequency analysis. This approach eliminates the need for physical sensors.
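The abstract does not detail how the wavelet transform turns video into a speed signal. The following is a minimal sketch under stated assumptions: the video has already been reduced to a per-frame brightness signal from a region of interest (ROI) placed over a moving machine part; a complex Morlet continuous wavelet transform is evaluated at a grid of candidate frequencies; and the frequency with the largest magnitude at each frame (the wavelet "ridge") is treated as a pseudo-sensor reading of machine speed. The names `cwt_magnitude`, `ridge_frequency`, and `pseudo_sensor`, the ROI signal, the Morlet parameter `OMEGA0 = 6`, and the candidate grid are all illustrative assumptions, not the authors' implementation.

```python
import cmath
import math

OMEGA0 = 6.0  # assumed Morlet centre angular frequency (a common default)


def morlet(u: float) -> complex:
    """Complex Morlet mother wavelet: pi^(-1/4) * exp(i*w0*u) * exp(-u^2/2)."""
    return math.pi ** -0.25 * cmath.exp(1j * OMEGA0 * u) * math.exp(-0.5 * u * u)


def cwt_magnitude(signal: list[float], fs: float, freq: float, center: int) -> float:
    """|W(s, tau)| at sample `center` for the scale whose Morlet peak sits at `freq` Hz."""
    s = OMEGA0 / (2 * math.pi * freq)  # scale in seconds
    half = int(4 * s * fs)             # truncate the wavelet at |u| <= 4
    acc = 0j
    for n in range(max(0, center - half), min(len(signal), center + half + 1)):
        u = (n - center) / (fs * s)    # (t - tau) / s
        acc += signal[n] * morlet(u).conjugate()
    # Multiply by dt and use 1/s normalisation: the peak then lies at s = w0 / (2*pi*f)
    return abs(acc / (fs * s))


def ridge_frequency(signal: list[float], fs: float, candidates: list[float], center: int) -> float:
    """Candidate frequency (Hz) with the strongest wavelet response at one frame."""
    return max(candidates, key=lambda f: cwt_magnitude(signal, fs, f, center))


def pseudo_sensor(signal: list[float], fs: float, candidates: list[float], step: int) -> list[tuple[float, float]]:
    """Hypothetical 'virtual speed sensor': (time_s, dominant_freq_hz) every `step` frames."""
    return [(c / fs, ridge_frequency(signal, fs, candidates, c))
            for c in range(0, len(signal), step)]


if __name__ == "__main__":
    fs = 30  # assumed camera frame rate (frames per second)
    # Synthetic ROI brightness: a part cycling at 2 Hz, then speeding up to 3 Hz after 3 s
    roi = [math.cos(2 * math.pi * (2.0 if n < 90 else 3.0) * n / fs) for n in range(180)]
    cands = [1.0 + 0.5 * k for k in range(7)]  # 1.0, 1.5, ..., 4.0 Hz
    for t, f in pseudo_sensor(roi, fs, cands, step=45):
        print(f"t={t:.1f}s  dominant frequency ~ {f} Hz")
```

The 1/s normalisation (rather than the more common 1/√s) is chosen so that a pure sinusoid of frequency f produces its largest response exactly at candidate f. Near the start and end of the clip, and at the moment the speed changes, the wavelet window is truncated or straddles two regimes, so ridge estimates there are less reliable.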
We evaluated pseudo-sensing by integrating it into two task classification models. The first is a convolutional neural network-long short-term memory (CNN-LSTM) model, which extracts spatial features with a CNN and learns temporal patterns with an LSTM. The second is a three-dimensional residual network (3D ResNet, R3D), which uses 3D convolutions to process spatial and temporal information jointly. With pseudo-sensing, the micro-F1 score of the CNN-LSTM (the harmonic mean of precision and recall, computed from true positives, false positives, and false negatives pooled across all classes) improved from 0.712 to 0.736 (+2.4 points), while that of R3D rose from 0.675 to 0.701 (+2.6 points).
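The micro-F1 metric used for these scores can be made concrete with a short sketch. It pools true positives, false positives, and false negatives across all task classes before computing precision, recall, and their harmonic mean. The function name `micro_f1` and the task labels in the example are hypothetical and only illustrate the metric.

```python
def micro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Micro-averaged F1: pool TP, FP and FN over all classes, then F1 = 2PR / (P + R)."""
    tp = fp = fn = 0
    for cls in set(y_true) | set(y_pred):
        for t, p in zip(y_true, y_pred):
            if p == cls and t == cls:
                tp += 1  # correct prediction of this class
            elif p == cls:
                fp += 1  # predicted this class, but the truth was another
            elif t == cls:
                fn += 1  # truth was this class, but another was predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical task labels for four video clips
    truth = ["load", "operate", "unload", "operate"]
    pred = ["load", "operate", "operate", "operate"]
    print(micro_f1(truth, pred))  # 0.75
```

For single-label classification, each misclassified clip counts once as a false positive (for the predicted class) and once as a false negative (for the true class), so micro-precision, micro-recall, and micro-F1 all equal plain accuracy; in the example, 3 of 4 clips are correct.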
To assess general applicability, we tested pseudo-sensing on another dataset featuring diverse machine motions: unidirectional movements (e.g., conveyor belts), oscillatory movements (e.g., pendulum-like motions), rotational movements (e.g., rotary presses), and intermittent movements (e.g., blinking or toggling mechanisms). The method achieved an 83% success rate in identifying machine dynamics.
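The abstract does not state how a measured frequency becomes a machine speed for each motion type. Assuming the ROI signal repeats once per passing feature or per cycle, a dominant frequency f in Hz (for example, a wavelet ridge) could be converted as in this hypothetical sketch. The feature pitch, the number of features per revolution, and the ROI placement are assumed inputs that would have to be known for each machine; none of them come from the paper.

```python
def conveyor_speed(freq_hz: float, pitch_m: float) -> float:
    """Unidirectional: features spaced `pitch_m` apart pass the ROI at freq_hz -> belt speed (m/s)."""
    return freq_hz * pitch_m


def rotation_rpm(freq_hz: float, features_per_rev: int) -> float:
    """Rotational: `features_per_rev` identical marks per turn -> revolutions per minute."""
    return 60.0 * freq_hz / features_per_rev


def oscillation_period(freq_hz: float) -> float:
    """Oscillatory: period of one back-and-forth cycle (s).

    Assumes the ROI sits at one end of the stroke, so it is reached once per cycle;
    an ROI at the centre of the swing would be crossed twice per cycle.
    """
    return 1.0 / freq_hz


def blinks_per_minute(freq_hz: float) -> float:
    """Intermittent: one on/off cycle per period -> events per minute."""
    return 60.0 * freq_hz


if __name__ == "__main__":
    print(conveyor_speed(2.0, 0.25))  # 0.5 m/s for 25 cm slats passing at 2 Hz
    print(rotation_rpm(3.0, 4))       # 45.0 rpm for 4 spokes passing at 3 Hz
```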
By leveraging deep learning, this method integrates video-based sensing of machine operations with task recognition, accounting for both human actions and machine states. Because it requires no additional sensors while improving accuracy and efficiency, pseudo-sensing has broad potential for advancing manufacturing process analysis.