Augmented reality and deep learning based system for assisting assembly process

IF 2.1 · Zone 3, Computer Science · Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Journal on Multimodal User Interfaces Pub Date : 2023-12-14 DOI:10.1007/s12193-023-00428-3



Abstract

In Industry 4.0, rapidly changing customer demands lead to mass customization. The variation in customer requirements leads to small batch sizes and several process variations. The assembly task is one of the most important steps in any manufacturing process. Because of variations in product or process, a factory floor worker often needs a guidance system to assist with the assembly task. Existing Augmented Reality (AR) based systems use markers on each assembly component for detection, which is time-consuming and laborious. This paper proposes using a state-of-the-art deep learning based object detection technique and employs a regression-based mapping technique to obtain the 3D locations of assembly components. Automatic detection of machine parts is followed by a multimodal interface involving both eye gaze and hand tracking to guide the manual assembly process. We propose an eye cursor to guide the user through the task and use fingertip distances along with object sizes to detect any error committed during the task. We analyzed the proposed mapping method and found that the mean mapping error was 1.842 cm. We also investigated the effectiveness of the proposed multimodal user interface by conducting two user studies. The first study indicated that the current interface design with the eye cursor enabled participants to perform the task significantly faster than the interface without it. The shop floor workers in the second user study reported that the proposed guidance system was comprehensible and easy to use for completing the assembly task. Results showed that the proposed guidance system enabled 11 end users to finish the assembly of one pneumatic cylinder within 55 s, with an average TLX score of less than 25 on a scale of 100 and a Cronbach alpha score of 0.8, indicating convergence of learning experience.
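The abstract does not give the form of the regression-based mapping or the exact error rule, so the following Python sketch is only an illustration under stated assumptions: an affine least-squares map from 2D detection centres (pixels) to 3D positions, a mean Euclidean error as the mapping-error metric, and a thumb-to-index fingertip gap compared against a known part size. All function names, the tolerance parameter, and the calibration data are hypothetical and not taken from the paper.

```python
import numpy as np


def _design_matrix(pixels):
    """Stack (u, v) pixel centres with a constant column for the offset term."""
    pixels = np.asarray(pixels, dtype=float)
    return np.hstack([pixels, np.ones((len(pixels), 1))])


def fit_mapping(pixels, points_3d):
    """Fit an affine map (assumed form) from 2D detection centres to 3D points."""
    coeffs, *_ = np.linalg.lstsq(
        _design_matrix(pixels), np.asarray(points_3d, dtype=float), rcond=None
    )
    return coeffs  # shape (3, 3): rows correspond to u, v, and the bias


def apply_mapping(coeffs, pixels):
    """Predict 3D positions for new detection centres."""
    return _design_matrix(pixels) @ coeffs


def mean_mapping_error(predicted, actual):
    """Mean Euclidean distance between predicted and measured 3D positions."""
    diff = np.asarray(predicted, dtype=float) - np.asarray(actual, dtype=float)
    return float(np.mean(np.linalg.norm(diff, axis=1)))


def grasp_matches_part(thumb_tip, index_tip, part_size, tolerance):
    """Return True if the fingertip gap is within `tolerance` of the part size.

    Assumption: a mismatch suggests the wrong part was picked up.
    """
    gap = np.linalg.norm(
        np.asarray(thumb_tip, dtype=float) - np.asarray(index_tip, dtype=float)
    )
    return bool(abs(gap - part_size) <= tolerance)


# Synthetic calibration data (made up for illustration, units in cm).
demo_pixels = np.array([[100, 200], [400, 220], [250, 480], [600, 90]], dtype=float)
true_coeffs = np.array([[0.05, 0.0, 0.0], [0.0, 0.05, 0.0], [-10.0, -5.0, 40.0]])
demo_points = _design_matrix(demo_pixels) @ true_coeffs

coeffs = fit_mapping(demo_pixels, demo_points)
error_cm = mean_mapping_error(apply_mapping(coeffs, demo_pixels), demo_points)
print(f"mean mapping error on calibration data: {error_cm:.6f} cm")
print("grasp ok:", grasp_matches_part((0, 0, 0), (3, 4, 0), part_size=5.2, tolerance=0.5))
```

Because the synthetic points are generated from an exact affine map, the fitted error here is near zero; with real detections the residual would reflect sensor noise and any nonlinearity the affine form cannot capture.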


Source journal

Journal on Multimodal User Interfaces (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, CYBERNETICS)

CiteScore: 6.90

Self-citation rate: 3.40%

Articles published:

Review time: >12 weeks

About the journal: The Journal on Multimodal User Interfaces publishes work on the design, implementation and evaluation of multimodal interfaces. Research on multimodal interaction is by its very essence multidisciplinary, involving several fields including signal processing, human-machine interaction, computer science, cognitive science and ergonomics. The journal focuses on multimodal interfaces involving advanced modalities, several modalities and their fusion, user-centric design, usability and architectural considerations. Use cases and descriptions of specific application areas are welcome, including for example e-learning, assistance, serious games, affective and social computing, and interaction with avatars and robots.