pp. 729-743
S&M3557 Research Paper of Special Issue
https://doi.org/10.18494/SAM4788
Published: February 29, 2024
A Smart Assembly Line Design Using Human–Robot Collaborations with Operator Gesture Recognition by Decision Fusion of Deep Learning Channels of Three Image Sensing Modalities from RGB-D Devices
Ing-Jr Ding and Ya-Cheng Juang
(Received September 3, 2023; Accepted February 19, 2024)
Keywords: assembly line, human–robot collaboration, RGB-D image sensor, operator gesture recognition, deep learning, decision fusion
Machine vision with image sensors is widely used in smart manufacturing, for example in automatic optical inspection (AOI), where an image acquisition camera optically scans the target device for quality defects. With rapid progress in image sensing techniques, RGB-D image sensor devices can now capture an operator's assembly gestures, which enables intelligent interaction between a robot and an operator. In this study, we propose a smart assembly-line design for intelligent manufacturing and factory applications that incorporates a human–robot collaboration (HRC) working mode. In the proposed HRC assembly line, the operator and the manipulator (robotic arm) work together: an appropriately deployed RGB-D image device (an Intel RealSense camera in this work) acquires the operator's assembly gesture data, which are then used for operator gesture recognition. The manipulator then performs the feedback action corresponding to the recognized gesture (e.g., grabbing the scissors and then moving to the operator if the gesture of winding the tape is recognized). For operator gesture recognition, we first construct three deep learning recognition channels with different sensing modalities: the RGB convolutional neural network (CNN)-long short-term memory (LSTM) channel with RGB gesture image inputs, the depth CNN-LSTM channel with depth gesture image inputs, and the 3D-(x, y, z) LSTM raw channel with raw skeleton data inputs. A decision fusion scheme is then developed to combine the recognition decision outputs of these three separate channels. Various weight combination strategies for fusing the decisions of the three channels are evaluated in terms of operator gesture recognition effectiveness.
Experiments on the classification of ten categories of operator assembly gestures show that the half-quarter-quarter strategy, with channel decision weights set to (wRGB, wDepth, w3D) = (0.5, 0.25, 0.25), achieves the highest recognition accuracy.
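
The abstract does not spell out the fusion rule, so the following is only a minimal sketch of one common form of weighted decision fusion, assuming each channel outputs a per-class score vector (e.g., softmax probabilities) and that the fused score is the weighted sum of the channel scores. The function names, the channel keys ("rgb", "depth", "skeleton3d"), and the three-class toy scores are hypothetical; only the weights (0.5, 0.25, 0.25) come from the text.

```python
from typing import Dict, List, Sequence

# Weights of the half-quarter-quarter strategy reported in the abstract.
# The channel key names are illustrative, not taken from the paper.
WEIGHTS: Dict[str, float] = {"rgb": 0.5, "depth": 0.25, "skeleton3d": 0.25}


def fuse_decisions(
    scores: Dict[str, Sequence[float]],
    weights: Dict[str, float] = WEIGHTS,
) -> List[float]:
    """Return the weighted sum of per-class scores over all channels.

    Assumption: a weighted-sum rule; the paper's exact rule may differ.
    """
    if set(scores) != set(weights):
        raise ValueError("scores and weights must name the same channels")
    lengths = {len(v) for v in scores.values()}
    if len(lengths) != 1:
        raise ValueError("all channels must score the same number of classes")
    fused = [0.0] * lengths.pop()
    for name, channel_scores in scores.items():
        for k, s in enumerate(channel_scores):
            fused[k] += weights[name] * s
    return fused


def predict(
    scores: Dict[str, Sequence[float]],
    weights: Dict[str, float] = WEIGHTS,
) -> int:
    """Return the index of the class with the highest fused score."""
    fused = fuse_decisions(scores, weights)
    return max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    # Toy example with three classes (the paper uses ten gesture categories).
    toy_scores = {
        "rgb": [0.6, 0.3, 0.1],
        "depth": [0.2, 0.7, 0.1],
        "skeleton3d": [0.3, 0.6, 0.1],
    }
    print(fuse_decisions(toy_scores))
    print(predict(toy_scores))
```

In this toy case the RGB channel alone would pick class 0, but the depth and skeleton channels together outweigh it and the fused decision is class 1, which illustrates how the weight allocation trades off trust among the three modalities.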
Corresponding author: Ing-Jr Ding
This work is licensed under a Creative Commons Attribution 4.0 International License.
Cite this article
Ing-Jr Ding and Ya-Cheng Juang, A Smart Assembly Line Design Using Human–Robot Collaborations with Operator Gesture Recognition by Decision Fusion of Deep Learning Channels of Three Image Sensing Modalities from RGB-D Devices, Sens. Mater., Vol. 36, No. 2, 2024, pp. 729-743.