pp. 2723-2738
S&M4465 Research paper
https://doi.org/10.18494/SAM6144
Published: May 22, 2026
Taiwanese Sign Language Recognition and Natural Sentence Generation System Based on Spatiotemporal Graph Convolutional Networks and Distilled Bidirectional Encoder Representations from Transformers
Neng-Sheng Pai, Li-An Weng, Pi-Yun Chen, and Lian-Sheng Hong
(Received December 22, 2025; Accepted April 14, 2026)
Keywords: sign language recognition, natural sentence generation, MediaPipe, ST-GCN, DistilBERT
We present a Taiwanese Sign Language (TSL) recognition and natural sentence generation system that focuses on continuous sign language recognition, in contrast to most existing approaches that primarily address isolated sign recognition. The proposed system integrates a spatiotemporal graph convolutional network (ST-GCN) with a distilled bidirectional encoder representations from transformers (DistilBERT)-based language generation model, with the aim of reducing communication barriers for the deaf and hard-of-hearing community. First, a camera sensor is used to capture sign language videos. MediaPipe is then utilized to extract human body key points from sign language video sequences. These spatiotemporal key point representations are subsequently processed by the ST-GCN model to perform sign recognition. Finally, the recognized sign sequences are translated into fluent and natural sentences using a fine-tuned DistilBERT model. Experimental evaluations are conducted on a self-collected dataset consisting of 42 classes of TSL videos, along with a frame sampling analysis. The results indicate that uniformly sampling video sequences to 70 frames yields the best recognition performance for the ST-GCN model. For sentence generation, 24 predefined Chinese sentence templates are employed to fine-tune the DistilBERT model. Experimental results indicate that the proposed method can achieve accurate and natural sentence generation under low-resource training conditions. Overall, the proposed system exhibits strong performance in terms of lightweight model architecture, robust gesture recognition accuracy, and natural language generation quality, thereby validating its effectiveness and feasibility for continuous sign language translation and language generation tasks.
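The uniform frame sampling step mentioned in the abstract can be illustrated with a minimal sketch. The function names, the `(T, V, C)` array layout, and the use of 33 landmarks per frame are assumptions made for illustration; the abstract states only that sequences are uniformly sampled to 70 frames, not how the authors implement it.

```python
import numpy as np


def uniform_sample_indices(num_frames: int, target: int = 70) -> np.ndarray:
    """Return `target` frame indices spread evenly over [0, num_frames - 1].

    Hypothetical helper: the rounding scheme is an assumption, not the
    authors' stated method.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    # Evenly spaced positions, rounded to the nearest valid frame index.
    return np.linspace(0, num_frames - 1, num=target).round().astype(int)


def sample_keypoints(sequence: np.ndarray, target: int = 70) -> np.ndarray:
    """Resample a (T, V, C) key point sequence to (target, V, C).

    T = frames, V = key points per frame, C = coordinates per key point.
    Sequences shorter than `target` are padded by repeating frames.
    """
    idx = uniform_sample_indices(sequence.shape[0], target)
    return sequence[idx]


if __name__ == "__main__":
    # Assumed example: 150 frames, 33 landmarks, (x, y, z) coordinates.
    clip = np.random.rand(150, 33, 3)
    print(sample_keypoints(clip).shape)
```

Sampling by index keeps the first and last frames of every clip and gives the recognition model a fixed-length input regardless of how long the original recording was.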
Corresponding author: Pi-Yun Chen
This work is licensed under a Creative Commons Attribution 4.0 International License.
Cite this article
Neng-Sheng Pai, Li-An Weng, Pi-Yun Chen, and Lian-Sheng Hong, Taiwanese Sign Language Recognition and Natural Sentence Generation System Based on Spatiotemporal Graph Convolutional Networks and Distilled Bidirectional Encoder Representations from Transformers, Sens. Mater., Vol. 38, No. 5, 2026, p. 2723-2738.