|
pp. 2435-2453
S&M4446 Research paper
https://doi.org/10.18494/SAM5999
Published: May 12, 2026
Scalable Video Sensors for Understanding Multiperson Sports Behavior Recognition in Dynamic Scenes
Saleha Kamal, Yanfeng Wu, Nouf Abdullah Almujally, Ahmad Jalal, and Hui Liu
(Received November 7, 2025; Accepted March 5, 2026)
Keywords: multiperson human behavior recognition, group behavior analysis, segmentation, silhouette tracking, multihead attention, human behavior interaction, sports video classification
The surge in multiperson sports videos has increased the demand for automated behavior recognition systems capable of understanding group-level activities. Recognizing collective behaviors such as coordination in basketball or formations in volleyball remains challenging owing to occlusions, rapid motion, and long video sequences, while many existing approaches focus primarily on individual actions rather than collective understanding. In this study, we advance noncontact vision-based sensing through a computer vision pipeline that aggregates per-person cues into sequence-level descriptors for sport classification using the MultiSports dataset, comprising basketball, football, aerobics, and volleyball. The proposed framework integrates preprocessing with denoising, normalization, and motion-guided keyframe extraction; human representation using silhouette detection, tracking, and skeleton estimation; and a hybrid feature strategy combining deep learned representations with handcrafted motion and shape descriptors, which are fused and modeled to capture interaction dynamics, followed by hierarchical classification. Experimental results demonstrate an overall classification accuracy of 91.25% under 10-fold cross-validation, validating the effectiveness of the proposed approach. The main contributions include a motion-aware keyframe selection strategy for long-duration videos, a hybrid feature representation for group behavior modeling, and an efficient recognition framework, while current limitations related to reliance on RGB data and fixed viewpoints motivate future work on adaptive temporal modeling and multimodal sensing.
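The abstract does not spell out how motion-guided keyframe extraction is carried out, so the following is only a minimal sketch of one common way to do it, not the authors' method. It assumes grayscale frames stacked in a NumPy array, scores each frame by its mean absolute difference from the previous frame, and keeps the highest-scoring frames in temporal order. The function names `motion_scores` and `select_keyframes` and the parameter `num_keyframes` are hypothetical.

```python
import numpy as np


def motion_scores(frames: np.ndarray) -> np.ndarray:
    """Score each frame by its mean absolute difference from the previous one.

    frames: array of shape (T, H, W) holding grayscale frames.
    Returns an array of shape (T,); the first frame has no predecessor
    and is given a score of 0.
    """
    # Cast first so that unsigned integer frames do not wrap around on subtraction.
    frames = frames.astype(np.float32)
    diffs = np.abs(frames[1:] - frames[:-1]).mean(axis=(1, 2))
    return np.concatenate([[0.0], diffs])


def select_keyframes(frames: np.ndarray, num_keyframes: int) -> np.ndarray:
    """Return the indices of the highest-motion frames, in temporal order.

    This is an assumed selection rule (top-k by frame difference), not
    necessarily the rule used in the paper.
    """
    scores = motion_scores(frames)
    k = min(num_keyframes, len(frames))
    # Stable sort on negated scores: highest motion first, ties keep the earlier frame.
    top = np.argsort(-scores, kind="stable")[:k]
    return np.sort(top)


if __name__ == "__main__":
    # Synthetic clip: six blank 4x4 frames with a brightness change at frames 2 and 4.
    clip = np.zeros((6, 4, 4), dtype=np.uint8)
    clip[2] = 10
    clip[3] = 10
    print(select_keyframes(clip, 2))
```

A top-k rule like this can cluster keyframes around a single burst of motion; a practical variant might enforce a minimum temporal gap between selected frames, which is again an assumption rather than something stated in the abstract.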
Corresponding authors: Ahmad Jalal and Hui Liu
This work is licensed under a Creative Commons Attribution 4.0 International License.
Cite this article: Saleha Kamal, Yanfeng Wu, Nouf Abdullah Almujally, Ahmad Jalal, and Hui Liu, Scalable Video Sensors for Understanding Multiperson Sports Behavior Recognition in Dynamic Scenes, Sens. Mater., Vol. 38, No. 5, 2026, pp. 2435-2453. |