S&M 4183, Technical Paper of Special Issue
https://doi.org/10.18494/SAM5937
Published: September 30, 2025

Employee Work Behavior Monitoring Using Multimodal Large Language Models

Yushi Chen, Chung-Hsing Chao, Linjing Liu, and Cheng-Fu Yang
(Received September 16, 2025; Accepted September 24, 2025)
pp. 4309-4321

Keywords: multimodal large language models, employee behavior monitoring, smart office, prompt engineering, privacy protection
With the rapid advancement of artificial intelligence, enterprises increasingly demand efficient and flexible solutions for monitoring employee work behavior in office environments. Traditional systems often suffer from high costs, rigidity, and reliance on extensive labeled data. Multimodal large language models (MLLMs), which can integrate information from text, images, and audio, offer a novel zero-shot inference approach that reduces both data dependence and deployment complexity. In this study, we present a practical application framework that combines seating area definition, image cropping, and prompt engineering to analyze employee behaviors such as focused screen engagement and nonwork-related interactions. Results are output in a standardized JavaScript Object Notation (JSON) format, facilitating aggregation and actionable insights for human resource management. Additionally, critical privacy, ethical, and legal considerations are discussed, along with mitigation strategies to support responsible deployment. Through practical simulation scenarios and cost–benefit analysis, we demonstrate that MLLMs enable scalable and economically viable employee behavior monitoring solutions suitable for small and medium-sized enterprises.
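To make the pipeline concrete, the following is a minimal Python sketch of the three steps named in the abstract (seating area definition, image cropping, and prompt engineering with JSON output). The model name, seat coordinates, prompt wording, file name, and output schema are illustrative assumptions for exposition, not the authors' actual configuration; any vision-capable MLLM endpoint could be substituted.

```python
# Hypothetical sketch of the crop-and-prompt monitoring pipeline; all
# identifiers below (seat regions, prompt, model, schema) are assumptions.
import base64
import io
import json

from openai import OpenAI   # pip install openai
from PIL import Image       # pip install pillow

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Step 1: seating area definition -- fixed per-seat pixel regions
# (left, top, right, bottom), assumed calibrated once per camera.
SEAT_REGIONS = {
    "seat_01": (0, 0, 640, 480),
    "seat_02": (640, 0, 1280, 480),
}

PROMPT = (
    "You are monitoring an office workstation. Classify the person's "
    "behavior as one of: focused_screen_engagement, nonwork_interaction, "
    "absent. Respond with JSON: "
    '{"seat_id": string, "behavior": string, "confidence": number}.'
)

def classify_seat(frame: Image.Image, seat_id: str) -> dict:
    """Crop one seating region and request a zero-shot label from the MLLM."""
    # Step 2: image cropping -- isolate the seat so the model sees one person.
    crop = frame.crop(SEAT_REGIONS[seat_id])
    buf = io.BytesIO()
    crop.save(buf, format="JPEG")
    b64 = base64.b64encode(buf.getvalue()).decode()

    # Step 3: prompt engineering -- zero-shot instruction plus the cropped image.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any vision-capable MLLM would do
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": f"{PROMPT} seat_id={seat_id}"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
        response_format={"type": "json_object"},  # constrain output to JSON
    )
    # Standardized JSON output, ready for aggregation across seats and frames.
    return json.loads(response.choices[0].message.content)

if __name__ == "__main__":
    frame = Image.open("office_frame.jpg")  # one camera snapshot (assumed path)
    print(classify_seat(frame, "seat_01"))
```

Because each seat returns the same JSON schema, per-seat results can be aggregated over time without any task-specific training data, which is the zero-shot advantage the abstract claims over traditional labeled-data systems.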
Corresponding authors: Linjing Liu and Cheng-Fu Yang

This work is licensed under a Creative Commons Attribution 4.0 International License.

Cite this article: Yushi Chen, Chung-Hsing Chao, Linjing Liu, and Cheng-Fu Yang, Employee Work Behavior Monitoring Using Multimodal Large Language Models, Sens. Mater., Vol. 37, No. 9, 2025, pp. 4309-4321.