S&M3020 Research Paper of Special Issue
Published: August 2, 2022
Quantitative Evaluation System for Online Meetings Based on Multimodal Microbehavior Analysis
Chenhao Chen, Yutaka Arakawa, Ko Watanabe, and Shoya Ishimaru
(Received May 2, 2022; Accepted July 4, 2022)
Keywords: online meeting, neural network, smile detection, head pose estimation, active speaker detection
Maintaining positive interaction is key to a healthy and efficient meeting. Aiming to improve the quality of online meetings, we present an end-to-end neural-network-based system, named MeetingPipe, which is capable of quantitative microbehavior detection (smiling, nodding, and speaking) from recorded meeting videos. For smile detection, we build a neural network framework that consists of an 18-layer residual network for feature representation and a self-attention layer to explore the correlation between receptive fields. To perform nodding detection, we obtain head rotation data as the key nodding feature. We then use a gated recurrent unit followed by a squeeze-and-excitation mechanism to capture the temporal information of nodding patterns from head pitch angles. In addition, we utilize TalkNet, an active speaker detection model, which can effectively recognize active speakers from videos. Experiments demonstrate that, with K-fold cross validation, the F1 scores of smile, nodding, and speaking detection are 97.34%, 81.26%, and 94.90%, respectively. The processing can be accelerated with multiple GPUs owing to the multithreaded design. The code is available at https://github.com/humanophilic/MeetingPipe.
Corresponding author: Chenhao Chen
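The nodding branch described above (a gated recurrent unit over head pitch angles, followed by a squeeze-and-excitation gate) can be sketched as a minimal PyTorch module. This is an illustrative reconstruction, not the authors' implementation: the hidden size, SE reduction ratio, class count, and module names (`NodDetector`, etc.) are assumptions.

```python
import torch
import torch.nn as nn


class NodDetector(nn.Module):
    """Sketch of a GRU + squeeze-and-excitation nodding classifier.

    Input: a batch of head pitch-angle sequences, shape (batch, time).
    Output: logits for nod / no-nod, shape (batch, 2).
    Layer sizes are illustrative assumptions.
    """

    def __init__(self, hidden: int = 64, se_reduction: int = 4):
        super().__init__()
        # GRU captures the temporal pattern of pitch-angle changes.
        self.gru = nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)
        # Squeeze-and-excitation: squeeze (average over time), then
        # excite (bottleneck MLP producing per-channel gates in [0, 1]).
        self.se = nn.Sequential(
            nn.Linear(hidden, hidden // se_reduction),
            nn.ReLU(),
            nn.Linear(hidden // se_reduction, hidden),
            nn.Sigmoid(),
        )
        self.classifier = nn.Linear(hidden, 2)  # nod vs. no-nod logits

    def forward(self, pitch: torch.Tensor) -> torch.Tensor:
        x = pitch.unsqueeze(-1)        # (batch, time, 1) feature dimension
        feats, _ = self.gru(x)         # (batch, time, hidden)
        squeezed = feats.mean(dim=1)   # squeeze: global average over time
        gates = self.se(squeezed)      # excite: per-channel attention weights
        gated = squeezed * gates       # re-weight GRU feature channels
        return self.classifier(gated)  # (batch, 2)


if __name__ == "__main__":
    model = NodDetector()
    logits = model(torch.randn(8, 30))  # 8 clips, 30 pitch samples each
    print(logits.shape)                 # torch.Size([8, 2])
```

In this sketch the SE gate lets the classifier emphasize GRU channels that respond to the periodic up-down pitch motion characteristic of nodding.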
This work is licensed under a Creative Commons Attribution 4.0 International License.
Cite this article
Chenhao Chen, Yutaka Arakawa, Ko Watanabe, and Shoya Ishimaru, Quantitative Evaluation System for Online Meetings Based on Multimodal Microbehavior Analysis, Sens. Mater., Vol. 34, No. 8, 2022, pp. 3017-3027.