Abstract:
To address the poor feature fusion, weak correlation of time-series information, and unclear behavior boundaries of existing human behavior detection methods, a human behavior detection method based on a spatio-temporal interactive network is proposed. The two-stream feature extraction module is redesigned, and a connection layer is added between the spatial-stream and spatio-temporal-stream networks. An improved spatial transformer network and a visual attention model are introduced into the spatial-stream and temporal-stream networks, respectively. A feature fusion module based on a pixel filter is designed to calculate the correlation of time-series information in key areas and to aggregate two kinds of features with different dimensions. The loss function of the network is also optimized. Experimental results on the AVA dataset show that the proposed method has advantages in detection accuracy, speed, and generalization ability.