Abstract:
To improve computers’ end-to-end emotion recognition capability, an improved spatiotemporal convolutional neural network called ESTNet is proposed. ESTNet consists of four modules: a kernel attention module, a spatial learning module, a temporal learning module, and a fusion module. The kernel size is designed based on the sampling frequency of the EEG signal. The temporal and spatial learning modules use a Transformer model and a graph neural network to decode the temporal and spatial domains of the EEG signal, and a convolutional neural network fuses the resulting spatiotemporal features. Experimental results on the public DEAP dataset show that ESTNet outperforms current mainstream networks on the valence label. In addition, the EEG signals are visualized as topographic maps to explore the correlation between subjective emotional state and objective biological facts.
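
As an illustration of tying kernel size to sampling frequency, the following minimal sketch derives temporal kernel lengths as fractions of one second of signal. The function name `kernel_sizes`, the window fractions, and the choice to force odd lengths are assumptions made for this example and are not taken from ESTNet itself; the 128 Hz rate is that of the preprocessed DEAP recordings.

```python
def kernel_sizes(sampling_rate_hz, window_fractions=(0.5, 0.25, 0.125)):
    """Return one temporal kernel length (in samples) per window fraction.

    Each fraction is the share of one second the kernel should span.
    The fractions are hypothetical values chosen for illustration.
    """
    sizes = []
    for fraction in window_fractions:
        # Convert a duration in seconds to a whole number of samples.
        length = max(1, int(round(sampling_rate_hz * fraction)))
        # Assumption: prefer odd lengths so symmetric padding preserves
        # the number of time steps after convolution.
        if length % 2 == 0:
            length += 1
        sizes.append(length)
    return sizes


# Preprocessed DEAP recordings are sampled at 128 Hz.
print(kernel_sizes(128))
```

With these assumed fractions, the kernels span roughly 0.5 s, 0.25 s, and 0.125 s of signal, so a change in sampling frequency rescales the kernels rather than changing the time span they cover.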