Abstract:
To address the threat that adversarial attacks pose to deep neural networks (DNNs), this paper proposes a heatmap-based method for detecting adversarial samples. A heatmap is introduced to represent the neural activity of a DNN while it processes an input sample, so that each original sample is transformed into a neural-activity heatmap. Heatmaps are generated for both benign and adversarial samples, and a binary classifier is trained on them to identify adversarial samples. Experimental results show that the detection accuracy of the proposed method reaches up to 99.4% on MNIST and 93.9% on CIFAR-10 against advanced adversarial attack methods.
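The pipeline described above (activations to heatmap, then a binary detector) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the two-layer random-weight network standing in for a trained DNN, the 8x8 heatmap layout obtained by min-max normalizing and reshaping the hidden activations, the logistic-regression detector, and the random sign perturbation used as a placeholder for a real adversarial attack. The names `activity_heatmap`, `train_detector`, and `detect` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a trained DNN: two ReLU layers with random weights.
W1 = rng.normal(size=(16, 32))
W2 = rng.normal(size=(32, 32))

def activity_heatmap(x):
    """Map one input vector of shape (16,) to an 8x8 heatmap of hidden activations."""
    h1 = np.maximum(x @ W1, 0.0)
    h2 = np.maximum(h1 @ W2, 0.0)
    acts = np.concatenate([h1, h2])  # 64 activations in total
    span = acts.max() - acts.min()
    # Min-max normalize to [0, 1]; fall back to zeros if all activations are equal.
    acts = (acts - acts.min()) / span if span > 0 else np.zeros_like(acts)
    return acts.reshape(8, 8)

def train_detector(heatmaps, labels, lr=0.1, epochs=200):
    """Fit a logistic-regression detector on flattened heatmaps (assumed classifier)."""
    X = heatmaps.reshape(len(heatmaps), -1)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = p - labels
        w -= lr * X.T @ grad / len(X)
        b -= lr * grad.mean()
    return w, b

def detect(heatmap, w, b):
    """Return the estimated probability that a heatmap comes from an adversarial sample."""
    return 1.0 / (1.0 + np.exp(-(heatmap.ravel() @ w + b)))

# Toy data: benign inputs, plus perturbed copies as a placeholder for a real attack.
benign = rng.normal(size=(50, 16))
adversarial = benign + 0.5 * np.sign(rng.normal(size=(50, 16)))
maps = np.stack([activity_heatmap(x) for x in np.concatenate([benign, adversarial])])
labels = np.concatenate([np.zeros(50), np.ones(50)])  # 0 = benign, 1 = adversarial

w, b = train_detector(maps, labels)
score = detect(maps[0], w, b)
```

The sketch only shows how the data flows through the three stages; it says nothing about the accuracy figures reported above, which depend on the paper's actual heatmap construction, classifier, and attack methods.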