Adversarial Sample Detection Method Based on Neural Network Heatmaps

Abstract: Deep neural networks (DNNs) are vulnerable to adversarial attacks. To address this threat, this paper proposes a heatmap-based method for detecting adversarial samples. A "heatmap" is introduced to represent the neural activity of a DNN while it processes an input sample, so that each original sample is transformed into an activity-inspired heatmap. Heatmaps are generated separately for benign samples and adversarial samples, and a binary classifier is then trained on them to identify adversarial samples. Experimental results show that, against state-of-the-art adversarial attack methods, the proposed method achieves detection accuracies of up to 99.4% on the MNIST dataset and 93.9% on the CIFAR-10 dataset.
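The abstract does not state how the heatmaps are computed or which binary classifier is used, so the following is only a minimal sketch of the general pipeline under stated assumptions: the network is a hypothetical stack of dense ReLU layers given as plain weight matrices, a heatmap is taken to be each hidden layer's activations scaled to [0, 1], and a nearest-centroid rule stands in for the trained binary classifier. All function names are illustrative and do not come from the paper.

```python
import math

def relu(values):
    return [max(0.0, v) for v in values]

def dense(weights, inputs):
    # weights: one row of input weights per output neuron (no bias, for brevity)
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]

def heatmap(layers, sample):
    """Assumed heatmap: per-layer ReLU activations, each layer scaled to [0, 1]."""
    rows, activ = [], list(sample)
    for weights in layers:
        activ = relu(dense(weights, activ))
        peak = max(activ) or 1.0  # avoid dividing by zero for an all-zero layer
        rows.append([a / peak for a in activ])
    return rows  # one row of intensities per layer

def flatten(rows):
    return [v for row in rows for v in row]

def centroid(vectors):
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

def fit_detector(layers, benign_samples, adversarial_samples):
    """Stand-in binary classifier: store the mean heatmap of each class."""
    benign_center = centroid([flatten(heatmap(layers, s)) for s in benign_samples])
    adv_center = centroid([flatten(heatmap(layers, s)) for s in adversarial_samples])
    return benign_center, adv_center

def is_adversarial(layers, detector, sample):
    """Flag a sample whose heatmap lies closer to the adversarial centroid."""
    benign_center, adv_center = detector
    vector = flatten(heatmap(layers, sample))
    return math.dist(vector, adv_center) < math.dist(vector, benign_center)
```

In practice the weight matrices would come from the trained model under attack, the adversarial samples from an attack run against that model, and the nearest-centroid rule would be replaced by a trained classifier operating on the heatmaps.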

     
