Prototype Guided Backdoor Defense

1️⃣CVIT, KCIS, IIIT Hyderabad 2️⃣Amazon Research, India

Abstract

Deep learning models are susceptible to a variety of ‘backdoor attacks’, in which a malicious attacker perturbs a small subset of training data with a trigger that causes misclassification into a target class. Various triggers have been employed effectively, including semantic triggers, which are easily realizable in the real world without requiring the attacker to manipulate the image. Robust defense across all types of triggers is a crucial and unsolved problem. To this end, we propose Prototype Guided Backdoor Defense (PGBD), a robust post-hoc defense that scales across all trigger types, including previously unsolved semantic triggers. PGBD uses a novel sanitization loss in a fine-tuning step for defense, and permits easy adaptation using available attack knowledge. We observe better or on-par performance across all settings. We also present a new semantic attack based on an occluded celebrity faces dataset as a new benchmark, and defend against it successfully.



Our main contributions

A robust and scalable backdoor defense that achieves state-of-the-art results across all attack variations.

A defense configurable to all types of attack scenarios and defense settings by leveraging geometric relations in the activation space of backdoored models.

A new public semantic attack dataset consisting of attacks based on completely real images as well as synthetically modified images, along with the first successful defense against such attacks.
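To make the activation-space idea above concrete, here is a minimal illustrative sketch (not the authors' actual PGBD loss, whose exact form is not given in this section): class prototypes are estimated as mean feature activations, and a simple sanitization-style penalty pulls each sample's activation toward its class prototype during fine-tuning. The function names `class_prototypes` and `sanitization_loss` are hypothetical, introduced only for this example.

```python
import numpy as np

def class_prototypes(feats, labels, num_classes):
    # Prototype per class: the mean activation vector of that class's samples.
    # feats: (N, D) array of penultimate-layer activations; labels: (N,) ints.
    return np.stack([feats[labels == c].mean(axis=0) for c in range(num_classes)])

def sanitization_loss(feats, labels, prototypes):
    # Illustrative penalty: mean squared Euclidean distance from each
    # activation to its own class prototype. Minimizing this during
    # fine-tuning nudges (possibly backdoored) activations back toward
    # the clean class geometry.
    diffs = feats - prototypes[labels]
    return float((diffs ** 2).sum(axis=1).mean())

# Toy usage with 2-D activations for two classes.
feats = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
labels = np.array([0, 0, 1])
protos = class_prototypes(feats, labels, num_classes=2)
loss = sanitization_loss(feats, labels, protos)  # 0.0 here: feats equal prototypes
```

In an actual defense, this term would be added to the fine-tuning objective on clean (or sanitized) data; the real method's loss may instead use directional or relational constraints between prototypes rather than plain distances.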


Results