
Analysis of Adversarial Examples

  • Tuesday, 30 July 2024, 10:00
  • Peter Lorenz
  • Address

    Mathematikon B
    Room B128 (3rd floor)


The rise of artificial intelligence (AI) has significantly impacted the field of computer vision (CV). In particular, deep learning (DL) has advanced the development of algorithms that comprehend visual data. On specific tasks, DL matches human capabilities and has entered everyday applications such as virtual assistants, entertainment, and web search. Despite the success of these visual algorithms, in this thesis we study the threat of adversarial examples: input manipulations that lead to misclassification.

The human visual system is not impaired by such manipulations and still recognizes the correct class, while for a DL classifier a single changed pixel can be enough to cause a misclassification. This exposes a misalignment between human vision and CV systems. We therefore begin this work by presenting the concept of a classification model to understand how these models can be tricked by the threat model of adversarial examples, as sketched below.
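The sketch below illustrates this threat model with the fast gradient sign method (FGSM), one of the simplest gradient-based attacks; the model, labels, and perturbation budget `epsilon` are illustrative placeholders, not the specific setup used in the thesis.

```python
# Minimal FGSM sketch: perturb the image along the sign of the loss gradient
# so that a small, L-infinity-bounded change can flip the prediction.
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, labels, epsilon=8 / 255):
    """Return adversarial versions of the image batch x (pixels in [0, 1])."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), labels)
    loss.backward()
    # Step in the direction that increases the loss, then clip to valid pixels.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```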

We then analyze adversarial examples in the Fourier domain, because after this transformation they can be identified more reliably for detection. To that end, we assess different adversarial attacks on a range of classification models and datasets beyond the standard benchmarks.
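As a rough illustration of this idea, the following sketch computes a log-magnitude Fourier spectrum of a grayscale image; using such spectra directly as detection features is an assumption made for the example, not a description of the exact pipeline in the thesis.

```python
# Sketch of a Fourier-domain feature: adversarial perturbations often leave
# traces in the high-frequency part of the magnitude spectrum.
import numpy as np

def fourier_feature(image):
    """Return the log-magnitude spectrum of a grayscale image of shape (H, W)."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))  # move zero frequency to the center
    return np.log1p(np.abs(spectrum))               # compress the dynamic range

# A simple detector could then be trained on flattened spectra of benign
# versus attacked images, e.g. with logistic regression.
```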

As a complementary approach, we develop an anti-pattern that places a frame-like patch (prompt) around the input image to counteract the manipulation. Instead of merely identifying and discarding adversarial inputs, this prompt neutralizes adversarial perturbations at test time.
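A minimal sketch of such a frame-like prompt follows; the image size, frame width, and the additive combination with the input are assumptions made for illustration. In a full pipeline, the prompt parameters would be optimized so that the classifier's predictions on prompted inputs stay stable under attack.

```python
# Sketch of a frame-like prompt: a learnable border added to every input at
# test time to counteract adversarial perturbations.
import torch
import torch.nn as nn

class FramePrompt(nn.Module):
    def __init__(self, image_size=224, frame_width=16):
        super().__init__()
        self.prompt = nn.Parameter(torch.zeros(3, image_size, image_size))
        mask = torch.zeros(1, image_size, image_size)
        mask[:, :frame_width, :] = 1        # top border
        mask[:, -frame_width:, :] = 1       # bottom border
        mask[:, :, :frame_width] = 1        # left border
        mask[:, :, -frame_width:] = 1       # right border
        self.register_buffer("mask", mask)  # only the frame region is active

    def forward(self, x):
        # Add the prompt inside the frame region and keep pixels in [0, 1].
        return (x + self.mask * self.prompt).clamp(0.0, 1.0)
```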

As another detection method, we expand the use of a characteristic of multi-dimensional data, the local intrinsic dimensionality (LID), to differentiate between benign and attacked images, improving detection rates of adversarial examples.
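A common way to estimate LID from k-nearest-neighbour distances is the maximum-likelihood estimator sketched below; the feature space, the choice of k, and the reference batch are illustrative assumptions rather than the thesis configuration.

```python
# Sketch of a maximum-likelihood LID estimate from k-nearest-neighbour
# distances; attacked inputs tend to show higher LID in the classifier's
# feature space, which a detector can exploit.
import numpy as np

def lid_mle(sample, reference, k=20):
    """Estimate the LID of one feature vector w.r.t. a reference batch (N, D)."""
    dists = np.linalg.norm(reference - sample, axis=1)
    dists = np.sort(dists)[:k]          # distances to the k nearest neighbours
    r_max = dists[-1]
    # MLE: -(1/k * sum_i log(r_i / r_max))^(-1), with a small epsilon for stability
    return -1.0 / np.mean(np.log((dists + 1e-12) / (r_max + 1e-12)))
```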

Recent advances in diffusion models (DMs) have significantly improved the adversarial robustness of classification models. Although DMs are well known for their generative abilities, it remains unclear whether adversarial examples lie within the learned distribution of a DM. To address this gap, we propose a methodology that determines whether adversarial examples fall on the learned manifold of the DM. We explore transforming adversarial images with the DM, which can reveal the attacked images.
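The sketch below conveys the underlying idea under strong simplifying assumptions: the input is noised with the standard DDPM forward process up to an intermediate timestep and then denoised by the DM, and a large reconstruction gap hints that the input lies off the learned manifold. Here `denoise_from_timestep` is a hypothetical placeholder for a pretrained reverse-diffusion sampler, not an actual library call.

```python
# Conceptual sketch: noise an image to timestep t, denoise it with the DM,
# and measure how far the reconstruction drifts from the original input.
import torch

def reconstruction_gap(x, alphas_cumprod, denoise_from_timestep, t=300):
    """Per-image reconstruction error; alphas_cumprod is a 1-D tensor of cumulative alphas."""
    a_bar = alphas_cumprod[t]
    noise = torch.randn_like(x)
    # DDPM forward process: x_t = sqrt(a_bar) * x_0 + sqrt(1 - a_bar) * eps
    x_t = a_bar.sqrt() * x + (1 - a_bar).sqrt() * noise
    x_rec = denoise_from_timestep(x_t, t)       # reverse process (placeholder)
    return (x_rec - x).flatten(1).norm(dim=1)   # larger gap: likely off-manifold
```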