AI Glossary
A complete glossary of artificial intelligence
Poisoning Attack
A strategy where the attacker injects malicious data into the training set to degrade the model's performance or create a backdoor.
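A minimal sketch of one poisoning strategy (label flipping) on a generic NumPy training set; the array names and the poison fraction are illustrative assumptions, not a fixed recipe.

```python
import numpy as np

def label_flip_poison(X, y, n_classes, poison_fraction=0.05, seed=0):
    """Flip the labels of a small random subset of the training set.

    A model trained on the returned (X, y) learns from corrupted
    supervision, which degrades its accuracy on clean test data.
    """
    rng = np.random.default_rng(seed)
    y_poisoned = y.copy()
    n_poison = int(poison_fraction * len(y))
    idx = rng.choice(len(y), size=n_poison, replace=False)
    # Shift each chosen label by a random non-zero offset so it lands
    # on a different class than the original one.
    offsets = rng.integers(1, n_classes, size=n_poison)
    y_poisoned[idx] = (y_poisoned[idx] + offsets) % n_classes
    return X, y_poisoned
```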
Model Extraction Attack
An attack aimed at stealing the parameters or functionality of a proprietary model by querying its API and using the responses to train a substitute model.
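A sketch of the query-and-imitate loop, assuming a hypothetical `query_api` callable that stands in for the victim's prediction endpoint and a scikit-learn model as the substitute.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def extract_substitute(query_api, input_dim, n_queries=10_000, seed=0):
    """Train a substitute model that imitates a black-box classifier.

    `query_api` is a hypothetical stand-in for the victim's API: it takes
    a batch of inputs and returns the predicted labels.  The attacker
    never sees the victim's parameters, only its answers.
    """
    rng = np.random.default_rng(seed)
    X_probe = rng.uniform(-1.0, 1.0, size=(n_queries, input_dim))
    y_probe = query_api(X_probe)                  # victim's responses
    substitute = DecisionTreeClassifier(max_depth=10)
    substitute.fit(X_probe, y_probe)              # local copy of the behaviour
    return substitute
```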
Membership Inference Attack
A privacy attack that determines whether a specific data record was used in a model's training set, compromising data confidentiality.
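A sketch of the simplest loss-threshold variant; `per_example_loss` is a hypothetical callable giving the victim model's loss on a single (x, y) pair, and the threshold would in practice be calibrated on known non-members.

```python
import numpy as np

def infer_membership(per_example_loss, candidates, threshold):
    """Loss-threshold membership inference.

    Models usually fit their training examples more closely than unseen
    ones, so candidates with unusually low loss are flagged as likely
    members of the training set.
    """
    losses = np.array([per_example_loss(x, y) for x, y in candidates])
    return losses < threshold   # True = predicted "was in the training set"
```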
Adversarial Examples
Inputs, often imperceptibly modified, that are designed to fool a machine learning model and cause incorrect classification.
Adversarial Robustness
The ability of a machine learning model to resist adversarial attacks, i.e., to maintain its performance against inputs intentionally designed to fool it.
Adversarial Training
A regularization technique where the model is trained on dynamically generated adversarial examples to improve its robustness against future attacks.
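A minimal PyTorch sketch of one adversarial-training step, assuming a generic classifier, inputs scaled to [0, 1], and FGSM as the inner attack; stronger inner attacks such as PGD are common in practice.

```python
import torch
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, epsilon=0.03):
    """Generate adversarial examples on the fly, then update the model on them."""
    # Inner step: craft gradient-sign perturbations against the current model.
    x_adv = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x_adv), y).backward()
    x_adv = (x_adv + epsilon * x_adv.grad.sign()).clamp(0.0, 1.0).detach()

    # Outer step: ordinary gradient descent on the perturbed batch.
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```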
Targeted Attack
A type of adversarial attack where the attacker seeks not only to cause misclassification, but to make the model predict a specific incorrect class.
Untargeted Attack
An adversarial attack that simply aims to cause incorrect classification, regardless of the incorrect class predicted by the model.
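A short sketch contrasting the two preceding entries: the only difference between a targeted and an untargeted attack is the loss the attacker optimizes (PyTorch; the label names are illustrative).

```python
import torch.nn.functional as F

def attacker_objective(logits, y_true, y_target=None):
    """Loss the attacker minimizes while crafting the perturbation.

    Untargeted: push the prediction away from the true class (any wrong
    class counts), i.e. maximize the loss on y_true.
    Targeted: pull the prediction toward the attacker-chosen y_target.
    """
    if y_target is None:
        return -F.cross_entropy(logits, y_true)    # untargeted
    return F.cross_entropy(logits, y_target)       # targeted
```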
Black-Box Attack
An attack conducted without knowledge of the model's internal architecture, parameters, or weights, based solely on its API's inputs/outputs.
White-Box Attack
An attack where the adversary has complete knowledge of the model's architecture, its weights, and its training procedure, allowing for more precise attacks.
Replay Attack
An attack where the adversary records legitimate communications (e.g., queries to a model) and replays them later to obtain an unauthorized response or manipulate the system.
Sign Method Attack
A single-step white-box attack (the Fast Gradient Sign Method, FGSM) that uses only the sign of the loss gradient with respect to the input to generate adversarial examples; the resulting perturbations often transfer to models the attacker cannot inspect.
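A minimal PyTorch sketch of the one-step gradient-sign attack, assuming a differentiable classifier and inputs scaled to [0, 1].

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.03):
    """Fast Gradient Sign Method: move each input component by +/- epsilon
    in the direction that increases the loss on the true label."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + epsilon * x_adv.grad.sign()).clamp(0.0, 1.0).detach()
```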
Randomization Defense
A defense technique that introduces randomness into the model or input data (e.g., noise, random transformations) to disrupt the attacker's gradient computation.
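A sketch of one simple randomized defense: average predictions over noisy copies of the input at inference time (PyTorch; the noise level and sample count are illustrative assumptions).

```python
import torch

def randomized_predict(model, x, sigma=0.1, n_samples=8):
    """Add fresh Gaussian noise to each copy of the input and average the
    outputs, so no single fixed gradient describes the deployed pipeline."""
    outputs = [
        model((x + sigma * torch.randn_like(x)).clamp(0.0, 1.0))
        for _ in range(n_samples)
    ]
    return torch.stack(outputs).mean(dim=0)
```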
Defensive Distillation
A defense method where a model is trained to mimic the softened output probabilities (soft labels) of a pre-trained teacher model, making the decision surface smoother and less sensitive to small adversarial perturbations.
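A sketch of the distillation objective, assuming teacher and student logits from PyTorch models; the temperature value is illustrative.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=20.0):
    """Match the student to the teacher's softened probability distribution.

    A high temperature flattens the teacher's softmax, and training on
    these soft targets yields a smoother decision surface."""
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # Cross-entropy against soft targets, rescaled as in standard distillation.
    return -(soft_targets * log_probs).sum(dim=-1).mean() * temperature ** 2
```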
Universal Adversarial Perturbations Attack
An attack aimed at finding a single perturbation (a fixed noise pattern) that can fool a model across a wide range of inputs, regardless of their specific content.
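A greedy PyTorch sketch of building one shared perturbation over a whole data loader; the update rule is a simplified gradient-sign accumulation, not the original algorithm from the literature.

```python
import torch
import torch.nn.functional as F

def universal_perturbation(model, data_loader, epsilon=0.05, step=0.01, epochs=1):
    """Accumulate a single perturbation `delta`, clipped to an L-infinity
    ball of radius epsilon, that raises the loss across many inputs."""
    delta = None
    for _ in range(epochs):
        for x, y in data_loader:
            if delta is None:
                delta = torch.zeros_like(x[:1])    # one tensor, shared by all inputs
            d = delta.clone().detach().requires_grad_(True)
            F.cross_entropy(model(x + d), y).backward()
            delta = (delta + step * d.grad.sign()).clamp(-epsilon, epsilon).detach()
    return delta
```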
Formal Robustness Verification
The application of rigorous mathematical methods to formally prove that a model is robust against all adversarial perturbations within a defined set.
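A NumPy sketch of one such method, interval bound propagation (IBP), for a small fully connected ReLU network; representing the network as a list of (W, b) pairs is an assumption made for illustration.

```python
import numpy as np

def interval_linear(W, b, lower, upper):
    """Sound output bounds of y = W @ x + b for all x in [lower, upper]."""
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    return (W_pos @ lower + W_neg @ upper + b,
            W_pos @ upper + W_neg @ lower + b)

def certified_robust(layers, x, epsilon, true_class):
    """Interval bound propagation: if the true class's lower bound beats
    every other class's upper bound, no perturbation with infinity-norm
    at most epsilon can change the prediction -- a formal guarantee."""
    lower, upper = x - epsilon, x + epsilon
    for i, (W, b) in enumerate(layers):
        lower, upper = interval_linear(W, b, lower, upper)
        if i < len(layers) - 1:                    # ReLU on hidden layers only
            lower, upper = np.maximum(lower, 0.0), np.maximum(upper, 0.0)
    rival_upper = max(u for c, u in enumerate(upper) if c != true_class)
    return bool(lower[true_class] > rival_upper)
```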