AI Glossary
A complete glossary of artificial intelligence
Poisoning Attack
A strategy where the attacker injects malicious data into the training set to degrade the model's performance or create a backdoor.
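A minimal sketch of one poisoning strategy (label flipping) on a generic NumPy training set; the array names and the poison fraction are illustrative assumptions, not a fixed recipe.

```python
import numpy as np

def label_flip_poison(X, y, n_classes, poison_fraction=0.05, seed=0):
    """Flip the labels of a small random subset of the training set.

    A model trained on the returned (X, y) learns from corrupted
    supervision, which degrades its accuracy on clean test data.
    """
    rng = np.random.default_rng(seed)
    y_poisoned = y.copy()
    n_poison = int(poison_fraction * len(y))
    idx = rng.choice(len(y), size=n_poison, replace=False)
    # Shift each chosen label by a random non-zero offset so it lands
    # on a different class than the original one.
    offsets = rng.integers(1, n_classes, size=n_poison)
    y_poisoned[idx] = (y_poisoned[idx] + offsets) % n_classes
    return X, y_poisoned
```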
Model Extraction Attack
An attack aimed at stealing the parameters or functionality of a proprietary model by querying its API and using the responses to train a substitute model.
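A sketch of the query-and-imitate loop, assuming a hypothetical `query_api` callable that stands in for the victim's prediction endpoint and a scikit-learn model as the substitute.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def extract_substitute(query_api, input_dim, n_queries=10_000, seed=0):
    """Train a substitute model that imitates a black-box classifier.

    `query_api` is a hypothetical stand-in for the victim's API: it takes
    a batch of inputs and returns the predicted labels.  The attacker
    never sees the victim's parameters, only its answers.
    """
    rng = np.random.default_rng(seed)
    X_probe = rng.uniform(-1.0, 1.0, size=(n_queries, input_dim))
    y_probe = query_api(X_probe)                  # victim's responses
    substitute = DecisionTreeClassifier(max_depth=10)
    substitute.fit(X_probe, y_probe)              # local copy of the behaviour
    return substitute
```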
Membership Inference Attack
A privacy attack that determines whether a specific data record was used in a model's training set, compromising data confidentiality.
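A sketch of the simplest loss-threshold variant; `per_example_loss` is a hypothetical callable giving the victim model's loss on a single (x, y) pair, and the threshold would in practice be calibrated on known non-members.

```python
import numpy as np

def infer_membership(per_example_loss, candidates, threshold):
    """Loss-threshold membership inference.

    Models usually fit their training examples more closely than unseen
    ones, so candidates with unusually low loss are flagged as likely
    members of the training set.
    """
    losses = np.array([per_example_loss(x, y) for x, y in candidates])
    return losses < threshold   # True = predicted "was in the training set"
```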
Adversarial Examples
Inputs, often imperceptibly modified, that are designed to fool a machine learning model and cause incorrect classification.
Adversarial Robustness
The ability of a machine learning model to resist adversarial attacks, i.e., to maintain its performance against inputs intentionally designed to fool it.
Adversarial Training
A regularization technique where the model is trained on dynamically generated adversarial examples to improve its robustness against future attacks.
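A minimal PyTorch sketch of one adversarial-training step, assuming a generic classifier, inputs scaled to [0, 1], and FGSM as the inner attack; stronger inner attacks such as PGD are common in practice.

```python
import torch
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, epsilon=0.03):
    """Generate adversarial examples on the fly, then update the model on them."""
    # Inner step: craft gradient-sign perturbations against the current model.
    x_adv = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x_adv), y).backward()
    x_adv = (x_adv + epsilon * x_adv.grad.sign()).clamp(0.0, 1.0).detach()

    # Outer step: ordinary gradient descent on the perturbed batch.
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```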
Targeted Attack
A type of adversarial attack where the attacker seeks not only to cause misclassification, but to make the model predict a specific incorrect class.
Untargeted Attack
An adversarial attack that simply aims to cause incorrect classification, regardless of the incorrect class predicted by the model.
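A short sketch contrasting the two preceding entries: the only difference between a targeted and an untargeted attack is the loss the attacker optimizes (PyTorch; the label names are illustrative).

```python
import torch.nn.functional as F

def attacker_objective(logits, y_true, y_target=None):
    """Loss the attacker minimizes while crafting the perturbation.

    Untargeted: push the prediction away from the true class (any wrong
    class counts), i.e. maximize the loss on y_true.
    Targeted: pull the prediction toward the attacker-chosen y_target.
    """
    if y_target is None:
        return -F.cross_entropy(logits, y_true)    # untargeted
    return F.cross_entropy(logits, y_target)       # targeted
```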
Black-Box Attack
An attack conducted without knowledge of the model's internal architecture, parameters, or weights, based solely on its API's inputs/outputs.
White-Box Attack
An attack where the adversary has complete knowledge of the model's architecture, its weights, and its training procedure, allowing for more precise attacks.
Replay Attack
An attack where the adversary records legitimate communications (e.g., queries to a model) and replays them later to obtain an unauthorized response or manipulate the system.
Sign Method Attack
A single-step white-box attack (the Fast Gradient Sign Method, FGSM) that uses only the sign of the loss gradient with respect to the input to generate adversarial examples; the resulting perturbations often transfer to models the attacker cannot inspect.
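A minimal PyTorch sketch of the one-step gradient-sign attack, assuming a differentiable classifier and inputs scaled to [0, 1].

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.03):
    """Fast Gradient Sign Method: move each input component by +/- epsilon
    in the direction that increases the loss on the true label."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + epsilon * x_adv.grad.sign()).clamp(0.0, 1.0).detach()
```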
Randomization Defense
A defense technique that introduces randomness into the model or input data (e.g., noise, random transformations) to disrupt the attacker's gradient computation.
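A sketch of one simple randomized defense: average predictions over noisy copies of the input at inference time (PyTorch; the noise level and sample count are illustrative assumptions).

```python
import torch

def randomized_predict(model, x, sigma=0.1, n_samples=8):
    """Add fresh Gaussian noise to each copy of the input and average the
    outputs, so no single fixed gradient describes the deployed pipeline."""
    outputs = [
        model((x + sigma * torch.randn_like(x)).clamp(0.0, 1.0))
        for _ in range(n_samples)
    ]
    return torch.stack(outputs).mean(dim=0)
```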
Defensive Distillation
A defense method where a model is trained to mimic the softened output probabilities (soft labels) of a pre-trained teacher model, making the decision surface smoother and less sensitive to small adversarial perturbations.
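A sketch of the distillation objective, assuming teacher and student logits from PyTorch models; the temperature value is illustrative.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=20.0):
    """Match the student to the teacher's softened probability distribution.

    A high temperature flattens the teacher's softmax, and training on
    these soft targets yields a smoother decision surface."""
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # Cross-entropy against soft targets, rescaled as in standard distillation.
    return -(soft_targets * log_probs).sum(dim=-1).mean() * temperature ** 2
```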
Universal Adversarial Perturbations Attack
An attack aimed at finding a single perturbation (a fixed noise pattern) that can fool a model across a wide range of inputs, regardless of their specific content.
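A greedy PyTorch sketch of building one shared perturbation over a whole data loader; the update rule is a simplified gradient-sign accumulation, not the original algorithm from the literature.

```python
import torch
import torch.nn.functional as F

def universal_perturbation(model, data_loader, epsilon=0.05, step=0.01, epochs=1):
    """Accumulate a single perturbation `delta`, clipped to an L-infinity
    ball of radius epsilon, that raises the loss across many inputs."""
    delta = None
    for _ in range(epochs):
        for x, y in data_loader:
            if delta is None:
                delta = torch.zeros_like(x[:1])    # one tensor, shared by all inputs
            d = delta.clone().detach().requires_grad_(True)
            F.cross_entropy(model(x + d), y).backward()
            delta = (delta + step * d.grad.sign()).clamp(-epsilon, epsilon).detach()
    return delta
```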
Formal Robustness Verification
The application of rigorous mathematical methods to formally prove that a model is robust against all adversarial perturbations within a defined set.
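A NumPy sketch of one such method, interval bound propagation (IBP), for a small fully connected ReLU network; representing the network as a list of (W, b) pairs is an assumption made for illustration.

```python
import numpy as np

def interval_linear(W, b, lower, upper):
    """Sound output bounds of y = W @ x + b for all x in [lower, upper]."""
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    return (W_pos @ lower + W_neg @ upper + b,
            W_pos @ upper + W_neg @ lower + b)

def certified_robust(layers, x, epsilon, true_class):
    """Interval bound propagation: if the true class's lower bound beats
    every other class's upper bound, no perturbation with infinity-norm
    at most epsilon can change the prediction -- a formal guarantee."""
    lower, upper = x - epsilon, x + epsilon
    for i, (W, b) in enumerate(layers):
        lower, upper = interval_linear(W, b, lower, upper)
        if i < len(layers) - 1:                    # ReLU on hidden layers only
            lower, upper = np.maximum(lower, 0.0), np.maximum(upper, 0.0)
    rival_upper = max(u for c, u in enumerate(upper) if c != true_class)
    return bool(lower[true_class] > rival_upper)
```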