AI Glossary
The complete dictionary of Artificial Intelligence
Adversarial Machine Learning
Field studying the vulnerabilities of machine learning models to malicious attacks designed to deceive them or degrade their performance. The discipline develops attack techniques and defense strategies in tandem to strengthen the security of AI systems.
Evasion attacks
Attack techniques where imperceptible perturbations are applied to input data to mislead an already trained model. These attacks aim to bypass the model's decisions without modifying its internal parameters.
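As a concrete illustration, the Fast Gradient Sign Method (FGSM) builds such a perturbation in a single gradient step. A minimal sketch, assuming a trained PyTorch classifier `model` that returns logits and inputs scaled to [0, 1]:

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps=0.03):
    """Fast Gradient Sign Method: one signed-gradient step of size eps."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Move each input feature in the direction that most increases the loss.
    x_adv = x_adv + eps * x_adv.grad.sign()
    # Keep the perturbed input in the valid data range.
    return x_adv.clamp(0.0, 1.0).detach()
```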
Data poisoning
Attack method consisting of injecting malicious data into the training set to compromise the performance of the final model. The objective is to create backdoors or systematically degrade predictions on specific targets.
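A minimal backdoor-poisoning sketch; the trigger shape, poisoning rate, and (N, C, H, W) image layout are illustrative assumptions:

```python
import torch

def poison_dataset(images, labels, target_class, rate=0.05):
    """Stamp a trigger on a fraction of samples and relabel them."""
    images, labels = images.clone(), labels.clone()
    n_poison = int(rate * len(images))
    idx = torch.randperm(len(images))[:n_poison]
    # Stamp a bright 3x3 trigger patch in the bottom-right corner.
    images[idx, :, -3:, -3:] = 1.0
    # Relabel so the model learns to associate trigger -> target class.
    labels[idx] = target_class
    return images, labels
```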
Adversarial training
Training method that actively incorporates adversarial examples into the learning process to improve the model's robustness. This approach exposes the model to the types of attacks it might encounter in production.
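A sketch of one adversarial-training step, using a few iterations of projected gradient descent (PGD) as the inner attack; `model`, `optimizer`, and the batch `(x, y)` are assumed PyTorch objects:

```python
import torch
import torch.nn.functional as F

def pgd(model, x, y, eps=0.03, alpha=0.01, steps=5):
    """Iterative attack: ascend the loss, project back into the eps-ball."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = (x_adv + alpha * grad.sign()).detach()
        x_adv = x + (x_adv - x).clamp(-eps, eps)
        x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv

def adversarial_training_step(model, optimizer, x, y):
    # Train on adversarial examples instead of the clean batch.
    x_adv = pgd(model, x, y)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```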
Randomized smoothing
Certified defense technique that adds Gaussian noise to inputs and classifies by majority vote on multiple noisy samples. This method provides mathematical guarantees on the model's robustness against bounded perturbations.
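A sketch of the majority-vote prediction rule for a single unbatched input `x`; the noise level `sigma`, sample count, and class count are illustrative:

```python
import torch

@torch.no_grad()
def smoothed_predict(model, x, sigma=0.25, n_samples=100, num_classes=10):
    """Classify x by majority vote over Gaussian-noised copies."""
    votes = torch.zeros(num_classes, dtype=torch.long)
    for _ in range(n_samples):
        noisy = x + sigma * torch.randn_like(x)
        pred = model(noisy.unsqueeze(0)).argmax(dim=1).item()
        votes[pred] += 1
    return votes.argmax().item()
```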
Extraction attacks
Attack strategy aiming to reproduce or steal a proprietary model by querying its API and analyzing its responses. These attacks exploit information leaks through predictions to reconstruct the model or its training data.
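A sketch of extraction via API querying, assuming a hypothetical black-box `victim_api` that returns class probabilities for a pool of query inputs:

```python
import torch
import torch.nn.functional as F

def extract_surrogate(victim_api, surrogate, optimizer, query_pool, epochs=5):
    """Fit a local surrogate to the victim's predictions."""
    # Harvest soft labels; probabilities leak more information than
    # hard labels and accelerate extraction.
    with torch.no_grad():
        soft_labels = victim_api(query_pool)
    for _ in range(epochs):
        optimizer.zero_grad()
        # Match the surrogate's output distribution to the victim's.
        loss = F.kl_div(F.log_softmax(surrogate(query_pool), dim=1),
                        soft_labels, reduction="batchmean")
        loss.backward()
        optimizer.step()
    return surrogate
```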
Robustness certification
Mathematical process formally guaranteeing that a model's prediction stays unchanged for all perturbations within a defined radius. Such a certificate gives a provable lower bound on the distance to the nearest adversarial example, and thus an upper bound on the model's vulnerability to bounded attacks.
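For instance, randomized smoothing admits a closed-form certificate (Cohen et al., 2019): if the top class has probability at least p_A and the runner-up at most p_B under Gaussian noise, the smoothed prediction is guaranteed constant within the L2 radius

```latex
% Certified L2 radius of randomized smoothing (Cohen et al., 2019):
% \sigma is the Gaussian noise level, \Phi^{-1} the inverse standard
% normal CDF, and p_A, p_B bound the top-two class probabilities.
R = \frac{\sigma}{2}\left( \Phi^{-1}(p_A) - \Phi^{-1}(p_B) \right)
```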
Gradient masking
Defense technique that modifies or hides the model's gradients to prevent attackers from computing effective adversarial perturbations. Although it may appear effective, this approach can often be bypassed by adaptive attacks.
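A classic gradient-masking pattern, sketched below with an illustrative wrapper: non-differentiable input quantization yields zero gradients almost everywhere, which blinds naive gradient attacks but falls to techniques like BPDA that treat the step as the identity on the backward pass:

```python
import torch
import torch.nn as nn

class QuantizedInput(nn.Module):
    """Wrap a model with input quantization that shatters gradients."""
    def __init__(self, model, levels=8):
        super().__init__()
        self.model, self.levels = model, levels

    def forward(self, x):
        # torch.round has zero gradient almost everywhere, so an attacker
        # backpropagating through this sees no useful signal.
        x_q = torch.round(x * (self.levels - 1)) / (self.levels - 1)
        return self.model(x_q)
```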
Universal Adversarial Attacks
Type of attack where a single perturbation can effectively fool a model across a wide range of different inputs. These attacks are particularly dangerous because they don't require computing a specific perturbation for each sample.
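A sketch of one simple way to build such a perturbation: accumulate signed gradients over many samples into a single L-infinity-bounded `delta` shared across all batches of an assumed PyTorch data loader:

```python
import torch
import torch.nn.functional as F

def universal_perturbation(model, loader, eps=0.05, alpha=0.005, epochs=1):
    """Grow one shared perturbation that fools the model on many inputs."""
    delta = None
    for _ in range(epochs):
        for x, y in loader:
            if delta is None:
                delta = torch.zeros_like(x[0])
            d = delta.clone().requires_grad_(True)
            loss = F.cross_entropy(model(x + d), y)
            loss.backward()
            # One shared update for the whole dataset, then project.
            delta = (delta + alpha * d.grad.sign()).clamp(-eps, eps)
    return delta
```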
Robust Contrastive Learning
Learning approach that maximizes the similarity between representations of a sample and its adversarially augmented versions. This method encourages the model to develop features that are invariant to malicious perturbations.
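A sketch of such an objective in a single-positive InfoNCE style, where each sample's adversarial view is its positive and the other samples' views are negatives; `encoder` and `x_adv` (produced by any attack) are assumed inputs:

```python
import torch
import torch.nn.functional as F

def robust_contrastive_loss(encoder, x, x_adv, temperature=0.5):
    """Pull each sample's embedding toward its adversarial view."""
    z = F.normalize(encoder(x), dim=1)          # (N, D) clean views
    z_adv = F.normalize(encoder(x_adv), dim=1)  # (N, D) adversarial views
    # Similarity of every clean view to every adversarial view.
    logits = z @ z_adv.t() / temperature        # (N, N)
    # The matching adversarial view is the positive on the diagonal.
    targets = torch.arange(len(x), device=x.device)
    return F.cross_entropy(logits, targets)
```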
Adversarial Example Detection
Set of techniques aimed at automatically identifying potentially manipulated inputs before they are processed by the main model. These systems often use meta-classifiers or statistical analyses of activations.
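A sketch of the meta-classifier pattern: a small binary head scores feature vectors from a hypothetical `backbone_features` function as clean or adversarial. The architecture is illustrative:

```python
import torch
import torch.nn as nn

class AdversarialDetector(nn.Module):
    """Binary meta-classifier over a model's internal activations."""
    def __init__(self, feature_dim):
        super().__init__()
        # Output near 1 = likely adversarial, near 0 = likely clean.
        self.head = nn.Sequential(
            nn.Linear(feature_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, features):
        return torch.sigmoid(self.head(features))

# Usage: score = detector(backbone_features(x)); reject if score > threshold.
```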
Verification-Based Training
Training method that integrates formal verifiers into the learning loop to ensure specified robustness properties. This approach combines performance optimization with mathematically proven safety constraints.
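A sketch of the core primitive of one such scheme, interval bound propagation (IBP): input intervals are pushed through each layer and training minimizes the loss on the worst-case logits the bounds allow. Shown here for a linear layer only:

```python
import torch
import torch.nn as nn

def ibp_linear(layer: nn.Linear, lo, hi):
    """Propagate an input interval [lo, hi] through y = Wx + b exactly."""
    mid, rad = (hi + lo) / 2, (hi - lo) / 2
    mid_out = layer(mid)
    # |W| applied to the radius gives the tightest interval for a linear map.
    rad_out = rad @ layer.weight.abs().t()
    return mid_out - rad_out, mid_out + rad_out
```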
Physical Adversarial Attacks
Attacks where adversarial perturbations are applied in the real world to physical objects to deceive vision systems. These attacks must account for lighting conditions, viewing angles, and other environmental variables.
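A sketch of Expectation over Transformation (EOT), a standard recipe for physically robust perturbations: the attack gradient is averaged over randomly simulated conditions. The brightness and shift transforms below are illustrative stand-ins for real-world variation:

```python
import torch
import torch.nn.functional as F

def random_transform(x):
    # Random brightness change plus a small random horizontal pixel shift.
    x = (x * (0.7 + 0.6 * torch.rand(1))).clamp(0.0, 1.0)
    shift = int(torch.randint(-2, 3, (1,)))
    return torch.roll(x, shifts=shift, dims=-1)

def eot_step(model, x, y, delta, eps=0.1, alpha=0.01, n_transforms=8):
    """One update of a perturbation averaged over simulated conditions."""
    d = delta.clone().requires_grad_(True)
    # Averaging the loss over sampled transformations makes the
    # perturbation survive changes in lighting and viewpoint.
    loss = sum(F.cross_entropy(model(random_transform(x + d)), y)
               for _ in range(n_transforms)) / n_transforms
    loss.backward()
    return (delta + alpha * d.grad.sign()).clamp(-eps, eps).detach()
```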