
AI Glossary

A complete glossary of artificial intelligence

162 categories · 2 032 subcategories · 23 060 terms

Adversarial Machine Learning

The study of machine learning models' vulnerabilities to malicious attacks designed to deceive them or degrade their performance. The field develops attack techniques and defense strategies in tandem to strengthen the security of AI systems.

Evasion attacks

Attack techniques where imperceptible perturbations are applied to input data to mislead an already trained model. These attacks aim to bypass the model's decisions without modifying its internal parameters.
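As a minimal sketch of an evasion attack, consider an FGSM-style perturbation against a toy linear classifier (the weights, input, and epsilon below are illustrative, not from any real model):

```python
import numpy as np

# Hypothetical linear classifier: score = w.x + b, class 1 if score > 0.
w = np.array([1.0, -2.0, 0.5])
b = 0.0

def predict(x):
    return int(w @ x + b > 0)

def fgsm_perturb(x, epsilon):
    # For a linear score, the gradient of the score w.r.t. x is just w.
    # Step against the gradient to push a positive score below zero.
    return x - epsilon * np.sign(w)

x = np.array([2.0, 0.5, 1.0])          # classified as 1 (score = 1.5)
x_adv = fgsm_perturb(x, epsilon=0.6)   # small L-infinity step per feature
print(predict(x), predict(x_adv))      # prints: 1 0
```

Each coordinate moves by at most 0.6, yet the prediction flips — the model's parameters are never touched, matching the definition above.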

Data poisoning

Attack method that injects malicious data into the training set to compromise the final model. The objective is to plant backdoors or systematically degrade predictions on specific targets.
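A minimal poisoning sketch, assuming a nearest-centroid classifier and synthetic two-class data: flipping the labels of a handful of training points drags one class centroid toward the other, shifting the learned boundary.

```python
import numpy as np

rng = np.random.default_rng(0)

def nearest_centroid_fit(X, y):
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def nearest_centroid_predict(centroids, x):
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))

# Two well-separated clusters around (0, 0) and (3, 3).
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(3, 0.5, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
clean = nearest_centroid_fit(X, y)

# Poison: flip the labels of 8 class-1 points to class 0, dragging the
# class-0 centroid toward class 1's region.
y_poisoned = y.copy()
y_poisoned[20:28] = 0
poisoned = nearest_centroid_fit(X, y_poisoned)

target = np.array([1.8, 1.8])  # point the attacker wants misclassified
print(nearest_centroid_predict(clean, target),
      nearest_centroid_predict(poisoned, target))
```

Only labels change, not the learning algorithm — the corrupted training set alone is enough to move decisions on the attacker's chosen region.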

Adversarial training

Training method that actively incorporates adversarial examples into the learning process to improve the model's robustness. This approach exposes the model to the types of attacks it might encounter in production.
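The inner-max/outer-min structure of adversarial training can be sketched for logistic regression, using an FGSM step as a (weak) inner maximizer — all data and hyperparameters below are synthetic and illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 0.3, (30, 2)), rng.normal(1, 0.3, (30, 2))])
y = np.array([0] * 30 + [1] * 30)
w, b, lr, eps = np.zeros(2), 0.0, 0.5, 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(200):
    # Inner maximization: FGSM step on the inputs. For logistic
    # regression, the loss gradient w.r.t. x is (p - y) * w.
    p = sigmoid(X @ w + b)
    X_adv = X + eps * np.sign((p - y)[:, None] * w[None, :])
    # Outer minimization: ordinary gradient step on the perturbed batch.
    p_adv = sigmoid(X_adv @ w + b)
    w -= lr * (X_adv.T @ (p_adv - y)) / len(y)
    b -= lr * (p_adv - y).mean()

acc = ((sigmoid(X @ w + b) > 0.5) == y).mean()
print(f"clean accuracy after adversarial training: {acc:.2f}")
```

The model never sees a clean batch during training, only its worst-case (within eps) perturbations — the exposure-to-attacks idea the definition describes.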

Randomized smoothing

Certified defense technique that adds Gaussian noise to inputs and classifies by majority vote on multiple noisy samples. This method provides mathematical guarantees on the model's robustness against bounded perturbations.
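The vote-over-noisy-copies idea can be sketched with a toy base classifier (the model and input below are illustrative; in the full method, the vote fraction is additionally converted into a certified robustness radius):

```python
import numpy as np

rng = np.random.default_rng(0)

def base_classifier(x):
    # Toy base model: class 1 iff the first coordinate is positive.
    return int(x[0] > 0)

def smoothed_classify(x, sigma=0.5, n_samples=1000):
    # Add i.i.d. Gaussian noise to x many times and take a majority vote.
    noise = rng.normal(0.0, sigma, size=(n_samples, len(x)))
    votes = np.array([base_classifier(x + z) for z in noise])
    return int(votes.mean() > 0.5)

x = np.array([0.8, -0.2])
print(smoothed_classify(x))
```

Because the decision depends on the average behavior over a noise ball rather than a single point, small perturbations of x cannot easily swing the majority.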

Extraction attacks

Attack strategy aiming to reproduce or steal a proprietary model by querying its API and analyzing its responses. These attacks exploit information leaks through predictions to reconstruct the model or its training data.
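A deliberately simple extraction sketch, assuming the victim is a linear model whose API returns raw scores (real attacks query a remote API and face noisier outputs):

```python
import numpy as np

rng = np.random.default_rng(2)
w_secret = np.array([0.7, -1.2, 2.0])

def victim_api(X):
    # The attacker observes only these scores, never w_secret itself.
    return X @ w_secret

# Query the black box on random inputs, then fit a surrogate by
# least squares on the observed (input, score) pairs.
X_queries = rng.normal(size=(50, 3))
scores = victim_api(X_queries)
w_stolen, *_ = np.linalg.lstsq(X_queries, scores, rcond=None)

print(np.allclose(w_stolen, w_secret, atol=1e-6))
```

With noiseless scores and more queries than parameters, the surrogate recovers the proprietary weights exactly — the information leak through predictions that the definition refers to.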

Robustness certification

Mathematical process formally guaranteeing that a model maintains its correct predictions for all perturbations within a defined radius. This certification provides upper bounds on the model's vulnerability to attacks.

Gradient masking

Defense technique that modifies or masks the model's gradients to prevent attackers from calculating effective adversarial perturbations. Although it may seem effective, this approach is often bypassable by more sophisticated attacks.

Universal Adversarial Attacks

Type of attack where a single perturbation can effectively fool a model across a wide range of different inputs. These attacks are particularly dangerous because they don't require computing a specific perturbation for each sample.

Robust Contrastive Learning

Learning approach that maximizes the similarity between representations of a sample and its adversarially augmented versions. This method encourages the model to develop features that are invariant to malicious perturbations.

Adversarial Example Detection

Set of techniques aimed at automatically identifying potentially manipulated inputs before they are processed by the main model. These systems often use meta-classifiers or statistical analyses of activations.
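A minimal statistical detector over raw input features, as a sketch of the idea (production detectors typically analyze internal activations or train a meta-classifier; the threshold and data here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

# Fit simple per-feature statistics on known-clean data.
clean = rng.normal(0.0, 1.0, size=(500, 4))
mu, sigma = clean.mean(axis=0), clean.std(axis=0)

def is_adversarial(x, threshold=3.0):
    # Flag the input if any feature is a strong outlier (|z-score| > 3).
    z = np.abs((x - mu) / sigma)
    return bool(z.max() > threshold)

print(is_adversarial(np.zeros(4)))              # typical input
print(is_adversarial(np.array([0, 0, 0, 8.0])))  # heavily perturbed input
```

The detector runs before the main model and rejects suspicious inputs; its weakness is that strong attacks constrain perturbations to stay statistically inconspicuous.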

Verification-Based Training

Training method that integrates formal verifiers into the learning loop to ensure specified robustness properties. This approach combines performance optimization with mathematically proven safety constraints.

Physical Adversarial Attacks

Attacks where adversarial perturbations are applied in the real world to physical objects to deceive vision systems. These attacks must account for lighting conditions, viewing angles, and other environmental variables.
