
AI Glossary

A complete dictionary of Artificial Intelligence

162 categories · 2,032 subcategories · 23,060 terms

Adversarial Machine Learning

The study of machine learning models' vulnerabilities to malicious attacks designed to deceive them or degrade their performance. The field develops attack techniques and defense strategies in tandem to strengthen the security of AI systems.


Evasion attacks

Attack techniques where imperceptible perturbations are applied to input data to mislead an already trained model. These attacks aim to bypass the model's decisions without modifying its internal parameters.
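For a linear scorer the loss gradient with respect to the input has a closed form, which makes the fast gradient sign method (FGSM), the canonical evasion attack, easy to sketch. The weights and inputs below are made up for illustration:

```python
def fgsm_linear(x, y, w, eps):
    """FGSM against a linear scorer s(x) = w.x with label y in {-1, +1}.
    For logistic loss, sign(dL/dx_i) = -y * sign(w_i), so each coordinate
    moves by eps in the direction that lowers the true-class score."""
    sign = lambda v: 1 if v > 0 else -1 if v < 0 else 0
    return [xi - eps * y * sign(wi) for xi, wi in zip(x, w)]

w = [0.5, -1.0, 2.0]                 # hypothetical trained weights
x = [1.0, 1.0, 1.0]                  # clean score w.x = 1.5 -> class +1
x_adv = fgsm_linear(x, +1, w, eps=0.6)
adv_score = sum(wi * xi for wi, xi in zip(w, x_adv))
# each coordinate moved by only 0.6, yet the score drops from 1.5 to -0.6
```

Note that the attack never touches `w`; it only shifts the input, which is exactly what distinguishes evasion from poisoning.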


Data poisoning

An attack that injects malicious data into the training set to compromise the performance of the final model. The objective is to plant backdoors or to systematically degrade predictions on specific targets.
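A toy sketch of targeted poisoning against a nearest-centroid classifier: the attacker injects copies of a chosen target point with a flipped label, dragging one centroid toward it so that this one point is misclassified while the rest of the data is barely affected. All points here are illustrative:

```python
def centroid(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def predict(x, c_pos, c_neg):
    dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return +1 if dist(x, c_pos) < dist(x, c_neg) else -1

clean_pos = [[2.0, 2.0], [3.0, 3.0]]
clean_neg = [[-2.0, -2.0], [-3.0, -3.0]]
target = [1.0, 1.0]                      # legitimately a positive example

# attacker injects copies of the target with a flipped (negative) label
poison = [list(target) for _ in range(8)]

c_pos = centroid(clean_pos)
c_neg_clean = centroid(clean_neg)
c_neg_poisoned = centroid(clean_neg + poison)
# the poisoned negative centroid is dragged next to the target,
# so the target alone is now misclassified: a backdoor
```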


Adversarial training

Training method that actively incorporates adversarial examples into the learning process to improve the model's robustness. This approach exposes the model to the types of attacks it might encounter in production.
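A minimal min-max sketch for logistic regression, assuming a sign-gradient (FGSM-style) inner step: each update first perturbs the example adversarially, then descends the loss at the perturbed point. The data and hyperparameters are invented for illustration:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fgsm(w, x, y, eps):
    """Inner maximization: for logistic loss, sign(dL/dx_i) = -y * sign(w_i)."""
    sgn = lambda v: 1 if v > 0 else -1 if v < 0 else 0
    return [xi - eps * y * sgn(wi) for xi, wi in zip(x, w)]

def adversarial_train(data, eps, epochs=200, lr=0.5):
    """Min-max training: perturb each example, then take a gradient step
    on the logistic loss evaluated at the perturbed point."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        for x, y in data:
            x_adv = fgsm(w, x, y, eps)
            s = sigmoid(y * sum(wi * xi for wi, xi in zip(w, x_adv)))
            w = [wi + lr * y * (1.0 - s) * xi for wi, xi in zip(w, x_adv)]
    return w

data = [([2.0, 1.0], +1), ([1.5, 2.0], +1),
        ([-2.0, -1.0], -1), ([-1.0, -2.0], -1)]
w = adversarial_train(data, eps=0.3)
robust = all(
    y * sum(wi * xi for wi, xi in zip(w, fgsm(w, x, y, 0.3))) > 0
    for x, y in data)   # correct even on worst-case eps-perturbed inputs
```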


Randomized smoothing

Certified defense technique that adds Gaussian noise to inputs and classifies by majority vote on multiple noisy samples. This method provides mathematical guarantees on the model's robustness against bounded perturbations.
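The prediction side of randomized smoothing can be sketched in a few lines with a toy base classifier; the classifier, noise level, and sample count here are illustrative, and a real certificate would additionally convert the vote margin into a guaranteed L2 radius:

```python
import random
from collections import Counter

def smoothed_predict(classify, x, sigma=0.5, n=500, seed=0):
    """Classify n Gaussian-noised copies of x and take a majority vote."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n):
        votes[classify([xi + rng.gauss(0.0, sigma) for xi in x])] += 1
    return votes.most_common(1)[0][0]

# toy base classifier: the sign of the first coordinate
base = lambda x: +1 if x[0] >= 0 else -1
label = smoothed_predict(base, [1.2, 0.0])   # stable vote despite the noise
```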


Extraction attacks

Attack strategy aiming to reproduce or steal a proprietary model by querying its API and analyzing its responses. These attacks exploit information leaks through predictions to reconstruct the model or its training data.
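In the simplest possible case, a black-box *linear* scorer leaks everything: `dim + 1` queries recover its weights exactly via finite differences. The "victim" below is a made-up stand-in for a proprietary API:

```python
def extract_linear(query, dim, delta=1e-4):
    """Reconstruct a black-box linear scorer f(x) = w.x + b from
    dim + 1 queries, probing one axis at a time."""
    b = query([0.0] * dim)
    w = []
    for i in range(dim):
        probe = [0.0] * dim
        probe[i] = delta
        w.append((query(probe) - b) / delta)
    return w, b

# hypothetical proprietary model hidden behind an API
secret_w, secret_b = [0.7, -1.3, 2.0], 0.25
victim_api = lambda x: sum(wi * xi for wi, xi in zip(secret_w, x)) + secret_b

w_hat, b_hat = extract_linear(victim_api, dim=3)
# the stolen copy matches the secret model up to floating-point error
```

Real models need far more queries and a surrogate trained on the responses, but the principle is the same: each answer leaks a little information about the parameters.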


Robustness certification

Mathematical process formally guaranteeing that a model maintains its correct predictions for all perturbations within a defined radius. This certification provides upper bounds on the model's vulnerability to attacks.


Gradient masking

Defense technique that modifies or masks the model's gradients to prevent attackers from calculating effective adversarial perturbations. Although it may seem effective, this approach is often bypassable by more sophisticated attacks.


Universal Adversarial Attacks

Type of attack where a single perturbation can effectively fool a model across a wide range of different inputs. These attacks are particularly dangerous because they don't require computing a specific perturbation for each sample.
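For a linear scorer the input-independence is easy to see: the single shift `-eps * sign(w)` lowers *every* score by the same amount, `eps * sum(|w_i|)`. A toy sketch with made-up weights and inputs:

```python
def universal_perturbation(w, eps):
    """For a linear scorer s(x) = w.x, one input-independent shift
    delta = -eps * sign(w) lowers every score by eps * sum(|w_i|)."""
    return [-eps * (1 if wi > 0 else -1 if wi < 0 else 0) for wi in w]

w = [1.0, -2.0, 0.5]
delta = universal_perturbation(w, eps=1.0)   # lowers any score by 3.5
inputs = [[1.0, 0.0, 1.0], [2.0, -0.5, 0.0], [0.0, 0.0, 3.0]]
score = lambda x: sum(wi * xi for wi, xi in zip(w, x))
clean = [score(x) for x in inputs]                                # all > 0
fooled = [score([xi + di for xi, di in zip(x, delta)]) for x in inputs]
# one precomputed delta flips the prediction on every input at once
```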


Robust Contrastive Learning

Learning approach that maximizes the similarity between representations of a sample and its adversarially augmented versions. This method encourages the model to develop features that are invariant to malicious perturbations.


Adversarial Example Detection

Set of techniques aimed at automatically identifying potentially manipulated inputs before they are processed by the main model. These systems often use meta-classifiers or statistical analyses of activations.
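A deliberately crude sketch of the statistical flavor of detection: fit the distribution of a simple statistic (here the L2 norm of the raw input) on clean data and flag outliers. Real detectors apply similar tests to internal activations or use a trained meta-classifier; the data and threshold below are illustrative:

```python
import math

def fit_norm_detector(clean_inputs, k=3.0):
    """Flag inputs whose L2 norm deviates from the clean-data mean
    by more than k standard deviations."""
    norms = [math.sqrt(sum(xi * xi for xi in x)) for x in clean_inputs]
    mu = sum(norms) / len(norms)
    sd = math.sqrt(sum((n - mu) ** 2 for n in norms) / len(norms))
    return lambda x: abs(math.sqrt(sum(xi * xi for xi in x)) - mu) > k * sd

clean = [[1.0, 0.1], [0.9, -0.2], [1.1, 0.0], [0.95, 0.15]]
is_suspicious = fit_norm_detector(clean)
# in-distribution inputs pass, grossly perturbed ones are flagged
```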


Verification-Based Training

Training method that integrates formal verifiers into the learning loop to ensure specified robustness properties. This approach combines performance optimization with mathematically proven safety constraints.


Physical Adversarial Attacks

Attacks where adversarial perturbations are applied in the real world to physical objects to deceive vision systems. These attacks must account for lighting conditions, viewing angles, and other environmental variables.
