AI Glossary

A complete dictionary of Artificial Intelligence

162 categories · 2,032 subcategories · 23,060 terms

Textual Adversarial Attack

Technique that subtly modifies an input text to mislead an NLP model while preserving its semantics for a human reader.

Character-Level Perturbation

Modification of individual characters in text (insertion, deletion, substitution) to create adversarial examples that are difficult to detect.
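
A minimal Python sketch of the idea; the `char_perturb` helper and its lowercase alphabet are illustrative assumptions, and real attacks pick edit positions adversarially rather than at random:

```python
import random

def char_perturb(text: str, n_edits: int = 1, seed: int = 0) -> str:
    """Apply random character-level edits (insert/delete/substitute)."""
    rng = random.Random(seed)
    chars = list(text)
    for _ in range(n_edits):
        i = rng.randrange(len(chars))
        op = rng.choice(["insert", "delete", "substitute"])
        if op == "insert":
            chars.insert(i, rng.choice("abcdefghijklmnopqrstuvwxyz"))
        elif op == "delete" and len(chars) > 1:
            chars.pop(i)
        else:
            chars[i] = rng.choice("abcdefghijklmnopqrstuvwxyz")
    return "".join(chars)

print(char_perturb("the movie was wonderful", n_edits=2))
```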

Lexical Substitution Attack

Replacement of words with semantically close synonyms that change the NLP model's prediction in a targeted manner.
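
A toy sketch of generating substitution candidates; the hand-written `SYNONYMS` table is a placeholder for the embedding- or WordNet-based candidate sets real attacks use:

```python
# Toy synonym table; real attacks draw candidates from word embeddings
# or WordNet and keep only swaps that flip the victim model's prediction.
SYNONYMS = {
    "wonderful": ["marvelous", "terrific"],
    "bad": ["poor", "awful"],
}

def lexical_substitutes(tokens):
    """Yield all single-word synonym swaps of a token list."""
    for i, tok in enumerate(tokens):
        for syn in SYNONYMS.get(tok, []):
            yield tokens[:i] + [syn] + tokens[i + 1:]

for cand in lexical_substitutes("the movie was wonderful".split()):
    print(" ".join(cand))
```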

Universal Adversarial Triggers

Specific sequences of words or characters that, when inserted into any text, systematically cause a classification error by the model.
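
A sketch of how a trigger is attached and its effect measured; the `flip_rate` helper, the nonsense trigger, and the keyword "model" are all invented for illustration:

```python
def attach_trigger(trigger: str, text: str) -> str:
    """A universal trigger is one fixed string prepended to every input."""
    return f"{trigger} {text}"

def flip_rate(model, trigger, texts):
    """Fraction of inputs whose prediction changes once the trigger is added."""
    return sum(model(attach_trigger(trigger, t)) != model(t) for t in texts) / len(texts)

# Stand-in "model": a keyword rule that happens to react to the trigger.
model = lambda t: "neg" if "blue ostrich" in t or "bad" in t else "pos"
texts = ["great film", "loved it", "fine acting"]
print(flip_rate(model, "blue ostrich", texts))  # -> 1.0
```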

Black-Box Attack

Attack conducted without knowledge of the model's internal parameters, using only its output predictions to construct adversarial examples.
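
A sketch of the query-only setting, assuming a hypothetical `predict` function that returns labels and nothing else:

```python
import random

def black_box_attack(predict, text, perturb, budget=50, seed=0):
    """Random-search attack: only the label returned by `predict` is
    observable, never gradients or parameters. `perturb` proposes
    candidate texts; stop at the first label flip or when the query
    budget runs out."""
    rng = random.Random(seed)
    original = predict(text)
    for _ in range(budget):
        candidate = perturb(text, rng)
        if predict(candidate) != original:
            return candidate
    return None

predict = lambda t: "pos" if "good" in t else "neg"       # opaque API
perturb = lambda t, rng: t.replace("good", "goud", 1)     # toy proposal
print(black_box_attack(predict, "a good film", perturb))  # 'a goud film'
```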

White-Box Attack

Attack exploiting complete knowledge of the model's architecture and gradients to generate optimal perturbations.
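
A sketch of one gradient step in the white-box setting, using a toy linear classifier in PyTorch; real text attacks must additionally map the perturbed embedding back to discrete tokens (as in HotFlip-style methods):

```python
import torch

torch.manual_seed(0)
W = torch.randn(2, 8)                        # stand-in classifier weights
embed = torch.randn(8, requires_grad=True)   # embedding of the input text

# White-box access: backpropagate the loss to the input embedding,
# then take one FGSM-style step in the direction that raises the loss.
logits = W @ embed
loss = torch.nn.functional.cross_entropy(logits.unsqueeze(0), torch.tensor([0]))
loss.backward()

adv_embed = embed + 0.5 * embed.grad.sign()
# original vs. perturbed prediction (the latter may now differ)
print((W @ embed).argmax().item(), (W @ adv_embed).argmax().item())
```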

Transfer Attack

Generation of adversarial examples on a source model that retain their effectiveness on unknown target models.
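
A sketch of measuring transfer, assuming hypothetical `source_model`, `target_model`, and crafted `attack` stand-ins; real pipelines craft the examples with white-box access to the source model:

```python
def transfer_success(attack, source_model, target_model, texts):
    """Craft adversarial texts against `source_model`, then measure how
    many still flip the unseen `target_model` (the transfer rate)."""
    flips = 0
    for text in texts:
        adv = attack(source_model, text)
        if adv is not None and target_model(adv) != target_model(text):
            flips += 1
    return flips / len(texts)

src = lambda t: "pos" if "good" in t else "neg"
tgt = lambda t: "pos" if "good" in t.split() else "neg"   # a different model
attack = lambda model, t: t.replace("good", "go0d")       # crafted on src
print(transfer_success(attack, src, tgt, ["a good film", "good stuff"]))  # 1.0
```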

Semantic Preservation

Constraint ensuring that textual perturbations do not alter the overall meaning of the text for a human reader.
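
A sketch of enforcing the constraint with a cheap surface-similarity proxy from the standard library; real pipelines typically compare sentence embeddings (e.g. SBERT cosine similarity) instead:

```python
from difflib import SequenceMatcher

def preserves_semantics(original: str, perturbed: str, threshold: float = 0.8) -> bool:
    """Surface-similarity proxy: accept a perturbation only if the
    strings stay close. A real check would compare sentence embeddings."""
    return SequenceMatcher(None, original, perturbed).ratio() >= threshold

print(preserves_semantics("the movie was wonderful", "the movie was wonderfull"))  # True
print(preserves_semantics("the movie was wonderful", "bad soup"))                  # False
```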

Data Poisoning Attack

Malicious insertion of corrupted examples into the training set to degrade model performance during its learning phase.
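
A sketch of the simplest variant, label flipping, on an invented toy dataset:

```python
import random

def poison_labels(dataset, rate=0.05, labels=("pos", "neg"), seed=0):
    """Label-flipping poisoning: silently corrupt a fraction of the
    training pairs so the model learns a degraded decision boundary."""
    rng = random.Random(seed)
    poisoned = []
    for text, label in dataset:
        if rng.random() < rate:
            label = next(l for l in labels if l != label)   # flip the label
        poisoned.append((text, label))
    return poisoned

train = [("great film", "pos"), ("awful plot", "neg")] * 50
flipped = sum(a != b for (_, a), (_, b) in zip(train, poison_labels(train, rate=0.1)))
print(flipped)   # roughly 10% of 100 labels corrupted
```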

Syntactic Perturbation

Modification of the grammatical or syntactic structure of a sentence while preserving its semantic meaning to deceive NLP models.
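
A sketch using one hand-written rule (fronting a "because" clause); published attacks use syntactically controlled paraphrase models rather than rules like this:

```python
def front_because(sentence: str) -> str:
    """One rule-based syntactic transform: move a trailing 'because'
    clause to the front while leaving the meaning untouched."""
    head, sep, tail = sentence.partition(" because ")
    if not sep:
        return sentence            # no 'because' clause to move
    return f"Because {tail}, {head[0].lower() + head[1:]}"

print(front_because("The review is positive because the acting shines"))
# -> "Because the acting shines, the review is positive"
```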

Gradient Masking

Defense technique that modifies the model's gradient to prevent optimization-based attacks, without necessarily improving actual robustness.
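
A sketch of the phenomenon in PyTorch: rounding the input has zero gradient almost everywhere, so a gradient-based attacker sees no signal even though the underlying model is unchanged. The `MaskedDefense` wrapper is invented for illustration:

```python
import torch

class MaskedDefense(torch.nn.Module):
    """Quantizes inputs before the model: round() has zero gradient
    almost everywhere, so gradient attacks see nothing useful, yet the
    model itself may remain vulnerable (a false sense of robustness)."""
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, x):
        return self.model(torch.round(x * 10) / 10)

model = torch.nn.Linear(8, 2)
defended = MaskedDefense(model)
x = torch.randn(1, 8, requires_grad=True)
defended(x).sum().backward()
print(x.grad.abs().max().item())   # 0.0: the gradient is masked
```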

Query Attack

Black-box attack that optimizes perturbations by iteratively querying the model and analyzing its responses.
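
A sketch of a greedy query-based loop, assuming a hypothetical `predict_proba` that returns the positive-class confidence and nothing else:

```python
def greedy_query_attack(predict_proba, text, candidates):
    """Greedy query loop: rank words by the confidence drop their
    deletion causes, then try substitutions at the most influential
    position. The model is accessed only through `predict_proba`."""
    words = text.split()
    base = predict_proba(text)
    drops = [base - predict_proba(" ".join(words[:i] + words[i + 1:]))
             for i in range(len(words))]
    i = max(range(len(words)), key=lambda k: drops[k])
    for sub in candidates.get(words[i], []):
        adv = " ".join(words[:i] + [sub] + words[i + 1:])
        if predict_proba(adv) < 0.5:          # crossed the decision boundary
            return adv
    return None

proba = lambda t: 0.9 if "wonderful" in t else 0.3   # toy confidence of "pos"
print(greedy_query_attack(proba, "the movie was wonderful",
                          {"wonderful": ["passable"]}))
```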

Semantic Robustness

Ability of an NLP model to maintain consistent predictions across textual variations that preserve meaning but alter form.
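
A sketch of one way to measure it, as the share of meaning-preserving variant groups on which a (toy) model's prediction never changes:

```python
def consistency_rate(model, paraphrase_sets):
    """Share of paraphrase groups with a single, stable prediction;
    1.0 means fully semantically robust on this test set."""
    stable = sum(
        len({model(v) for v in variants}) == 1
        for variants in paraphrase_sets
    )
    return stable / len(paraphrase_sets)

model = lambda t: "pos" if "great" in t or "excellent" in t else "neg"
sets = [["a great film", "an excellent film"],
        ["a great film", "a first-rate film"]]   # second variant slips
print(consistency_rate(model, sets))             # -> 0.5
```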

Adversarial Search Space

Set of all possible text modifications that can be applied to generate valid adversarial examples.
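
A sketch enumerating just the one-substitution neighborhood; the full search space is the closure of such edits and explodes combinatorially:

```python
def single_swap_space(tokens, vocab):
    """Enumerate the one-substitution neighborhood of a sentence; the
    full adversarial search space grows combinatorially with sentence
    length and vocabulary size."""
    for i in range(len(tokens)):
        for w in vocab:
            if w != tokens[i]:
                yield tokens[:i] + [w] + tokens[i + 1:]

tokens = "the movie was wonderful".split()
vocab = ["film", "terrible", "wonderful"]
print(len(list(single_swap_space(tokens, vocab))))   # 11 neighbors already
```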

Perturbation Score

Quantitative metric evaluating the magnitude of modification applied to the original text to create an adversarial example.
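
One common instantiation is length-normalized edit distance; a self-contained sketch:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit-distance DP (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def perturbation_score(original: str, adversarial: str) -> float:
    """Edit distance normalized by length: 0 = identical, 1 = fully rewritten."""
    return levenshtein(original, adversarial) / max(len(original), len(adversarial))

print(perturbation_score("wonderful", "wonderfull"))   # 0.1: one small edit
```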

Multi-objective Attack

Adversarial attack that seeks to deceive the model while simultaneously optimizing additional constraints such as readability or semantic preservation.
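
A sketch scoring candidates on a weighted sum of attack strength and surface similarity; the weights, the toy model, and the `difflib` similarity all stand in for real objectives:

```python
from difflib import SequenceMatcher

def multi_objective_score(predict_proba, sim, original, candidate,
                          w_attack=1.0, w_sim=0.5):
    """Weighted sum of two objectives: attack strength (low confidence in
    the original class) and similarity to the original text."""
    return w_attack * (1.0 - predict_proba(candidate)) + w_sim * sim(original, candidate)

sim = lambda a, b: SequenceMatcher(None, a, b).ratio()
proba = lambda t: 0.9 if "wonderful" in t else 0.2       # toy model
cands = ["the movie was w0nderful", "bad"]               # both flip the label
best = max(cands, key=lambda c: multi_objective_score(
    proba, sim, "the movie was wonderful", c))
print(best)   # the readable perturbation wins the combined score
```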

Adversarial Attack Detection

Defensive mechanism identifying potentially adversarial inputs based on statistical or behavioral anomalies in predictions.
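
A sketch of one simple detector based on prediction entropy; the threshold is an arbitrary illustrative choice:

```python
import math

def prediction_entropy(probs):
    """Shannon entropy of the model's output distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def looks_adversarial(probs, threshold=0.6):
    """Flag inputs whose prediction entropy is anomalously high; many
    adversarial texts sit near the decision boundary. Real detectors
    also compare predictions under small random perturbations."""
    return prediction_entropy(probs) > threshold

print(looks_adversarial([0.98, 0.02]))   # False: confident, likely clean
print(looks_adversarial([0.55, 0.45]))   # True: borderline, flag it
```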
