AI Glossary
The complete dictionary of Artificial Intelligence
Textual Adversarial Attack
Technique that subtly modifies an input text to mislead an NLP model while preserving its meaning for a human reader.
Character-Level Perturbation
Modification of individual characters in text (insertion, deletion, substitution) to create adversarial examples that are difficult to detect.
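A minimal sketch of such a perturbation, assuming a hypothetical helper `perturb_chars` that randomly substitutes, deletes, or inserts letters at a given rate (real attacks pick positions that maximize the model's loss rather than at random):

```python
import random

def perturb_chars(text, rate=0.1, seed=0):
    """Randomly substitute, delete, or insert letters to build a
    character-level adversarial candidate (toy, random positions)."""
    rng = random.Random(seed)
    out = []
    for c in text:
        if c.isalpha() and rng.random() < rate:
            op = rng.choice(["sub", "del", "ins"])
            if op == "sub":
                out.append(rng.choice("abcdefghijklmnopqrstuvwxyz"))
            elif op == "ins":
                out.append(c)
                out.append(rng.choice("abcdefghijklmnopqrstuvwxyz"))
            # "del": the character is simply dropped
        else:
            out.append(c)
    return "".join(out)

adversarial = perturb_chars("the movie was wonderful", rate=0.2)
```

Because the perturbations are small, the text often stays readable for a human while its token sequence changes enough to confuse a model.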
Lexical Substitution Attack
Replacement of words with semantically close synonyms that change the NLP model's prediction in a targeted manner.
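A toy illustration of generating substitution candidates, assuming a hand-written synonym table (real attacks draw synonyms from WordNet or embedding neighborhoods and keep only candidates that flip the prediction):

```python
# Hypothetical toy synonym table; real attacks use embeddings or WordNet.
SYNONYMS = {"good": ["great", "fine"], "bad": ["poor", "awful"], "movie": ["film"]}

def lexical_substitutions(sentence):
    """Yield candidate sentences where exactly one word is swapped
    for a listed synonym."""
    words = sentence.split()
    for i, w in enumerate(words):
        for syn in SYNONYMS.get(w.lower(), []):
            candidate = words.copy()
            candidate[i] = syn
            yield " ".join(candidate)

candidates = list(lexical_substitutions("a good movie"))
# → ["a great movie", "a fine movie", "a good film"]
```

Each candidate is then scored against the target model; the attack keeps the one that changes the prediction while staying closest to the original meaning.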
Universal Adversarial Triggers
Specific sequences of words or characters that, when inserted into any text, systematically cause a classification error by the model.
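The effect can be sketched with a toy stand-in model and a hand-picked trigger; in practice, triggers are found by gradient-guided search over the vocabulary, not chosen by hand:

```python
NEGATIVE = {"awful", "terrible", "bad"}

def toy_classifier(text):
    """Stand-in model: predicts "negative" if any negative cue appears."""
    neg = sum(w in NEGATIVE for w in text.lower().split())
    return "negative" if neg > 0 else "positive"

TRIGGER = "awful awful"  # hypothetical trigger; real ones are searched for

texts = ["great plot", "lovely acting"]
flipped = [toy_classifier(TRIGGER + " " + t) for t in texts]
# every prediction becomes "negative" once the trigger is prepended
```

The key property is universality: the same short token sequence degrades predictions across arbitrary inputs, not just one crafted example.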
Black-Box Attack
Attack conducted without knowledge of the model's internal parameters, using only the model's predictions to construct adversarial examples.
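A minimal black-box sketch: the attacker only calls `model_predict` as an opaque oracle (the toy keyword model below is an assumption) and greedily searches for a word deletion that flips the label:

```python
def model_predict(text):
    """Black-box oracle: we only observe its output label (toy stand-in)."""
    score = sum(w in {"good", "great"} for w in text.split())
    return "positive" if score > 0 else "negative"

def greedy_blackbox_attack(text):
    """Try dropping one word at a time, keeping the first change that
    flips the model's prediction; no gradients or weights are used."""
    original = model_predict(text)
    words = text.split()
    for i in range(len(words)):
        candidate = " ".join(words[:i] + words[i + 1:])
        if model_predict(candidate) != original:
            return candidate
    return None

adv = greedy_blackbox_attack("a good story")
# → "a story", predicted "negative" instead of "positive"
```

Contrast with the white-box setting below, where gradients make the search far more efficient.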
White-Box Attack
Attack exploiting complete knowledge of the model's architecture and gradients to generate optimal perturbations.
Transfer Attack
Generation of adversarial examples on a source model that retain their effectiveness on unknown target models.
Semantic Preservation
Constraint ensuring that textual perturbations do not alter the overall meaning of the text for a human reader.
Data Poisoning Attack
Malicious insertion of corrupted examples into the training set to degrade model performance during its learning phase.
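One simple poisoning variant is label flipping, sketched below on a toy binary dataset (labels 0/1 assumed); more sophisticated poisoning injects crafted examples rather than corrupting labels:

```python
import random

def poison_labels(dataset, fraction=0.2, seed=0):
    """Flip the labels of a random fraction of training examples
    (label-flipping poisoning; binary 0/1 labels assumed)."""
    rng = random.Random(seed)
    poisoned = list(dataset)
    k = int(len(poisoned) * fraction)
    for i in rng.sample(range(len(poisoned)), k):
        text, label = poisoned[i]
        poisoned[i] = (text, 1 - label)
    return poisoned

clean = [("good film", 1), ("bad film", 0), ("great cast", 1), ("dull plot", 0)]
dirty = poison_labels(clean, fraction=0.5)
```

A model trained on `dirty` learns from contradictory supervision, degrading its accuracy at test time.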
Syntactic Perturbation
Modification of the grammatical or syntactic structure of a sentence while preserving its semantic meaning to deceive NLP models.
Gradient Masking
Defense technique that modifies the model's gradient to prevent optimization-based attacks, without necessarily improving actual robustness.
Query Attack
Black-box attack that optimizes perturbations by iteratively querying the model and analyzing its responses.
Semantic Robustness
Ability of an NLP model to maintain consistent predictions in the face of textual variations preserving meaning but altering form.
Adversarial Search Space
Set of all possible text modifications that can be applied to generate valid adversarial examples.
Perturbation Score
Quantitative metric evaluating the magnitude of modification applied to the original text to create an adversarial example.
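One simple instance of such a metric is the fraction of word positions changed between the original and perturbed text (edit distance or embedding distance are common alternatives); the helper name below is hypothetical:

```python
def word_change_ratio(original, perturbed):
    """Fraction of word positions that differ; assumes both texts
    have the same number of words."""
    a, b = original.split(), perturbed.split()
    if len(a) != len(b):
        raise ValueError("use an edit-distance metric for unequal lengths")
    changed = sum(x != y for x, y in zip(a, b))
    return changed / len(a)

score = word_change_ratio("a good movie", "a great movie")
# → 1/3: one of three words was modified
```

Attacks typically minimize this score subject to the constraint that the model's prediction still flips.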
Multi-objective Attack
Adversarial attack that deceives the model while simultaneously optimizing several constraints, such as readability or semantic preservation.
Adversarial Attack Detection
Defensive mechanism identifying potentially adversarial inputs based on statistical or behavioral anomalies in predictions.
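One behavioral heuristic is prediction instability: adversarial inputs often sit near a decision boundary, so their label flips under tiny random edits far more often than clean inputs. A toy sketch, with `model_predict` standing in for the defended classifier (both names are assumptions):

```python
import random

def model_predict(text):
    """Toy stand-in for the defended classifier."""
    return "positive" if "good" in text else "negative"

def is_suspicious(text, n_variants=10, seed=0):
    """Flag an input whose prediction is unstable under small random
    character deletions (simple behavioral-anomaly heuristic)."""
    rng = random.Random(seed)
    base = model_predict(text)
    flips = 0
    for _ in range(n_variants):
        chars = list(text)
        del chars[rng.randrange(len(chars))]
        if model_predict("".join(chars)) != base:
            flips += 1
    return flips / n_variants > 0.3
```

A brittle input like "good" flips under every single-character deletion and gets flagged, while redundant text such as "good good good" stays stable and passes.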