AI Glossary
The complete dictionary of Artificial Intelligence
Seq2Seq Architecture
Deep learning model composed of an encoder and a decoder, designed to map a variable-length input sequence to a variable-length output sequence. This architecture is widely used for machine translation, text summarization, and dialogue generation.
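A minimal sketch of the idea, assuming PyTorch and GRU layers; the class name Seq2Seq and all hyperparameters are illustrative, not a reference implementation.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb_dim=256, hid_dim=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab)

    def forward(self, src, tgt):
        # Encode the source sequence into a final hidden state (the context).
        _, context = self.encoder(self.src_emb(src))
        # Decode the target sequence, initializing the decoder with the context.
        dec_out, _ = self.decoder(self.tgt_emb(tgt), context)
        return self.out(dec_out)  # (batch, tgt_len, tgt_vocab) logits
```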
Teacher Forcing
Training strategy in which the decoder receives the ground-truth previous tokens as input rather than its own predictions, accelerating convergence. This technique stabilizes learning but creates a mismatch between training and inference known as exposure bias.
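A sketch of a teacher-forced training step, assuming the hypothetical Seq2Seq model above: the decoder input is the ground-truth target shifted right, not the model's own generations.

```python
import torch.nn.functional as F

def train_step(model, optimizer, src, tgt):
    optimizer.zero_grad()
    decoder_input = tgt[:, :-1]   # ground-truth tokens fed to the decoder
    target_output = tgt[:, 1:]    # tokens the decoder must predict
    logits = model(src, decoder_input)
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           target_output.reshape(-1))
    loss.backward()
    optimizer.step()
    return loss.item()
```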
Masking
Procedure that hides certain positions of a sequence so the model cannot use irrelevant or future information. Masking is essential for handling variable-length (padded) sequences and for preventing the decoder from seeing future tokens during auto-regressive training.
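An illustrative sketch of the two common masks, assuming PyTorch tensors and an assumed padding token id of 0: a padding mask for variable-length batches and a causal (look-ahead) mask for auto-regressive decoding.

```python
import torch

PAD_ID = 0  # assumed padding token id

def padding_mask(token_ids):
    # True where the token is real, False where it is padding.
    return token_ids != PAD_ID  # shape: (batch, seq_len)

def causal_mask(seq_len):
    # Lower-triangular matrix: position i may only attend to positions <= i.
    return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

print(causal_mask(4))
```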
Embedding Vector
Dense vector representation of discrete tokens that captures semantic and syntactic relationships in a continuous space. Embeddings are learned during training and form the standard input representation of sequence models.
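A minimal example of a learned embedding table, assuming PyTorch; the vocabulary size, dimension, and token ids are arbitrary.

```python
import torch
import torch.nn as nn

embedding = nn.Embedding(num_embeddings=10_000, embedding_dim=256)
token_ids = torch.tensor([[12, 845, 3]])  # a batch of one 3-token sequence
vectors = embedding(token_ids)            # shape: (1, 3, 256)
print(vectors.shape)
```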
Gated Recurrent Unit
Simplified variant of the LSTM using two gates (update and reset) to regulate information flow with fewer parameters. GRUs often match LSTM performance while being more computationally efficient.
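A didactic sketch of a single GRU cell written out to expose the update (z) and reset (r) gates; in practice PyTorch's nn.GRU would be used, and the class name GRUCellSketch is illustrative.

```python
import torch
import torch.nn as nn

class GRUCellSketch(nn.Module):
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.z = nn.Linear(input_dim + hidden_dim, hidden_dim)  # update gate
        self.r = nn.Linear(input_dim + hidden_dim, hidden_dim)  # reset gate
        self.h = nn.Linear(input_dim + hidden_dim, hidden_dim)  # candidate state

    def forward(self, x, h_prev):
        xh = torch.cat([x, h_prev], dim=-1)
        z = torch.sigmoid(self.z(xh))            # how much to update the state
        r = torch.sigmoid(self.r(xh))            # how much of the past to keep
        h_tilde = torch.tanh(self.h(torch.cat([x, r * h_prev], dim=-1)))
        return (1 - z) * h_prev + z * h_tilde    # new hidden state
```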
Greedy Search
Decoding strategy that, at each generation step, always selects the token with the highest probability. Although fast, this approach can yield suboptimal outputs because it never considers alternative sequences.
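A sketch of greedy decoding with the hypothetical Seq2Seq model above: at each step the single most probable token is appended to the output until an assumed end-of-sequence id is produced.

```python
import torch

def greedy_decode(model, src, bos_id, eos_id, max_len=50):
    model.eval()
    with torch.no_grad():
        generated = torch.tensor([[bos_id]])  # start-of-sequence token
        for _ in range(max_len):
            logits = model(src, generated)    # (1, cur_len, vocab)
            next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
            generated = torch.cat([generated, next_id], dim=1)
            if next_id.item() == eos_id:
                break
    return generated.squeeze(0).tolist()
```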
Bidirectionality
Ability of the encoder to process the input sequence in both directions (forward and backward) to capture the complete context. Bidirectional encoders improve semantic understanding by considering both past and future context.
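A sketch of a bidirectional encoder, assuming PyTorch: the same sequence is read forward and backward, and the two directions' hidden states are concatenated.

```python
import torch
import torch.nn as nn

encoder = nn.GRU(input_size=256, hidden_size=512,
                 batch_first=True, bidirectional=True)
x = torch.randn(8, 20, 256)   # (batch, seq_len, emb_dim)
outputs, h_n = encoder(x)
print(outputs.shape)          # (8, 20, 1024): forward and backward states concatenated
```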
Subword Embeddings
Tokenization technique that splits words into smaller morphological units, allowing the model to handle rare words and an open vocabulary. Subword methods such as BPE or WordPiece have become standard in modern models.
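A toy illustration of subword tokenization using greedy longest-match over a tiny hand-made vocabulary (WordPiece-style matching); real tokenizers learn the vocabulary from data with algorithms such as BPE, so the vocabulary and function below are purely illustrative.

```python
def subword_tokenize(word, vocab):
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        # Find the longest vocabulary entry matching the remaining characters.
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:          # no matching piece: unknown token
            return ["[UNK]"]
        pieces.append(word[start:end])
        start = end
    return pieces

vocab = {"un", "believ", "able", "play", "ing", "s"}
print(subword_tokenize("unbelievable", vocab))  # ['un', 'believ', 'able']
print(subword_tokenize("playing", vocab))       # ['play', 'ing']
```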