Quantization and Optimization - Artificial Intelligence Glossary

Quantization-Aware Training (QAT)

Training method that simulates low-precision quantization during the forward pass ("fake quantization"). The model can then adapt its weights to minimize the accuracy loss that quantization introduces.
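
The core QAT mechanics can be sketched with a toy one-weight model in plain Python. This is a minimal illustration, not a framework implementation. `fake_quantize` and `qat_step` are hypothetical helper names, and the scalar model y = w * x and its data are made up. The backward pass uses the straight-through estimator (STE), a common choice that treats rounding as the identity when computing gradients.

```python
def fake_quantize(x: float, scale: float, qmin: int = -128, qmax: int = 127) -> float:
    """Simulated INT8 quantization: round onto the integer grid, clamp, dequantize."""
    q = max(qmin, min(qmax, round(x / scale)))
    return q * scale


def qat_step(w: float, xs: list, ys: list, scale: float, lr: float = 0.1,
             qmin: int = -128, qmax: int = 127) -> float:
    """One gradient step for the toy model y = w * x, quantized in the forward pass."""
    w_q = fake_quantize(w, scale, qmin, qmax)  # forward pass sees the quantized weight
    # Gradient of the mean squared error with respect to the quantized weight
    grad_wq = sum(2 * (w_q * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    # Straight-through estimator: treat round() as the identity, but block the
    # gradient when the float weight lies outside the clamped range
    in_range = qmin * scale <= w <= qmax * scale
    grad_w = grad_wq if in_range else 0.0
    return w - lr * grad_w  # the float "shadow" weight is what gets updated


# Hypothetical data: the true weight 0.737 is not on the coarse grid (step 0.25)
xs = [0.5, 1.0, 1.5, 2.0]
ys = [0.737 * x for x in xs]
w = 0.0
for _ in range(50):
    w = qat_step(w, xs, ys, scale=0.25)
w_deployed = fake_quantize(w, 0.25)  # the weight that would actually ship
```

The loss is always evaluated on the quantized weight, so training settles on a grid point next to the true value (0.5 or 0.75 here) rather than on an arbitrary float. The float weight can oscillate around a rounding boundary, which is a known behavior of STE-based training.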

Low-Rank Adaptation (LoRA)

Parameter-efficient fine-tuning method that freezes the weights of a pre-trained model and injects small trainable low-rank matrices. The weight update is factored as ΔW = BA, with rank r much smaller than the layer dimensions, which drastically reduces the number of trainable parameters while largely preserving performance.
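
Below is a minimal sketch of a LoRA-adapted linear layer in plain Python. It assumes the commonly described setup:

- The frozen weight W is left untouched.
- The update is the product B·A of a d_out × r matrix B and an r × d_in matrix A.
- A starts with small random values and B starts at zero, so the adapted layer initially behaves exactly like the pre-trained one.
- The update is scaled by alpha / r.

`LoRALinear`, `matvec`, and `lora_param_count` are illustrative names, not a library API.

```python
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


class LoRALinear:
    def __init__(self, W, r: int, alpha: float, seed: int = 0):
        self.W = W  # frozen pre-trained weight, d_out x d_in (never updated)
        d_out, d_in = len(W), len(W[0])
        rng = random.Random(seed)
        # Trainable low-rank factors: A is r x d_in (small random), B is d_out x r (zeros)
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]
        self.scaling = alpha / r

    def forward(self, x):
        base = matvec(self.W, x)                     # frozen path
        delta = matvec(self.B, matvec(self.A, x))    # low-rank path: B (A x)
        return [b + self.scaling * d for b, d in zip(base, delta)]

    def trainable_params(self) -> int:
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


def lora_param_count(d_out: int, d_in: int, r: int) -> int:
    """Trainable parameters of the adapter: r * (d_in + d_out)."""
    return r * (d_in + d_out)


layer = LoRALinear([[1, 2], [3, 4], [5, 6]], r=1, alpha=2)
```

For a 4096 × 4096 projection with r = 8, the adapter trains 8 × (4096 + 4096) = 65,536 parameters instead of 16,777,216 (about 0.4%).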

8-bit Floating Point Representation (FP8)

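Low-precision floating-point format that stores each number in 8 bits, typically as E4M3 (4 exponent bits, 3 mantissa bits) or E5M2 (5 exponent bits, 2 mantissa bits). It enables significant speedups on GPUs with native FP8 support. Combined with appropriate scaling, it largely preserves the training stability of large models.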
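
To make the format concrete, the sketch below decodes and encodes the E4M3 variant. It assumes the layout used by the OCP FP8 specification and NVIDIA GPUs:

- exponent bias 7;
- no infinities;
- the bit pattern S.1111.111 reserved for NaN;
- a largest finite magnitude of 448.

Encoding is brute force over all codes, with round-to-nearest (ties broken by code order, not round-to-nearest-even) and saturation at ±448. This is easy to follow but slow. Real kernels also apply a scale factor before casting, which this sketch omits.

```python
import math


def decode_e4m3(code: int) -> float:
    """Interpret one byte as an E4M3 float (bias 7, no infinities)."""
    sign = -1.0 if code & 0x80 else 1.0
    exp = (code >> 3) & 0x0F
    mant = code & 0x07
    if exp == 0x0F and mant == 0x07:
        return math.nan                              # S.1111.111 is NaN
    if exp == 0:
        return sign * (mant / 8) * 2.0 ** (1 - 7)   # subnormal range
    return sign * (1 + mant / 8) * 2.0 ** (exp - 7)  # normal range


# All codes that represent finite values (excludes the two NaN patterns)
E4M3_CODES = [c for c in range(256) if not math.isnan(decode_e4m3(c))]


def encode_e4m3(x: float) -> int:
    """Nearest finite E4M3 code; out-of-range inputs saturate to +/-448."""
    return min(E4M3_CODES, key=lambda c: abs(decode_e4m3(c) - x))
```

With only 3 mantissa bits, the representable values near 0.3 are 0.28125 and 0.3125, so `encode_e4m3(0.3)` returns the code for 0.3125.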

4-bit Integer Quantization (INT4)

Aggressive compression technique that stores each model weight in 4 bits, which allows only 16 representable levels. The information loss is significant, so this typically requires advanced quantization algorithms and often partial retraining to compensate.
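
Here is a minimal sketch of symmetric per-tensor INT4 weight quantization in plain Python. It assumes a single absmax scale (max|w| / 7) and packs two signed 4-bit values per byte, low nibble first. The function names and packing order are illustrative choices. Practical INT4 schemes usually add per-group scales and more advanced rounding algorithms on top of this basic step.

```python
def quantize_int4(weights):
    """Symmetric absmax quantization to signed 4-bit integers in [-8, 7]."""
    peak = max(abs(w) for w in weights)
    scale = peak / 7 if peak > 0 else 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int4(q, scale):
    return [v * scale for v in q]


def pack_int4(q):
    """Pack two 4-bit two's-complement values per byte (low nibble first)."""
    if len(q) % 2:
        q = q + [0]  # pad to an even count
    return bytes((lo & 0x0F) | ((hi & 0x0F) << 4) for lo, hi in zip(q[0::2], q[1::2]))


def unpack_int4(data: bytes, n: int):
    """Inverse of pack_int4: sign-extend each nibble back to an int."""
    out = []
    for byte in data:
        for nib in (byte & 0x0F, byte >> 4):
            out.append(nib - 16 if nib >= 8 else nib)
    return out[:n]


weights = [0.7, -0.3, 0.12, -0.7]
q, scale = quantize_int4(weights)
packed = pack_int4(q)
restored = dequantize_int4(unpack_int4(packed, len(q)), scale)
```

The four weights fit in two bytes, a 4× reduction compared with 16-bit storage before counting the scale. Each restored value is within half a quantization step of the original.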

Quantization Bias Compensation (Q-Bias)

Post-quantization adjustment technique that measures and corrects the systematic bias (a shift in the expected output) introduced by precision reduction, often by adjusting normalization layers or the bias terms of linear layers.
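
One common form of this correction is sketched below under simplifying assumptions. It estimates the expected output shift (W_q - W)·E[x] on a small calibration batch and subtracts it from the bias of a linear layer. The toy weights, grid step, and calibration inputs are made up, and the helper names are hypothetical.

```python
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def quantize_matrix(W, scale):
    """Round each weight to the nearest multiple of `scale` (quantize + dequantize)."""
    return [[round(w / scale) * scale for w in row] for row in W]


def mean_vector(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def linear(W, b, x):
    return [y + bi for y, bi in zip(matvec(W, x), b)]


def corrected_bias(W, W_q, b, mu):
    """b' = b - (W_q - W) @ mu removes the expected output shift caused by quantization."""
    delta = [[wq - w for w, wq in zip(row, row_q)] for row, row_q in zip(W, W_q)]
    shift = matvec(delta, mu)
    return [bi - s for bi, s in zip(b, shift)]


# Hypothetical layer and calibration batch
W = [[0.31, -0.12], [0.77, 0.45]]
b = [0.1, -0.2]
calib = [[1.0, 2.0], [3.0, 0.5], [2.0, 2.5]]

W_q = quantize_matrix(W, 0.25)
b_corr = corrected_bias(W, W_q, b, mean_vector(calib))

mean_float = mean_vector([linear(W, b, x) for x in calib])
mean_naive = mean_vector([linear(W_q, b, x) for x in calib])
mean_fixed = mean_vector([linear(W_q, b_corr, x) for x in calib])
```

For a linear layer the correction makes the mean output on the calibration data match the float layer exactly (up to rounding). Individual outputs still carry quantization noise.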

Quantization Grid Search Optimization

Systematic exploration of different quantization configurations (per-layer, per-group, mixed precision) to identify the scheme that offers the best trade-off between model size, speed, and accuracy for a given architecture.
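
The following is a brute-force sketch of the search for a toy three-layer model. It assumes per-layer bit-widths of 4 or 8 and a memory budget in bits. As the accuracy proxy it uses the sum of per-layer quantization mean squared errors. Real searches usually measure task accuracy and hardware latency instead. The random layers and the `quant_mse` / `grid_search` helpers are hypothetical.

```python
import itertools
import random


def quant_mse(weights, bits):
    """Mean squared error of symmetric absmax quantization at the given bit-width."""
    qmax = 2 ** (bits - 1) - 1
    peak = max(abs(w) for w in weights)
    if peak == 0:
        return 0.0
    scale = peak / qmax
    return sum((w - max(-qmax, min(qmax, round(w / scale))) * scale) ** 2
               for w in weights) / len(weights)


def grid_search(layers, bit_options, budget_bits):
    """Try every per-layer bit assignment; keep the lowest-error one that fits the budget."""
    best = None  # (config, error, size_in_bits)
    for config in itertools.product(bit_options, repeat=len(layers)):
        size = sum(len(ws) * b for ws, b in zip(layers, config))
        if size > budget_bits:
            continue
        error = sum(quant_mse(ws, b) for ws, b in zip(layers, config))
        if best is None or error < best[1]:
            best = (config, error, size)
    return best


rng = random.Random(0)
layers = [[rng.gauss(0.0, 1.0) for _ in range(64)] for _ in range(3)]  # hypothetical weights
```

The number of configurations grows as (options)^(layers), so exhaustive search only suits small spaces. Larger models typically rely on heuristics or per-layer sensitivity estimates.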

Speculative Inference

Technique for accelerating generative inference in which a small "draft" model quickly proposes several tokens. The large target model then verifies them in parallel in a single forward pass, which reduces the number of costly sequential steps of the large model.
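
The greedy variant of the algorithm can be sketched with toy deterministic "models", which here are plain functions mapping a token sequence to the next token. The sketch assumes greedy decoding on both sides. Sampling-based variants instead use a probabilistic accept/reject rule. Here the target checks each proposed position one at a time. `rounds` still counts one verification pass per batch of proposals, because a real implementation scores all of them in a single forward pass of the large model.

```python
from typing import Callable, List, Tuple

Model = Callable[[List[int]], int]  # maps a token sequence to the next token


def greedy_decode(model: Model, prompt: List[int], n: int) -> List[int]:
    seq = list(prompt)
    for _ in range(n):
        seq.append(model(seq))
    return seq


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n: int, k: int = 4) -> Tuple[List[int], int]:
    seq = list(prompt)
    rounds = 0
    while len(seq) - len(prompt) < n:
        # 1. The cheap draft model proposes k tokens autoregressively
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(seq + proposal))
        # 2. The target verifies the proposals (one parallel pass in practice)
        rounds += 1
        accepted: List[int] = []
        for token in proposal:
            expected = target(seq + accepted)
            if token != expected:
                accepted.append(expected)  # first mismatch: take the target's token
                break
            accepted.append(token)
        else:
            accepted.append(target(seq + accepted))  # all accepted: bonus token
        seq.extend(accepted)
    return seq[: len(prompt) + n], rounds


def toy_target(seq: List[int]) -> int:
    return (seq[-1] * 3 + 1) % 10


def toy_draft(seq: List[int]) -> int:
    # Agrees with the target except after token 7
    return 0 if seq[-1] == 7 else (seq[-1] * 3 + 1) % 10
```

In each round the target accepts the longest prefix of the draft that matches its own predictions and then contributes one token itself. The output is therefore identical to plain greedy decoding with the target, but it needs fewer sequential target passes whenever the draft is usually right.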

Truncated Singular Value Decomposition (Truncated SVD)

Applies singular value decomposition (SVD) to a weight matrix and discards the smallest singular values, approximating the matrix by a lower-rank product. This reduces parameters and computation with a controlled approximation error.
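
In matrix form, let W be an m × n weight matrix of rank r with singular values σ_1 ≥ … ≥ σ_r > 0. Keeping the k largest singular values gives the approximation below. By the Eckart-Young-Mirsky theorem, W_k is the best rank-k approximation in both the Frobenius and spectral norms. The discarded singular values fix the error exactly, which is what makes it "controlled".

```latex
W = U \Sigma V^\top = \sum_{i=1}^{r} \sigma_i \, u_i v_i^\top,
\qquad
W_k = \sum_{i=1}^{k} \sigma_i \, u_i v_i^\top = U_k \Sigma_k V_k^\top \quad (k < r)

\lVert W - W_k \rVert_F = \sqrt{\sum_{i=k+1}^{r} \sigma_i^2},
\qquad
\lVert W - W_k \rVert_2 = \sigma_{k+1}

\text{Parameters and multiply-adds for } y = Wx:\quad
mn \;\longrightarrow\; k(m+n), \quad \text{computing } y \approx (U_k \Sigma_k)\,(V_k^\top x)

\text{Example: } m = n = 4096,\ k = 256:\quad
16{,}777{,}216 \;\longrightarrow\; 256 \times 8192 = 2{,}097{,}152 \ (8\times \text{ fewer})
```

The factorization only pays off when k(m + n) < mn, that is, when k < mn / (m + n).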

Block-wise Quantization

Quantization strategy that divides weight tensors into smaller blocks and quantizes each block independently with its own scale. Because an outlier only affects its own block, this better preserves the value distribution and reduces the overall error compared with per-tensor (global) quantization.
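
Here is a minimal sketch comparing per-tensor and block-wise absmax quantization in plain Python. It assumes 4-bit symmetric integers and blocks of 4 values. The weight vector is made up to contain one large outlier. With a single scale, the outlier forces a coarse grid that rounds every small weight to zero. Block-wise scales confine that damage to the outlier's own block.

```python
def absmax_quantize(values, bits=4):
    """Symmetric absmax quantization with one scale for the whole list."""
    qmax = 2 ** (bits - 1) - 1
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak > 0 else 1.0
    return [max(-qmax, min(qmax, round(v / scale))) for v in values], scale


def dequantize(q, scale):
    return [v * scale for v in q]


def blockwise_roundtrip(values, block_size=4, bits=4):
    """Quantize each block with its own scale, then dequantize."""
    out = []
    for start in range(0, len(values), block_size):
        q, scale = absmax_quantize(values[start:start + block_size], bits)
        out.extend(dequantize(q, scale))
    return out


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


weights = [0.01, -0.02, 0.015, 0.03, 0.02, -0.01, 0.005, 50.0]  # one outlier
q_tensor, s_tensor = absmax_quantize(weights)
per_tensor_error = mse(weights, dequantize(q_tensor, s_tensor))
block_error = mse(weights, blockwise_roundtrip(weights))
```

The trade-off is metadata: block-wise quantization stores one scale per block instead of one per tensor, so smaller blocks give lower error but more overhead.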

Structured Sparse Weights

Form of pruning that imposes regular patterns (by row, column, or block) on the pruned weights. Unlike irregular, unstructured sparsity, these patterns can be exploited efficiently by CPU and GPU hardware.
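
A common fine-grained structured pattern is N:M sparsity. One example is the 2:4 pattern that NVIDIA Ampere-generation GPUs accelerate, where at most 2 of every 4 consecutive weights are nonzero. The sketch below applies magnitude-based 2:4 pruning and builds the compressed representation: the kept values plus their positions within each group. The helper names are hypothetical. In practice the model is usually fine-tuned after pruning to recover accuracy.

```python
def prune_n_m(weights, n=2, m=4):
    """Zero all but the n largest-magnitude weights in each group of m."""
    if len(weights) % m:
        raise ValueError("length must be a multiple of m")
    pruned = []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        keep = set(sorted(range(m), key=lambda j: abs(group[j]), reverse=True)[:n])
        pruned.extend(w if j in keep else 0.0 for j, w in enumerate(group))
    return pruned


def compress_n_m(weights, n=2, m=4):
    """Return (values, indices): the kept values and their positions inside each group."""
    values, indices = [], []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        keep = sorted(sorted(range(m), key=lambda j: abs(group[j]), reverse=True)[:n])
        values.extend(group[j] for j in keep)
        indices.extend(keep)
    return values, indices


w = [0.1, -0.9, 0.3, 0.05, 1.0, 2.0, -3.0, 0.5]
```

Because every group has the same number of nonzeros, the compressed values form a dense array half the original size. Each position index fits in 2 bits, which is the regularity that hardware can exploit.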