AI Glossary

The Complete Artificial Intelligence Dictionary

162 categories · 2,032 subcategories · 23,060 terms

Quantization

Process of reducing the numerical precision of AI model weights and activations to optimize inference and reduce memory footprint.
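
A minimal sketch of the idea in NumPy (the affine int8 scheme below is illustrative; frameworks differ in rounding and clipping details):

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Affine (asymmetric) quantization of a float32 array to int8."""
    qmin, qmax = -128, 127
    scale = (x.max() - x.min()) / (qmax - qmin)         # step size of the int8 grid
    zero_point = int(np.round(qmin - x.min() / scale))  # int8 code that represents 0.0
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale  # approximate reconstruction

x = np.random.randn(4, 4).astype(np.float32)
q, s, z = quantize_int8(x)
print("max abs error:", np.abs(x - dequantize(q, s, z)).max())
```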


8-bit Quantization

Compression technique reducing model weights from 32 bits to 8 bits, offering a strong trade-off between memory savings and accuracy for LLMs.


4-bit Quantization

Extreme compression method reducing weights to 4 bits, allowing significant memory gains but with potential quality loss.
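
Part of the memory gain comes from packing two 4-bit codes into each byte; a minimal sketch (the helper names are hypothetical):

```python
import numpy as np

def pack_int4(q: np.ndarray) -> np.ndarray:
    """Pack pairs of 4-bit codes (values 0..15) into single uint8 bytes."""
    q = q.astype(np.uint8).reshape(-1, 2)
    return (q[:, 0] << 4) | q[:, 1]          # high nibble, low nibble

def unpack_int4(packed: np.ndarray) -> np.ndarray:
    high = packed >> 4
    low = packed & 0x0F
    return np.stack([high, low], axis=1).reshape(-1)

codes = np.random.randint(0, 16, size=8)
packed = pack_int4(codes)                    # 8 codes -> 4 bytes
assert (unpack_int4(packed) == codes).all()
```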


Post-Training Quantization (PTQ)

Technique applied after model training, converting weights to reduced precision without requiring full retraining.


Quantization Aware Training (QAT)

Training approach simulating quantization effects during the learning process to minimize accuracy loss.
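
A common way to simulate quantization during training is "fake quantization" with a straight-through estimator: the forward pass sees rounded values while gradients flow as if rounding were the identity. A minimal PyTorch sketch, assuming a symmetric 8-bit scheme:

```python
import torch

def fake_quantize(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Round x onto a symmetric int grid, but keep identity gradients (STE)."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.detach().abs().max() / qmax
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale
    # Straight-through estimator: forward uses q, backward sees d(q)/dx = 1.
    return x + (q - x).detach()

w = torch.randn(16, 16, requires_grad=True)
loss = fake_quantize(w).pow(2).sum()
loss.backward()                      # gradients reach w despite the rounding
print(w.grad is not None)            # True
```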


Dynamic Quantization

Method applied during inference where activations are quantized on-the-fly, offering flexibility but with computational overhead.
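
In PyTorch, for example, dynamic quantization of a model's Linear layers is a one-line transform (weights are stored in int8; activations are quantized per batch at inference):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))

# Quantize the weights of nn.Linear modules to int8; activations are
# quantized on the fly at inference time.
qmodel = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 128)
print(qmodel(x).shape)  # torch.Size([1, 10])
```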


Static Quantization

Approach precomputing quantization parameters before inference, optimizing speed at the expense of flexibility.


Quantization Calibration

Process of determining optimal quantization parameters (scale, zero-point) from a sample of representative data.
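
A minimal min/max calibration sketch: the range observed over a few representative batches fixes the scale and zero-point that static quantization later reuses (min/max is the simplest observer; percentile- or entropy-based calibrators are common refinements):

```python
import numpy as np

def calibrate_minmax(batches, qmin=-128, qmax=127):
    """Derive affine quantization parameters from representative data."""
    lo = min(float(b.min()) for b in batches)
    hi = max(float(b.max()) for b in batches)
    scale = (hi - lo) / (qmax - qmin)
    zero_point = int(round(qmin - lo / scale))
    return scale, zero_point

calibration_set = [np.random.randn(32, 64) for _ in range(10)]
scale, zp = calibrate_minmax(calibration_set)
print(f"scale={scale:.5f}, zero_point={zp}")
```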


GPTQ

Post-training quantization technique for transformer weights that quantizes them column by column, using approximate second-order (Hessian) information to compensate for and minimize reconstruction error.


AWQ

Activation-aware Weight Quantization, a method that identifies the most important weight channels by the magnitude of their corresponding activations and scales them to protect accuracy during quantization.


Zero-shot Quantization

Technique requiring no calibration data, using heuristics based on the weight distribution to quantize the model.


Mixed-Precision Quantization

Strategy applying different bit-widths to different model layers to optimize the performance/accuracy trade-off.


Symmetric Quantization

Quantization scheme where the value range is centered around zero, simplifying calculations but potentially underutilizing the dynamic range.


Asymmetric Quantization

Approach allowing value ranges not centered on zero, optimizing the use of the quantized range for asymmetric distributions.
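
The contrast with the symmetric scheme is easiest to see side by side; in this sketch the asymmetric grid fits a skewed distribution noticeably better:

```python
import numpy as np

x = np.random.exponential(scale=1.0, size=1000)   # skewed, mostly positive data
qmin, qmax = -128, 127

# Symmetric: range centered on zero, zero_point is always 0.
s_sym = np.abs(x).max() / qmax
q_sym = np.clip(np.round(x / s_sym), qmin, qmax)

# Asymmetric: range fitted to [min, max], zero_point shifts the grid.
s_asym = (x.max() - x.min()) / (qmax - qmin)
zp = int(round(qmin - x.min() / s_asym))
q_asym = np.clip(np.round(x / s_asym) + zp, qmin, qmax)

err = lambda q, s, z: np.abs(x - (q - z) * s).mean()
print("symmetric error :", err(q_sym, s_sym, 0))
print("asymmetric error:", err(q_asym, s_asym, zp))  # lower for skewed data
```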


Scale Factor

Multiplicative parameter used to map continuous values into the quantized range, crucial for quantization accuracy.


Zero Point

Offset added during asymmetric quantization to align the floating-point zero value with the quantized representation.


Quantization Noise

Error introduced by precision reduction, manifesting as model performance degradation due to weight approximation.
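
The noise can be measured directly, for instance as a signal-to-quantization-noise ratio between original and round-tripped values; a sketch using a symmetric scheme:

```python
import numpy as np

def sqnr_db(x: np.ndarray, num_bits: int) -> float:
    """Signal-to-quantization-noise ratio (dB) of a symmetric round trip."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(x).max() / qmax
    x_hat = np.clip(np.round(x / scale), -qmax - 1, qmax) * scale
    noise = x - x_hat
    return 10 * np.log10(np.mean(x**2) / np.mean(noise**2))

w = np.random.randn(4096)
for bits in (8, 4, 2):
    print(f"{bits}-bit SQNR: {sqnr_db(w, bits):.1f} dB")  # drops as bits shrink
```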


Quantization-aware Fine-tuning

Post-quantization fine-tuning process aimed at recovering accuracy lost during model compression.


SmoothQuant

Quantization technique that migrates quantization difficulty from activations to weights through a mathematically equivalent offline scaling transformation.
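
The transformation is a per-channel scaling that leaves the layer output unchanged: activations are divided by s and the matching weight rows multiplied by s, with s_j = max|X_j|^α / max|W_j|^(1−α) balancing the two ranges (α = 0.5 is the commonly cited default). A sketch:

```python
import numpy as np

def smooth(X: np.ndarray, W: np.ndarray, alpha: float = 0.5):
    """Migrate quantization difficulty from activations X to weights W.

    X: (tokens, c_in) activations, W: (c_in, c_out) weights.
    The transform is exact: (X / s) @ (s[:, None] * W) == X @ W.
    """
    s = (np.abs(X).max(axis=0) ** alpha) / (np.abs(W).max(axis=1) ** (1 - alpha))
    return X / s, W * s[:, None]

X = np.random.randn(64, 32) * np.array([1.0] * 31 + [50.0])  # one outlier channel
W = np.random.randn(32, 16)
Xs, Ws = smooth(X, W)
assert np.allclose(X @ W, Xs @ Ws)              # layer output unchanged
print(np.abs(X).max(), "->", np.abs(Xs).max())  # activation range tamed
```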


LLM.int8()

8-bit quantization method for large language models that combines vector-wise int8 matrix multiplication with a mixed-precision decomposition keeping outlier features in 16-bit.
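
The decomposition idea can be sketched as follows: activation columns whose magnitude exceeds a threshold take a 16-bit path, the rest an int8 path, and the partial products are summed. The per-tensor scales and the threshold value here are simplifications of the method's vector-wise scheme:

```python
import numpy as np

def mixed_precision_matmul(X, W, threshold=6.0):
    """Split outlier activation columns into a high-precision path."""
    outlier = np.abs(X).max(axis=0) > threshold          # per input-channel test
    hi = X[:, outlier].astype(np.float16) @ W[outlier].astype(np.float16)

    # int8 path for the well-behaved columns (per-tensor scales, for brevity).
    Xr, Wr = X[:, ~outlier], W[~outlier]
    sx = np.abs(Xr).max() / 127
    sw = np.abs(Wr).max() / 127
    qx = np.round(Xr / sx).astype(np.int8)
    qw = np.round(Wr / sw).astype(np.int8)
    lo = (qx.astype(np.int32) @ qw.astype(np.int32)) * (sx * sw)

    return hi.astype(np.float32) + lo

X = np.random.randn(8, 32); X[:, 3] *= 20                  # inject an outlier channel
W = np.random.randn(32, 16)
print(np.abs(mixed_precision_matmul(X, W) - X @ W).max())  # small residual error
```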
