Quantization by Clustering

📖

Terms

Quantization by Clustering

Model compression technique that groups similar weights into clusters to reduce memory while preserving performance. This approach enables compact weight representation using a limited number of representative centroids.

K-means Quantization

Clustering algorithm applied to neural network weight quantization by partitioning the weights into K clusters. Each weight is then represented by its cluster centroid, so only a short cluster index per weight and the K centroid values need to be stored.
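
As an illustration, the following is a minimal sketch of K-means on scalar weights using only the Python standard library. The function name `kmeans_1d`, the evenly spaced initialization, and the sample weights are assumptions made for this example, not part of any particular library.

```python
def kmeans_1d(weights, k, iters=20):
    """Lloyd's algorithm on scalar weights; returns (centroids, assignments)."""
    lo, hi = min(weights), max(weights)
    # Deterministic init (an assumption): spread centroids evenly over the range.
    centroids = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    assign = [0] * len(weights)
    for _ in range(iters):
        # Assignment step: pick the nearest centroid for each weight.
        assign = [min(range(k), key=lambda j: (w - centroids[j]) ** 2)
                  for w in weights]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [w for w, a in zip(weights, assign) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = sum(members) / len(members)
    return centroids, assign


weights = [-0.92, -0.88, -0.11, -0.05, 0.02, 0.07, 0.81, 0.95]
centroids, assign = kmeans_1d(weights, k=3)
# Each weight is replaced by the centroid of its cluster.
quantized = [centroids[a] for a in assign]
```

With K = 3, the eight sample weights collapse to three distinct values, and each weight can be stored as a 2-bit index.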

Codebook

Set of reference vectors or centroids used to represent quantized weights in a compressed model. The codebook enables mapping original weights to low-precision representations while minimizing reconstruction error.
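
To make the storage trade-off concrete, here is a small back-of-the-envelope sketch. It assumes a scalar codebook with 32-bit centroids and one fixed-width index per weight; the helper name `clustered_size_bits` is hypothetical.

```python
def clustered_size_bits(num_weights, k, centroid_bits=32):
    """Bits for a K-entry scalar codebook plus one fixed-width index per weight."""
    index_bits = (k - 1).bit_length()  # smallest width that can address K entries
    return k * centroid_bits + num_weights * index_bits


n = 1_000_000
dense = n * 32                        # uncompressed 32-bit weights
compact = clustered_size_bits(n, k=16)
ratio = dense / compact
```

For one million weights and 16 centroids, the codebook itself costs only 512 bits, so the ratio is just under 8x and is dominated by the 4-bit indices.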

Quantization Centroids

Representative points at the center of each cluster in the quantization space, serving as substitutes for original weights. These centroids are optimized to minimize the model's overall quantization error.

Product Quantization

Technique that decomposes the vector space into subspaces and quantizes each one separately with its own codebook before concatenating the resulting codes. This method reaches high compression ratios with limited information loss for high-dimensional vectors.
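
The split-and-encode idea can be sketched as follows. The codebooks here are hand-written, hypothetical values chosen for readability; in practice they would be learned, for example by running K-means in each subspace.

```python
def nearest(codebook, sub):
    """Index of the codeword closest to `sub` in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], sub)))


def pq_encode(vec, codebooks):
    """Split `vec` into len(codebooks) equal chunks and encode each separately."""
    m = len(codebooks)
    d = len(vec) // m  # dimensions per subspace (assumes an even split)
    return [nearest(codebooks[j], vec[j * d:(j + 1) * d]) for j in range(m)]


def pq_decode(codes, codebooks):
    """Concatenate the selected codewords to rebuild an approximation."""
    out = []
    for j, c in enumerate(codes):
        out.extend(codebooks[j][c])
    return out


# Hypothetical codebooks: 2 subspaces, 2 codewords each, 2 dims per subspace.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[-1.0, 0.0], [0.0, 2.0]],
]
vec = [0.9, 1.2, -0.1, 1.7]
codes = pq_encode(vec, codebooks)      # one small index per subspace
approx = pq_decode(codes, codebooks)   # reconstructed 4-D vector
```

The 4-dimensional vector is stored as two 1-bit indices, and the reconstruction is the concatenation of the two chosen codewords.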

Optimized Product Quantization

Variant of Product Quantization that applies a learned linear transformation (typically a rotation) before subspace decomposition so that the data fits the subspace split better. This pre-transformation can noticeably reduce the final quantization error.

Additive Quantization

Approach where vectors are approximated by the sum of multiple quantized codes from different codebooks. This method offers better representation flexibility than single-codebook approaches.
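
A toy sketch of the sum-of-codewords idea is shown below. It uses exhaustive search over all code combinations, which is only viable for tiny codebooks; practical encoders rely on approximate search instead. The codebooks and function names are hypothetical.

```python
from itertools import product


def aq_decode(codes, codebooks):
    """Reconstruct a vector as the sum of one full-dimensional codeword per codebook."""
    dim = len(codebooks[0][0])
    return [sum(codebooks[m][c][i] for m, c in enumerate(codes))
            for i in range(dim)]


def aq_encode(vec, codebooks):
    """Try every combination of codewords and keep the one with the lowest error."""
    def error(codes):
        return sum((v - r) ** 2 for v, r in zip(vec, aq_decode(codes, codebooks)))
    combos = product(*(range(len(cb)) for cb in codebooks))
    return list(min(combos, key=error))


# Hypothetical codebooks: every codeword spans the full 2-D space.
codebooks = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [-0.5, 0.5]],
]
vec = [0.4, 1.6]
codes = aq_encode(vec, codebooks)
approx = aq_decode(codes, codebooks)
```

Unlike Product Quantization, the codewords are not confined to disjoint subspaces, which is the source of the extra flexibility and also of the harder encoding problem.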

Residual Quantization

Iterative technique that successively quantizes the residual left over by the previous quantization steps. Each iteration refines the approximation by capturing part of the remaining error.
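
The iteration can be sketched as a loop that quantizes, subtracts, and repeats. The two-stage codebooks below are hypothetical values picked so the arithmetic is easy to follow.

```python
def nearest(codebook, vec):
    """Index of the codeword closest to `vec` in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], vec)))


def rq_encode(vec, codebooks):
    """Quantize `vec` stage by stage; each stage encodes what the previous ones missed."""
    residual = list(vec)
    codes = []
    for cb in codebooks:
        i = nearest(cb, residual)
        codes.append(i)
        # Subtract the chosen codeword; the remainder goes to the next stage.
        residual = [r - c for r, c in zip(residual, cb[i])]
    return codes, residual


# Hypothetical codebooks: a coarse first stage and a finer second stage.
codebooks = [
    [[-1.0, -1.0], [1.0, 1.0]],
    [[-0.25, 0.25], [0.25, -0.25]],
]
vec = [1.2, 0.8]
codes, residual = rq_encode(vec, codebooks)
final_error = sum(r * r for r in residual)  # squared error left after all stages
```

After the first stage the squared error is about 0.08; the second stage brings it down to about 0.005.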

Hierarchical Clustering Quantization

Method organizing weights into a tree-structured cluster hierarchy for efficient multi-level quantization. Cutting the tree at different depths gives an adjustable trade-off between precision and storage cost.

Subspace Quantization

Technique dividing the weight space into orthogonal subspaces and quantizing each subspace independently. This approach reduces computational complexity while preserving the model's essential characteristics.

Mahalanobis Distance in Quantization

Distance metric that accounts for the covariance between weights, giving more informative clustering than plain Euclidean distance. This approach improves the quality of the resulting clusters by taking correlations between weights into account.
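
The effect of the covariance term can be seen in a small 2-D sketch. The function `mahalanobis_sq` and the sample covariance matrix are assumptions for illustration; the 2x2 inverse is written out by hand to keep the example free of dependencies.

```python
def mahalanobis_sq(x, mu, cov):
    """Squared Mahalanobis distance for 2-D points with a 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Closed-form inverse of a 2x2 matrix.
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - mu[0], x[1] - mu[1]]
    return sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))


# Hypothetical cluster: large variance along the first axis, small along the second.
cov = [[4.0, 0.0], [0.0, 0.25]]
mu = [0.0, 0.0]
p = [2.0, 0.0]   # Euclidean distance 2 from the center
q = [0.0, 1.0]   # Euclidean distance 1 from the center
dist_p = mahalanobis_sq(p, mu, cov)
dist_q = mahalanobis_sq(q, mu, cov)
```

Although `q` is closer to the center in Euclidean terms, it lies along the low-variance axis, so its Mahalanobis distance is larger than that of `p`.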

Codebook Learning

Process of optimizing the centroids to minimize the global reconstruction error of the quantized model. This step largely determines the final compression quality and model performance.

Coarse Quantizer

First quantization level, performing a coarse grouping of weights into broad clusters. This fast step reduces the search space for the finer quantization stages.

Fine Quantizer

Detailed quantization level operating on restricted subspaces for precise weight approximation. This step refines the representation after the initial grouping performed by the coarse quantizer.

IVF with Quantization

Combination of an Inverted File Index (IVF) with quantization techniques for efficient search over compressed vectors. The index restricts each search to a few clusters, while quantization keeps the stored vectors compact.
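
The inverted-file part can be sketched as below. For brevity this example keeps raw vectors in each list; in a combined index the lists would hold quantized codes instead. The function names, the `nprobe` parameter name, and the data are assumptions for illustration.

```python
def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_ivf(vectors, centroids):
    """Assign each vector id to the inverted list of its nearest coarse centroid."""
    lists = {i: [] for i in range(len(centroids))}
    for vid, v in enumerate(vectors):
        cell = min(range(len(centroids)), key=lambda i: sqdist(centroids[i], v))
        lists[cell].append(vid)
    return lists


def search(query, vectors, centroids, lists, nprobe=1):
    """Scan only the `nprobe` lists whose centroids are closest to the query."""
    cells = sorted(range(len(centroids)),
                   key=lambda i: sqdist(centroids[i], query))[:nprobe]
    candidates = [vid for c in cells for vid in lists[c]]
    if not candidates:
        return None
    return min(candidates, key=lambda vid: sqdist(vectors[vid], query))


vectors = [[0.1, 0.2], [0.0, -0.1], [5.1, 4.9], [4.8, 5.2]]
centroids = [[0.0, 0.0], [5.0, 5.0]]   # hypothetical coarse centroids
lists = build_ivf(vectors, centroids)
best = search([5.0, 5.0], vectors, centroids, lists, nprobe=1)
```

With `nprobe=1`, only the two vectors in the second list are compared against the query; the other half of the data is never touched.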

PQ-Codes

Compact codes produced by Product Quantization for each weight vector, consisting of one codebook index per subspace. These codes enable fast approximate comparisons and efficient storage while preserving essential information.

Lattice Quantization

Method using regular geometric structures (lattices) to partition the weight space uniformly. This approach offers well-characterized theoretical properties for the quantization error.
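
The simplest lattice is the scaled integer grid, where quantization reduces to rounding each coordinate. The sketch below assumes that lattice only; denser lattices need dedicated nearest-point algorithms. The function name and step size are illustrative.

```python
def lattice_quantize(vec, step):
    """Snap each coordinate to the nearest point of the scaled integer lattice step * Z^n."""
    return [step * round(x / step) for x in vec]


weights = [0.26, -0.61, 0.05]
snapped = lattice_quantize(weights, step=0.25)
# Per-coordinate error is bounded by half the lattice spacing.
max_error = max(abs(w - s) for w, s in zip(weights, snapped))
```

No codebook has to be stored: the lattice points are implied by the step size, and each weight is described by a small integer.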
