vllm.model_executor.layers.quantization.turboquant.centroids
Lloyd-Max optimal scalar quantizer for TurboQuant.
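After a d-dimensional unit vector is rotated by a random orthogonal matrix, each coordinate is approximately distributed as N(0, 1/d) (mean 0, variance 1/d) when d >= 64. We solve the Lloyd-Max conditions for this distribution to find the optimal centroids: each decision boundary is the midpoint of its two neighboring centroids, and each centroid is the conditional mean of the distribution over its cell.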
Based on: turboquant-pytorch/lloyd_max.py (Zandieh et al.)
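
The variance claim in the module description can be checked empirically. The following is a minimal sketch using only the standard library; it is not part of the module, and the helper names `random_unit_vector` and `first_coordinate_variance` are hypothetical. It assumes the random rotation is uniform (Haar-distributed) over orthogonal matrices, in which case the rotated vector is uniform on the unit sphere and can be sampled by normalizing a Gaussian vector.

```python
import math
import random


def random_unit_vector(d: int, rng: random.Random) -> list[float]:
    """Draw a point uniformly from the unit sphere in R^d.

    Normalizing an i.i.d. Gaussian vector gives the same distribution as
    applying a Haar-random orthogonal matrix to any fixed unit vector
    (assumption: the rotation is Haar-distributed).
    """
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]


def first_coordinate_variance(d: int, samples: int, seed: int = 0) -> float:
    """Estimate the variance of one coordinate of a random unit vector."""
    rng = random.Random(seed)
    values = [random_unit_vector(d, rng)[0] for _ in range(samples)]
    mean = sum(values) / samples
    return sum((v - mean) ** 2 for v in values) / samples
```

For example, `first_coordinate_variance(64, 4000) * 64` should come out close to 1, since the squared coordinates of a unit vector sum to 1 and all d coordinates are identically distributed.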
_trapz
Trapezoidal numerical integration (replaces scipy.integrate.quad).
Source code in vllm/model_executor/layers/quantization/turboquant/centroids.py
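
As an illustration of the technique, here is a minimal sketch of the composite trapezoidal rule in plain Python. The signature of `_trapz` is not shown on this page, so the function `trapezoid` and its parameters below are hypothetical and should not be read as the module's API. `gaussian_pdf` is likewise an illustrative helper.

```python
import math
from typing import Callable


def trapezoid(f: Callable[[float], float], a: float, b: float, n: int = 1000) -> float:
    """Approximate the integral of f over [a, b] with n equal subintervals."""
    if n < 1:
        raise ValueError("n must be at least 1")
    h = (b - a) / n
    # Endpoints get weight 1/2, interior points get weight 1.
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def gaussian_pdf(x: float, sigma: float) -> float:
    """Density of N(0, sigma^2) at x."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
```

Integrating `gaussian_pdf` with `sigma = 1 / sqrt(d)` over a range of several standard deviations should give a total mass very close to 1, which is a quick sanity check before using the rule for centroid integrals.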
get_centroids cached
Get precomputed Lloyd-Max centroids (cached).
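
The caching pattern can be sketched with `functools.lru_cache`. The signature of `get_centroids` and its caching mechanism are not shown on this page, so everything below is an assumption: `cached_centroids` and `expensive_solver` are hypothetical names, and the solver returns evenly spaced placeholder values rather than real Lloyd-Max centroids.

```python
from functools import lru_cache


def expensive_solver(d: int, bits: int) -> tuple[float, ...]:
    """Stand-in for a costly solve.

    Returns 2**bits evenly spaced placeholder levels scaled by 1/sqrt(d).
    These are NOT Lloyd-Max centroids; they only make the sketch runnable.
    """
    sigma = d ** -0.5
    levels = 2 ** bits
    return tuple(sigma * (2 * i - levels + 1) / levels for i in range(levels))


@lru_cache(maxsize=None)
def cached_centroids(d: int, bits: int) -> tuple[float, ...]:
    """Compute once per (d, bits) pair, then serve repeats from the cache."""
    return expensive_solver(d, bits)
```

Repeated calls with the same `(d, bits)` return the same object without re-running the solver. The arguments must be hashable, and returning an immutable tuple keeps callers from mutating the cached value.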
solve_lloyd_max
solve_lloyd_max(
    d: int,
    bits: int,
    max_iter: int = 200,
    tol: float = 1e-10,
) -> tuple[Tensor, Tensor]
Solve for the Lloyd-Max optimal quantizer of the N(0, 1/d) distribution.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| d | int | Vector dimension (determines variance = 1/d). | required |
| bits | int | Number of quantization bits. | required |
| max_iter | int | Maximum Lloyd-Max iterations. | 200 |
| tol | float | Convergence tolerance. | 1e-10 |
Returns:
| Name | Type | Description |
|---|---|---|
| centroids | Tensor | Sorted tensor of 2^bits optimal centroids. |
| boundaries | Tensor | Sorted tensor of 2^bits - 1 decision boundaries. |
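
The iteration can be illustrated with a minimal, standard-library sketch. It is not the vLLM implementation: the name `lloyd_max_gaussian` is hypothetical, it returns Python lists instead of tensors, and it uses closed-form Gaussian integrals (via `math.erf`) in place of the trapezoidal integration the module describes. The initial centroids, evenly spaced in [-3σ, 3σ], are also an assumption.

```python
import math


def normal_pdf(x: float, sigma: float) -> float:
    """Density of N(0, sigma^2); evaluates to 0.0 at +/- infinity."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x: float, sigma: float) -> float:
    """Cumulative distribution function of N(0, sigma^2)."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))


def midpoints(centroids: list[float]) -> list[float]:
    """Nearest-neighbor condition: boundaries sit halfway between centroids."""
    return [0.5 * (a + b) for a, b in zip(centroids, centroids[1:])]


def lloyd_max_gaussian(
    d: int, bits: int, max_iter: int = 200, tol: float = 1e-10
) -> tuple[list[float], list[float]]:
    sigma = 1.0 / math.sqrt(d)
    n = 2 ** bits
    # Assumed initialization: n points evenly spaced inside [-3*sigma, 3*sigma].
    centroids = [sigma * (-3.0 + 6.0 * (i + 0.5) / n) for i in range(n)]
    for _ in range(max_iter):
        edges = [-math.inf] + midpoints(centroids) + [math.inf]
        updated = []
        for i in range(n):
            lo, hi = edges[i], edges[i + 1]
            mass = normal_cdf(hi, sigma) - normal_cdf(lo, sigma)
            if mass <= 0.0:
                updated.append(centroids[i])  # empty cell: keep old value
                continue
            # Centroid condition: E[X | lo < X < hi] for a zero-mean Gaussian,
            # using the identity: integral of x*pdf = sigma^2 * (pdf(lo) - pdf(hi)).
            first_moment = sigma ** 2 * (normal_pdf(lo, sigma) - normal_pdf(hi, sigma))
            updated.append(first_moment / mass)
        shift = max(abs(u - c) for u, c in zip(updated, centroids))
        centroids = updated
        if shift < tol:
            break
    return centroids, midpoints(centroids)
```

A call such as `lloyd_max_gaussian(64, 2)` returns 4 sorted centroids and 3 boundaries. With `bits=1` the fixed point is the pair ±σ·sqrt(2/π), which gives a simple correctness check.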