vllm.model_executor.model_loader.weight_utils ¶
Utilities for downloading and initializing model weights.
_natural_sort_key ¶
Natural sort key for filenames with numeric components, such as model-00001-of-00005.safetensors -> ['model-', 1, '-of-', 5, '.safetensors']
Source code in vllm/model_executor/model_loader/weight_utils.py
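The natural-sort idea can be sketched as below. This is a minimal re-implementation for illustration (the name `natural_sort_key` is hypothetical, not vLLM's exact function), splitting a filename into text and integer runs so numeric parts compare numerically rather than lexically:

```python
import re

def natural_sort_key(filename: str) -> list:
    # Split into alternating text and digit runs; digit runs become ints
    # so "model-00002-..." sorts before "model-00010-...".
    return [
        int(part) if part.isdigit() else part
        for part in re.split(r"(\d+)", filename)
        if part
    ]

files = [
    "model-00010-of-00010.safetensors",
    "model-00002-of-00010.safetensors",
]
ordered = sorted(files, key=natural_sort_key)
```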
atomic_writer ¶
atomic_writer(
filepath: str | Path,
mode: str = "w",
encoding: str | None = None,
) -> Generator[IO]
Context manager that provides an atomic file writing routine.
The context manager writes to a temporary file and, if successful, atomically replaces the original file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
filepath | str or Path | The path to the file to write. | required |
mode | str | The file mode for the temporary file (e.g., 'w', 'wb'). | 'w' |
encoding | str | The encoding for text mode. | None |
Yields:
| Type | Description |
|---|---|
IO | A handle to the temporary file being written. |
Source code in vllm/model_executor/model_loader/weight_utils.py
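The atomic-write pattern described above can be sketched as follows. This is a minimal, hedged re-implementation (the name `atomic_writer_sketch` and the details are illustrative, not vLLM's exact code): write to a temporary file in the same directory, then `os.replace` it over the target, which is atomic for files on the same filesystem:

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path

@contextmanager
def atomic_writer_sketch(filepath, mode="w", encoding=None):
    # Write to a temp file next to the target, then atomically replace
    # the target on success; clean up the temp file on failure.
    target = Path(filepath)
    fd, tmp = tempfile.mkstemp(dir=target.parent)
    os.close(fd)
    try:
        with open(tmp, mode, encoding=encoding) as f:
            yield f
        os.replace(tmp, target)
    except BaseException:
        os.unlink(tmp)
        raise

outdir = tempfile.mkdtemp()
path = os.path.join(outdir, "config.json")
with atomic_writer_sketch(path) as f:
    f.write('{"ok": true}')
```

Because the replacement only happens after the `with` block exits cleanly, a crash mid-write leaves the original file untouched.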
composed_weight_loader ¶
Create a weight loader that post-processes the weights after loading.
Source code in vllm/model_executor/model_loader/weight_utils.py
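The composition can be sketched like this (a minimal illustration using NumPy in place of torch; the names are hypothetical): wrap a base loader so a post-processing function is applied to the parameter after the raw weight has been copied in:

```python
import numpy as np

def composed_weight_loader_sketch(base_loader, post_fn):
    # Run the base loader, then overwrite the parameter in place with
    # the post-processed result.
    def loader(param, loaded_weight):
        base_loader(param, loaded_weight)
        param[...] = post_fn(param)
    return loader

def copy_loader(param, loaded_weight):
    np.copyto(param, loaded_weight)

scale2 = composed_weight_loader_sketch(copy_loader, lambda t: t * 2)
p = np.zeros(3)
scale2(p, np.array([1.0, 2.0, 3.0]))
```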
convert_pyslice_to_tensor ¶
Convert a PySafeSlice object from safetensors to a torch.Tensor.
A PySafeSlice object supports indexing, which is applied before the actual tensor is loaded and can reduce the amount of data read into memory. However, it does not support more advanced functionality such as .view() or .t(). Therefore, if the loaded tensor needs to be modified with these more complex operators, it must be converted to a tensor first.
Source code in vllm/model_executor/model_loader/weight_utils.py
default_weight_loader ¶
Default weight loader.
Source code in vllm/model_executor/model_loader/weight_utils.py
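The default loader is essentially a shape-checked in-place copy. A minimal sketch, using NumPy in place of torch (`np.copyto` standing in for `param.data.copy_(loaded_weight)`; the name is hypothetical):

```python
import numpy as np

def default_weight_loader_sketch(param: np.ndarray, loaded_weight: np.ndarray) -> None:
    # Copy the checkpoint tensor into the model parameter in place,
    # after verifying the shapes agree.
    assert param.shape == loaded_weight.shape, "shape mismatch"
    np.copyto(param, loaded_weight)

param = np.zeros((2, 3))
default_weight_loader_sketch(param, np.ones((2, 3)))
```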
download_safetensors_index_file_from_hf ¶
download_safetensors_index_file_from_hf(
model_name_or_path: str,
index_file: str,
cache_dir: str | None,
revision: str | None = None,
) -> None
Download the safetensors index file from the Hugging Face Hub.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
model_name_or_path | str | The model name or path. | required |
index_file | str | The safetensors index file name. | required |
cache_dir | Optional[str] | The cache directory to store the model weights. If None, will use HF defaults. | required |
revision | Optional[str] | The revision of the model. | None |
Source code in vllm/model_executor/model_loader/weight_utils.py
download_weights_from_hf ¶
download_weights_from_hf(
model_name_or_path: str,
cache_dir: str | None,
allow_patterns: list[str],
revision: str | None = None,
ignore_patterns: str | list[str] | None = None,
) -> str
Download model weights from Hugging Face Hub.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
model_name_or_path | str | The model name or path. | required |
cache_dir | Optional[str] | The cache directory to store the model weights. If None, will use HF defaults. | required |
allow_patterns | list[str] | The allowed patterns for the weight files. Files matched by any of the patterns will be downloaded. | required |
revision | Optional[str] | The revision of the model. | None |
ignore_patterns | Optional[Union[str, list[str]]] | The patterns to filter out the weight files. Files matched by any of the patterns will be ignored. | None |
Returns:
| Name | Type | Description |
|---|---|---|
str | str | The path to the downloaded model weights. |
Source code in vllm/model_executor/model_loader/weight_utils.py
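The allow/ignore pattern semantics described above can be illustrated with glob matching (a hedged sketch; `select_files` is a hypothetical helper, not vLLM's code): a file is kept if it matches any allow pattern and no ignore pattern:

```python
from fnmatch import fnmatch

def select_files(repo_files, allow_patterns, ignore_patterns=()):
    # Keep files matched by at least one allow pattern and excluded
    # by no ignore pattern.
    return [
        f for f in repo_files
        if any(fnmatch(f, p) for p in allow_patterns)
        and not any(fnmatch(f, p) for p in ignore_patterns)
    ]

repo = ["model-00001-of-00002.safetensors", "model.bin", "tokenizer.json"]
picked = select_files(repo, ["*.safetensors"], ignore_patterns=["*.bin"])
```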
fastsafetensors_weights_iterator ¶
fastsafetensors_weights_iterator(
hf_weights_files: list[str], use_tqdm_on_load: bool
) -> Generator[tuple[str, Tensor], None, None]
Iterate over the weights in the model safetensor files using the fastsafetensors library.
Source code in vllm/model_executor/model_loader/weight_utils.py
filter_files_not_needed_for_inference ¶
Exclude files that are not needed for inference.
See https://github.com/huggingface/transformers/blob/v4.34.0/src/transformers/trainer.py#L227-L233
Source code in vllm/model_executor/model_loader/weight_utils.py
get_gguf_weight_type_map ¶
Return a mapping from mapped GGUF weight names to their quantization types.
Source code in vllm/model_executor/model_loader/weight_utils.py
gguf_quant_weights_iterator ¶
gguf_quant_weights_iterator(
gguf_file: str, gguf_to_hf_name_map: dict[str, str]
) -> Generator[tuple[str, Tensor], None, None]
Iterate over the quantized weights in the model GGUF files and convert them to torch tensors. Be careful with the order of yielding: all weight types must be yielded before any weight data. Otherwise, loading would fail for packed layers that combine different quant types.
Source code in vllm/model_executor/model_loader/weight_utils.py
initialize_dummy_weights ¶
initialize_dummy_weights(
model: Module,
model_config: ModelConfig,
low: float = -0.001,
high: float = 0.001,
seed: int = 1234,
) -> None
Initialize model weights with random values.
The model weights must be randomly initialized for accurate performance measurements. Additionally, the model weights should not cause NaNs in the forward pass. We empirically found that initializing the weights with values between -1e-3 and 1e-3 works well for most models.
We use a per-parameter random seed, so that dummy weights are consistent even if the model is partitioned across multiple devices. When the seed is fixed, the random values generated by this function depend only on the parameter's number of elements and its data type.
Source code in vllm/model_executor/model_loader/weight_utils.py
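The per-parameter seeding idea can be sketched in plain Python (a hedged illustration, not vLLM's torch-based implementation; the seed derivation shown here is an assumption chosen for the sketch): derive a deterministic seed from the global seed plus the parameter's element count and dtype, so every rank that holds a copy of the same parameter generates identical values:

```python
import random

def dummy_values(numel, dtype_name, low=-1e-3, high=1e-3, seed=1234):
    # Seed an RNG deterministically from (seed, numel, dtype) so the
    # generated values are reproducible across processes and devices.
    rng = random.Random(f"{seed}-{numel}-{dtype_name}")
    return [rng.uniform(low, high) for _ in range(numel)]

a = dummy_values(4, "float16")
b = dummy_values(4, "float16")
```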
maybe_download_from_modelscope ¶
maybe_download_from_modelscope(
model: str,
revision: str | None = None,
download_dir: str | None = None,
ignore_patterns: str | list[str] | None = None,
allow_patterns: list[str] | str | None = None,
) -> str | None
Download model from ModelScope hub if VLLM_USE_MODELSCOPE is True.
Returns the path to the downloaded model, or None if the model is not downloaded from ModelScope.
Source code in vllm/model_executor/model_loader/weight_utils.py
maybe_remap_kv_scale_name ¶
Remap the name of FP8 k/v_scale parameters.
This function handles the remapping of FP8 k/v_scale parameter names. It detects if the given name ends with a suffix and attempts to remap it to the expected name format in the model. If the remapped name is not found in the params_dict, a warning is printed and None is returned.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
name | str | The original loaded checkpoint parameter name. | required |
params_dict | dict | Dictionary containing the model's named parameters. | required |
Returns:
| Name | Type | Description |
|---|---|---|
str | str | None | The remapped parameter name if successful, or the original name if no remapping is needed. |
None | str | None | If the remapped name is not found in params_dict. |
Source code in vllm/model_executor/model_loader/weight_utils.py
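The remapping logic can be sketched as follows. This is a simplified illustration of the behavior described above (the `.kv_scale` → `.attn.k_scale` mapping shown here is an assumed example layout, not the full set of suffixes vLLM handles):

```python
def maybe_remap_kv_scale_name_sketch(name, params_dict):
    # If the checkpoint name ends with a legacy scale suffix, rewrite
    # it to the model's expected parameter name; return None with a
    # warning if the remapped name does not exist in the model.
    suffix = ".kv_scale"
    if not name.endswith(suffix):
        return name  # no remapping needed
    remapped = name[: -len(suffix)] + ".attn.k_scale"
    if remapped not in params_dict:
        print(f"WARNING: {remapped} not found in params_dict; skipping {name}")
        return None
    return remapped

params = {"model.layers.0.self_attn.attn.k_scale": object()}
ok = maybe_remap_kv_scale_name_sketch("model.layers.0.self_attn.kv_scale", params)
missing = maybe_remap_kv_scale_name_sketch("model.layers.9.self_attn.kv_scale", params)
```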
multi_thread_pt_weights_iterator ¶
multi_thread_pt_weights_iterator(
hf_weights_files: list[str],
use_tqdm_on_load: bool,
pt_load_map_location: str | dict[str, str] = "cpu",
max_workers: int = 4,
) -> Generator[tuple[str, Tensor], None, None]
Iterate over the weights in the model bin/pt files using multiple threads.
Source code in vllm/model_executor/model_loader/weight_utils.py
multi_thread_safetensors_weights_iterator ¶
multi_thread_safetensors_weights_iterator(
hf_weights_files: list[str],
use_tqdm_on_load: bool,
max_workers: int = 4,
) -> Generator[tuple[str, Tensor], None, None]
Iterate over the weights in the model safetensor files using multiple threads.
Source code in vllm/model_executor/model_loader/weight_utils.py
np_cache_weights_iterator ¶
np_cache_weights_iterator(
model_name_or_path: str,
cache_dir: str | None,
hf_folder: str,
hf_weights_files: list[str],
use_tqdm_on_load: bool,
) -> Generator[tuple[str, Tensor], None, None]
Iterate over the weights in the model np files.
Will dump the model weights to numpy files if they are not already dumped.
Source code in vllm/model_executor/model_loader/weight_utils.py
pt_weights_iterator ¶
pt_weights_iterator(
hf_weights_files: list[str],
use_tqdm_on_load: bool,
pt_load_map_location: str | dict[str, str] = "cpu",
) -> Generator[tuple[str, Tensor], None, None]
Iterate over the weights in the model bin/pt files.
Source code in vllm/model_executor/model_loader/weight_utils.py
row_parallel_weight_loader ¶
Load weights that are row-parallelized.
Source code in vllm/model_executor/model_loader/weight_utils.py
runai_safetensors_weights_iterator ¶
runai_safetensors_weights_iterator(
hf_weights_files: list[str],
use_tqdm_on_load: bool,
is_distributed: bool = False,
) -> Generator[tuple[str, Tensor], None, None]
Iterate over the weights in the model safetensor files.
Source code in vllm/model_executor/model_loader/weight_utils.py
safetensors_weights_iterator ¶
safetensors_weights_iterator(
hf_weights_files: list[str],
use_tqdm_on_load: bool,
safetensors_load_strategy: str = "lazy",
) -> Generator[tuple[str, Tensor], None, None]
Iterate over the weights in the model safetensor files.
Source code in vllm/model_executor/model_loader/weight_utils.py
sharded_weight_loader ¶
sharded_weight_loader(shard_axis: int) -> LoaderFunction
Create a weight loader that shards the weights along the given axis.
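The sharding behavior can be sketched with NumPy (a hedged illustration using hypothetical names, standing in for the torch-based loader): given a shard axis and this rank's position, copy only the rank's slice of the full checkpoint tensor into the already-sharded parameter:

```python
import numpy as np

def sharded_weight_loader_sketch(shard_axis, rank, world_size):
    # Returns a loader that narrows the full tensor along shard_axis
    # to this rank's slice before copying it into the parameter.
    def loader(param, loaded_weight):
        shard_size = param.shape[shard_axis]
        assert loaded_weight.shape[shard_axis] == shard_size * world_size
        start = rank * shard_size
        idx = [slice(None)] * loaded_weight.ndim
        idx[shard_axis] = slice(start, start + shard_size)
        np.copyto(param, loaded_weight[tuple(idx)])
    return loader

full = np.arange(8.0).reshape(4, 2)   # full checkpoint tensor
param = np.zeros((2, 2))              # rank 1's shard along axis 0
sharded_weight_loader_sketch(0, rank=1, world_size=2)(param, full)
```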