vllm.model_executor.layers.mamba.ops.ssu_dispatch
Dispatch module for Mamba selective state update (SSU) backends.
Provides a unified selective_state_update function that dispatches to either the Triton or the FlashInfer backend, depending on the configured MambaBackendEnum. The design follows SGLang's dispatch pattern, adapted for vLLM.
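The enum-based backend selection described above can be sketched as follows. This is a minimal illustration of the general pattern, not vLLM's code: `ToyBackendEnum`, the `Toy*Backend` classes, and `make_toy_backend` are hypothetical names, and the real backends operate on tensors rather than returning strings.

```python
from abc import ABC, abstractmethod
from enum import Enum


class ToyBackendEnum(Enum):
    """Hypothetical stand-in for a backend selector enum."""

    TRITON = "triton"
    FLASHINFER = "flashinfer"


class ToySSUBackend(ABC):
    """Hypothetical abstract interface that every backend implements."""

    name: str

    @abstractmethod
    def selective_state_update(self, x: float) -> str:
        """Run one update step (here it only reports which backend ran)."""


class ToyTritonBackend(ToySSUBackend):
    name = "triton"

    def selective_state_update(self, x: float) -> str:
        return f"triton handled x={x}"


class ToyFlashInferBackend(ToySSUBackend):
    name = "flashinfer"

    def selective_state_update(self, x: float) -> str:
        return f"flashinfer handled x={x}"


# Registry mapping each enum member to its backend class.
_TOY_BACKENDS = {
    ToyBackendEnum.TRITON: ToyTritonBackend,
    ToyBackendEnum.FLASHINFER: ToyFlashInferBackend,
}


def make_toy_backend(kind: ToyBackendEnum) -> ToySSUBackend:
    """Instantiate the backend class registered for `kind`."""
    try:
        return _TOY_BACKENDS[kind]()
    except KeyError:
        raise ValueError(f"unsupported backend: {kind!r}") from None
```

Callers depend only on the abstract interface, so adding a backend means adding one class and one registry entry.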
FlashInferSSUBackend
Bases: MambaSSUBackend
FlashInfer-based SSU backend.
Source code in vllm/model_executor/layers/mamba/ops/ssu_dispatch.py
MambaSSUBackend
Bases: ABC
Abstract base class for Mamba SSU backends.
Source code in vllm/model_executor/layers/mamba/ops/ssu_dispatch.py
TritonSSUBackend
Bases: MambaSSUBackend
Triton-based SSU backend (vLLM's default).
Source code in vllm/model_executor/layers/mamba/ops/ssu_dispatch.py
get_mamba_ssu_backend
get_mamba_ssu_backend() -> MambaSSUBackend
Return the current Mamba SSU backend. Raises an error if the backend has not been initialized.
Source code in vllm/model_executor/layers/mamba/ops/ssu_dispatch.py
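The "initialize once, then get" lifecycle can be illustrated with a small sketch. This is not the vLLM source: the `toy`-prefixed names are hypothetical, and the use of `RuntimeError` is an assumption, since the documentation above only says that the getter raises when no backend has been initialized.

```python
from typing import Optional


class ToyBackend:
    """Hypothetical backend object; it only remembers its name."""

    def __init__(self, name: str) -> None:
        self.name = name


# Module-level holder for the single active backend.
_toy_backend: Optional[ToyBackend] = None


def initialize_toy_backend(name: str) -> None:
    """Create the global backend. Calling it again replaces the previous one."""
    global _toy_backend
    _toy_backend = ToyBackend(name)


def get_toy_backend() -> ToyBackend:
    """Return the global backend, failing loudly if it was never set up."""
    if _toy_backend is None:
        # Assumption: the exception type is chosen for illustration only.
        raise RuntimeError(
            "backend not initialized; call initialize_toy_backend() first"
        )
    return _toy_backend
```

Failing in the getter, rather than silently falling back to a default, surfaces ordering bugs at the first call site instead of deep inside a kernel launch.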
initialize_mamba_ssu_backend
initialize_mamba_ssu_backend(
    mamba_config: MambaConfig,
) -> None
Initialize the global Mamba SSU backend.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| mamba_config | MambaConfig | Mamba configuration. | required |
Source code in vllm/model_executor/layers/mamba/ops/ssu_dispatch.py
selective_state_update
selective_state_update(
    state: Tensor,
    x: Tensor,
    dt: Tensor,
    A: Tensor,
    B: Tensor,
    C: Tensor,
    D: Tensor,
    dt_bias: Tensor,
    z: Tensor | None = None,
    dt_softplus: bool = False,
    state_batch_indices: Tensor | None = None,
    dst_state_batch_indices: Tensor | None = None,
    null_block_id: int = NULL_BLOCK_ID,
    out: Tensor | None = None,
    num_accepted_tokens: Tensor | None = None,
    cu_seqlens: Tensor | None = None,
    is_blackwell: bool = False,
) -> None
Unified dispatch for Mamba selective state update.
Delegates to the initialized backend (Triton or FlashInfer).
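For orientation, the single-step recurrence commonly used for Mamba selective state update can be written in plain Python. This is a reference sketch under stated assumptions, not the Triton or FlashInfer kernel: it handles one sequence and one head, uses nested lists instead of tensors, returns the output instead of writing into `out`, and ignores the batch-index, speculative-decoding, and `cu_seqlens` arguments. The function name `ssu_step_reference` and the shapes are hypothetical; the parameter names mirror the signature above.

```python
import math


def _softplus(v: float) -> float:
    """Numerically safe softplus: log(1 + exp(v))."""
    return v if v > 20.0 else math.log1p(math.exp(v))


def ssu_step_reference(state, x, dt, A, B, C, D, dt_bias,
                       z=None, dt_softplus=False):
    """One decode step of the recurrence (assumed shapes).

    state: dim x dstate nested list, updated in place
    x, dt, D, dt_bias: lists of length dim
    A: dim x dstate nested list
    B, C: lists of length dstate
    z: optional list of length dim (gate)
    Returns a list of length dim.
    """
    out = []
    for d in range(len(x)):
        step = dt[d] + dt_bias[d]
        if dt_softplus:
            step = _softplus(step)
        acc = 0.0
        for n in range(len(B)):
            # h <- exp(dt * A) * h + dt * B * x
            state[d][n] = (math.exp(step * A[d][n]) * state[d][n]
                           + step * B[n] * x[d])
            # y accumulates <h, C> over the state dimension
            acc += state[d][n] * C[n]
        y = acc + D[d] * x[d]  # skip connection
        if z is not None:
            y *= z[d] / (1.0 + math.exp(-z[d]))  # SiLU gate: z * sigmoid(z)
        out.append(y)
    return out
```

A tiny case makes the update easy to check by hand: with `state=[[1.0]]`, `x=[2.0]`, `dt=[0.5]`, `A=[[0.0]]`, `B=[1.0]`, `C=[1.0]`, `D=[3.0]`, and `dt_bias=[0.0]`, the state becomes `1.0 + 0.5 * 1.0 * 2.0 = 2.0` and the output is `2.0 + 3.0 * 2.0 = 8.0`.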