vllm.model_executor.layers.attention.cross_attention ¶
CrossAttention ¶
Bases: Attention
Cross-attention layer for encoder-decoder models: decoder queries attend to keys/values produced by the encoder.
Source code in vllm/model_executor/layers/attention/cross_attention.py
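To illustrate the operation this layer performs, here is a minimal NumPy sketch of cross-attention (scaled dot-product attention where queries come from the decoder and keys/values from the encoder). The function name and shapes are illustrative only, not vLLM's API:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q, enc_k, enc_v):
    """Decoder queries attend over encoder keys/values.

    q:     (num_decoder_tokens, head_dim)
    enc_k: (num_encoder_tokens, head_dim)
    enc_v: (num_encoder_tokens, head_dim)
    """
    scale = 1.0 / np.sqrt(q.shape[-1])
    scores = (q @ enc_k.T) * scale      # (dec_tokens, enc_tokens)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ enc_v              # (dec_tokens, head_dim)

q = np.random.randn(3, 8)   # 3 decoder tokens
k = np.random.randn(5, 8)   # 5 encoder tokens
v = np.random.randn(5, 8)
out = cross_attention(q, k, v)
print(out.shape)  # (3, 8)
```

The real layer additionally handles multiple heads, batching, and reads the encoder keys/values from a paged KV cache rather than dense tensors.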
_get_cross_slot_mapping ¶
_get_cross_slot_mapping(
    encoder_seq_lens: ndarray,
    block_table_tensor: Tensor,
    kv_cache_spec: CrossAttentionSpec,
    device: device,
) -> Tensor
Compute the cross-attention slot mapping: for each request, map every encoder token position to a physical slot in the KV cache using the request's block table.
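The general idea of a paged-KV slot mapping can be sketched as follows. This is a hypothetical standalone version, not vLLM's implementation: the block size, helper name, and the `slot = block_id * block_size + offset` layout are assumptions based on the usual paged-attention scheme:

```python
import numpy as np

BLOCK_SIZE = 16  # assumed KV-cache block size

def cross_slot_mapping(encoder_seq_lens, block_table):
    """Map every encoder token of every request to a physical KV-cache slot.

    encoder_seq_lens: (num_reqs,) encoder sequence length per request
    block_table:      (num_reqs, max_blocks) physical block ids per request

    Assumed layout: slot = block_id * BLOCK_SIZE + offset_in_block.
    """
    slots = []
    for req, seq_len in enumerate(encoder_seq_lens):
        positions = np.arange(seq_len)
        # Which physical block holds each token, and where inside it.
        block_ids = block_table[req, positions // BLOCK_SIZE]
        slots.append(block_ids * BLOCK_SIZE + positions % BLOCK_SIZE)
    return np.concatenate(slots) if slots else np.empty(0, dtype=np.int64)

# Two requests: encoder lengths 3 and 18; request 1 spans two blocks.
seq_lens = np.array([3, 18])
table = np.array([[7, 0],   # request 0 uses block 7
                  [2, 5]])  # request 1 uses blocks 2 then 5
print(cross_slot_mapping(seq_lens, table))
```

Because cross-attention keys/values are written once per encoder sequence (not per decoding step), this mapping is computed from the encoder lengths rather than the decoder positions.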