Python module
cache_params
KVCacheParams
class max.nn.kv_cache.cache_params.KVCacheParams(dtype: max.dtype.DType, n_kv_heads: int, head_dim: int, enable_prefix_caching: bool = False, enable_kvcache_swapping_to_host: bool = False, host_kvcache_swap_space_gb: Optional[float] = None, cache_strategy: max.nn.kv_cache.cache_params.KVCacheStrategy = <KVCacheStrategy.CONTINUOUS: 'continuous'>, page_size: Optional[int] = None, n_devices: int = 1, pipeline_parallel_degree: int = 1, stage_id: Optional[int] = None, total_num_layers: Optional[int] = None, n_kv_heads_per_device: int = 0, n_layers_per_stage: Optional[int] = None)
Parameters:
- dtype (DType)
- n_kv_heads (int)
- head_dim (int)
- enable_prefix_caching (bool)
- enable_kvcache_swapping_to_host (bool)
- host_kvcache_swap_space_gb (float | None)
- cache_strategy (KVCacheStrategy)
- page_size (int | None)
- n_devices (int)
- pipeline_parallel_degree (int)
- stage_id (int | None)
- total_num_layers (int | None)
- n_kv_heads_per_device (int)
- n_layers_per_stage (int | None)
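To illustrate which parameters are required and which fall back to defaults, the sketch below mirrors the documented signature with a plain standard-library dataclass. `KVCacheParamsSketch` is a hypothetical stand-in, and plain strings stand in for `DType` and `KVCacheStrategy` so the snippet runs without MAX installed. It is not the library's implementation, and it omits any validation the real class may perform. The values `8` and `128` are arbitrary illustration values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KVCacheParamsSketch:
    """Hypothetical stand-in mirroring the documented KVCacheParams fields."""

    # Required: these have no default in the documented signature.
    dtype: str  # stand-in for max.dtype.DType
    n_kv_heads: int
    head_dim: int
    # Optional: defaults copied from the documented signature.
    enable_prefix_caching: bool = False
    enable_kvcache_swapping_to_host: bool = False
    host_kvcache_swap_space_gb: Optional[float] = None
    cache_strategy: str = "continuous"  # stand-in for KVCacheStrategy.CONTINUOUS
    page_size: Optional[int] = None
    n_devices: int = 1
    pipeline_parallel_degree: int = 1
    stage_id: Optional[int] = None
    total_num_layers: Optional[int] = None
    n_kv_heads_per_device: int = 0
    n_layers_per_stage: Optional[int] = None


# Only the three required fields must be supplied; the rest use their defaults.
params = KVCacheParamsSketch(dtype="bfloat16", n_kv_heads=8, head_dim=128)

# Override the strategy and page size for a paged configuration.
paged = KVCacheParamsSketch(
    dtype="bfloat16",
    n_kv_heads=8,
    head_dim=128,
    cache_strategy="paged",
    page_size=128,
)
```

With the real class, the same keyword names would be passed to `KVCacheParams`, using a `DType` value and a `KVCacheStrategy` member in place of the strings.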
cache_strategy
cache_strategy: KVCacheStrategy = 'continuous'
dtype
dtype: DType
dtype_shorthand
property dtype_shorthand: str
The shorthand textual representation of the dtype.
enable_kvcache_swapping_to_host
enable_kvcache_swapping_to_host: bool = False
enable_prefix_caching
enable_prefix_caching: bool = False
head_dim
head_dim: int
host_kvcache_swap_space_gb
host_kvcache_swap_space_gb: float | None = None
n_devices
n_devices: int = 1
n_kv_heads
n_kv_heads: int
n_kv_heads_per_device
n_kv_heads_per_device: int = 0
n_layers_per_stage
n_layers_per_stage: int | None = None
page_size
page_size: int | None = None
pipeline_parallel_degree
pipeline_parallel_degree: int = 1
stage_id
stage_id: int | None = None
static_cache_shape
total_num_layers
total_num_layers: int | None = None
KVCacheStrategy
class max.nn.kv_cache.cache_params.KVCacheStrategy(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)
CONTINUOUS
CONTINUOUS = 'continuous'
Deprecated. Use PAGED instead.
MODEL_DEFAULT
MODEL_DEFAULT = 'model_default'
PAGED
PAGED = 'paged'
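Because each member carries a string value, a strategy can be looked up from a configuration string by calling the enum with that value. The sketch below shows this with a hypothetical stand-in enum, `KVCacheStrategySketch`, built on the standard-library `Enum` so it runs without MAX installed. The helper `parse_strategy` is also hypothetical and is not part of the library.

```python
from enum import Enum


class KVCacheStrategySketch(Enum):
    """Hypothetical stand-in with the same member values as KVCacheStrategy."""

    MODEL_DEFAULT = "model_default"
    CONTINUOUS = "continuous"  # documented as deprecated in favor of PAGED
    PAGED = "paged"


def parse_strategy(name: str) -> KVCacheStrategySketch:
    """Map a config string to a member, with a readable error on bad input."""
    try:
        # Calling an Enum class with a value returns the matching member.
        return KVCacheStrategySketch(name)
    except ValueError:
        valid = ", ".join(member.value for member in KVCacheStrategySketch)
        raise ValueError(f"unknown cache strategy {name!r}; expected one of: {valid}")


strategy = parse_strategy("paged")
```

Lookup by value returns the existing member rather than creating a new object, so identity comparisons such as `strategy is KVCacheStrategySketch.PAGED` hold.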
kernel_substring()
kernel_substring()
Returns the common substring that we include in the kernel name for this caching strategy.
Return type: str
uses_opaque()
uses_opaque()
Return type: