Mojo struct
Info
@register_passable
struct Info
Comprehensive information about a GPU architecture.
This struct contains detailed specifications about GPU capabilities, including compute units, memory, thread organization, and performance characteristics.
Fields
- name (StringLiteral): The model name of the GPU.
- vendor (Vendor): The vendor/manufacturer of the GPU (e.g., NVIDIA, AMD).
- api (StringLiteral): The graphics/compute API supported by the GPU (e.g., CUDA, ROCm).
- arch_name (StringLiteral): The architecture name of the GPU (e.g., sm_80, gfx942).
- compile_options (StringLiteral): Compiler options specific to this GPU architecture.
- compute (SIMD[float32, 1]): Compute capability version number for NVIDIA GPUs.
- version (StringLiteral): Version string of the GPU architecture.
- sm_count (Int): Number of streaming multiprocessors (SMs) on the GPU.
- warp_size (Int): Number of threads in a warp/wavefront.
- threads_per_sm (Int): Maximum number of threads per streaming multiprocessor.
- threads_per_warp (Int): Number of threads that execute together in a warp/wavefront.
- warps_per_multiprocessor (Int): Maximum number of warps that can be active on a multiprocessor.
- threads_per_multiprocessor (Int): Maximum number of threads that can be active on a multiprocessor.
- thread_blocks_per_multiprocessor (Int): Maximum number of thread blocks that can be active on a multiprocessor.
- shared_memory_per_multiprocessor (Int): Size of shared memory available per multiprocessor in bytes.
- register_file_size (Int): Total size of the register file per multiprocessor in bytes.
- register_allocation_unit_size (Int): Minimum allocation size for registers in bytes.
- allocation_granularity (StringLiteral): Description of how resources are allocated on the GPU.
- max_registers_per_thread (Int): Maximum number of registers that can be allocated to a single thread.
- max_registers_per_block (Int): Maximum number of registers that can be allocated to a thread block.
- max_blocks_per_multiprocessor (Int): Maximum number of blocks that can be scheduled on a multiprocessor.
- shared_memory_allocation_unit_size (Int): Minimum allocation size for shared memory in bytes.
- warp_allocation_granularity (Int): Granularity at which warps are allocated resources.
- max_thread_block_size (Int): Maximum number of threads allowed in a thread block.
- flops (Flops): Floating-point operations per second capabilities for different precisions.
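Several of these fields are related to one another. As a rough illustration, the Python sketch below (not the Mojo API) mirrors a few fields in a hypothetical GpuInfoSketch class and derives quantities from them. The values are assumptions modeled on publicly documented NVIDIA A100 (sm_80) limits, not values read from the library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuInfoSketch:
    # Hypothetical Python mirror of a subset of the Mojo Info fields.
    name: str
    arch_name: str
    sm_count: int
    warp_size: int
    threads_per_multiprocessor: int
    warps_per_multiprocessor: int
    max_thread_block_size: int

    def is_consistent(self) -> bool:
        # Warps per SM times warp size should equal threads per SM.
        return self.warps_per_multiprocessor * self.warp_size == self.threads_per_multiprocessor

    def max_resident_threads(self) -> int:
        # Upper bound on threads resident at once across all SMs.
        return self.sm_count * self.threads_per_multiprocessor

    def warps_per_max_block(self) -> int:
        # Warps needed for the largest allowed block (ceiling division).
        return -(-self.max_thread_block_size // self.warp_size)


# Assumed A100-like values, for illustration only.
a100_like = GpuInfoSketch(
    name="A100",
    arch_name="sm_80",
    sm_count=108,
    warp_size=32,
    threads_per_multiprocessor=2048,
    warps_per_multiprocessor=64,
    max_thread_block_size=1024,
)
```

With these values, a100_like.max_resident_threads() is 108 × 2048 = 221184 and the largest block spans 32 warps.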
Implemented traits
AnyType, Copyable, ExplicitlyCopyable, Movable, UnknownDestructibility, Writable
Methods
__lt__
__lt__(self, other: Self) -> Bool
Checks whether this GPU has a lower compute capability than another.
Args:
- other (Self): Another GPU Info instance to compare against.
Returns:
True if this GPU has lower compute capability, False otherwise.
__le__
__le__(self, other: Self) -> Bool
Checks whether this GPU has a lower or equal compute capability compared to another.
Args:
- other (Self): Another GPU Info instance to compare against.
Returns:
True if this GPU has lower or equal compute capability, False otherwise.
__eq__
__eq__(self, other: Self) -> Bool
Checks if two GPU Info instances represent the same GPU model.
Args:
- other (Self): Another GPU Info instance to compare against.
Returns:
True if both instances represent the same GPU model.
__ne__
__ne__(self, other: Self) -> Bool
Checks if two GPU Info instances represent different GPU models.
Args:
- other (Self): Another GPU Info instance to compare against.
Returns:
True if instances represent different GPU models.
__gt__
__gt__(self, other: Self) -> Bool
Checks whether this GPU has a higher compute capability than another.
Args:
- other (Self): Another GPU Info instance to compare against.
Returns:
True if this GPU has higher compute capability, False otherwise.
__ge__
__ge__(self, other: Self) -> Bool
Checks whether this GPU has a higher or equal compute capability compared to another.
Args:
- other (Self): Another GPU Info instance to compare against.
Returns:
True if this GPU has higher or equal compute capability, False otherwise.
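The ordering operators compare compute capability, not the model name. A minimal Python sketch of that behavior, assuming a hypothetical ArchSketch class whose compute attribute plays the role of the compute field (8.0 and 9.0 are NVIDIA's published compute capabilities for the A100 and H100):

```python
class ArchSketch:
    """Hypothetical stand-in for Info: ordering uses only the compute capability."""

    def __init__(self, name: str, compute: float) -> None:
        self.name = name
        self.compute = compute

    # Each ordering operator compares the compute capability alone.
    def __lt__(self, other: "ArchSketch") -> bool:
        return self.compute < other.compute

    def __le__(self, other: "ArchSketch") -> bool:
        return self.compute <= other.compute

    def __gt__(self, other: "ArchSketch") -> bool:
        return self.compute > other.compute

    def __ge__(self, other: "ArchSketch") -> bool:
        return self.compute >= other.compute


a100 = ArchSketch("A100", 8.0)  # sm_80
h100 = ArchSketch("H100", 9.0)  # sm_90
newest = max([a100, h100])      # picks the higher compute capability
```

Because ordering uses compute capability while equality uses the model name, two different GPUs with the same compute capability compare as both <= and >= each other without being equal. For that reason the sketch writes all four operators out explicitly instead of deriving them with functools.total_ordering, which would mix the two notions.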
__is__
__is__(self, other: Self) -> Bool
Identity comparison operator for GPU Info instances.
Args:
- other (Self): Another GPU Info instance to compare against.
Returns:
True if both instances represent the same GPU model.
__isnot__
__isnot__(self, other: Self) -> Bool
Negative identity comparison operator for GPU Info instances.
Args:
- other (Self): Another GPU Info instance to compare against.
Returns:
True if instances represent different GPU models.
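Equality and the identity operators both compare the GPU model rather than object identity. Python cannot overload is, so this sketch models __is__ with a hypothetical same_model helper. ModelSketch is likewise a hypothetical stand-in, not the Mojo type.

```python
class ModelSketch:
    """Hypothetical stand-in for Info: equality compares the model name only."""

    def __init__(self, name: str, compute: float) -> None:
        self.name = name
        self.compute = compute

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, ModelSketch):
            return NotImplemented
        return self.name == other.name

    def __ne__(self, other: object) -> bool:
        return not self == other

    def __hash__(self) -> int:
        # Keep hashing consistent with name-based equality.
        return hash(self.name)


def same_model(a: ModelSketch, b: ModelSketch) -> bool:
    # Stands in for Mojo's __is__: same GPU model, not the same object.
    return a.name == b.name


first = ModelSketch("A100", 8.0)
second = ModelSketch("A100", 8.0)
other = ModelSketch("H100", 9.0)
```

Here first == second and same_model(first, second) are both True even though first is not second in Python's object-identity sense.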
target
target[index_bit_width: Int = 64](self) -> target
Gets the MLIR target configuration for this GPU.
Parameters:
- index_bit_width (Int): The bit width for indices (default: 64).
Returns:
MLIR target configuration for the GPU.
from_target
static from_target[target: target]() -> Self
Creates an Info instance from an MLIR target.
Parameters:
- target (target): MLIR target configuration.
Returns:
GPU info corresponding to the target.
from_name
static from_name[name: StringLiteral]() -> Self
Creates an Info instance from a GPU architecture name.
Parameters:
- name (StringLiteral): GPU architecture name (e.g., "sm_80", "gfx942").
Returns:
GPU info corresponding to the architecture name.
occupancy
occupancy(self, *, threads_per_block: Int, registers_per_thread: Int) -> SIMD[float64, 1]
Calculates the theoretical occupancy for a given thread and register configuration.
Occupancy represents the ratio of active warps to the maximum possible warps on a streaming multiprocessor.
Note: TODO (KERN-795): Add an occupancy calculation based on shared memory usage and thread block size, and use the minimum of the resulting values.
Args:
- threads_per_block (Int): Number of threads in each block.
- registers_per_thread (Int): Number of registers used by each thread.
Returns:
Occupancy as a ratio between 0.0 and 1.0.
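The register-limited part of this calculation can be sketched following the approach of NVIDIA's CUDA occupancy calculator. Registers are allocated per warp in fixed-size units, the number of warps that fit is rounded down to the warp allocation granularity, and the active block count is the minimum of the register, warp, and block limits. The function and parameter names below are hypothetical, the defaults are assumed A100-like (sm_80) limits, and the exact formula used by Info.occupancy may differ.

```python
def _round_up(value: int, unit: int) -> int:
    # Round value up to the next multiple of unit.
    return -(-value // unit) * unit


def _round_down(value: int, unit: int) -> int:
    # Round value down to a multiple of unit.
    return (value // unit) * unit


def register_limited_occupancy(
    *,
    threads_per_block: int,
    registers_per_thread: int,
    warp_size: int = 32,                        # assumed A100-like limits
    warps_per_multiprocessor: int = 64,
    max_blocks_per_multiprocessor: int = 32,
    registers_per_multiprocessor: int = 65536,  # count of 32-bit registers
    register_allocation_unit_size: int = 256,
    warp_allocation_granularity: int = 4,
) -> float:
    if threads_per_block <= 0 or registers_per_thread <= 0:
        raise ValueError("threads_per_block and registers_per_thread must be positive")
    warps_per_block = -(-threads_per_block // warp_size)
    # Registers are allocated per warp in fixed-size units.
    registers_per_warp = _round_up(registers_per_thread * warp_size, register_allocation_unit_size)
    # Warps that fit in the register file, rounded down to the allocation granularity.
    warps_fit = _round_down(registers_per_multiprocessor // registers_per_warp, warp_allocation_granularity)
    blocks_by_registers = warps_fit // warps_per_block
    blocks_by_warps = warps_per_multiprocessor // warps_per_block
    active_blocks = min(blocks_by_registers, blocks_by_warps, max_blocks_per_multiprocessor)
    return active_blocks * warps_per_block / warps_per_multiprocessor
```

Under these assumptions, with 256 threads per block, raising register use from 32 to 64 registers per thread halves occupancy from 1.0 to 0.5. Very small blocks can also be limited by max_blocks_per_multiprocessor rather than by registers.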
write_to
write_to[W: Writer](self, mut writer: W)
Writes GPU information to a writer.
Outputs all GPU specifications and capabilities to the provided writer in a human-readable format.
Parameters:
- W (Writer): The type of writer to use for output. Must implement the Writer trait.
Args:
- writer (W): A Writer instance to output the GPU information.
__str__
__str__(self) -> String
Returns a string representation of the GPU information.
Converts all GPU specifications and capabilities to a human-readable string format.
Returns:
String containing all GPU information.
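write_to streams the fields to any writer, and __str__ returns the same information as a String. One common way to structure this is to have __str__ delegate to write_to with an in-memory buffer, so that both produce identical text. The Python sketch below uses hypothetical names (InfoSketch, Writer) and an illustrative output format, not the library's actual format.

```python
import io
from typing import Protocol


class Writer(Protocol):
    # Hypothetical minimal writer interface: anything with write(str).
    def write(self, s: str) -> int: ...


class InfoSketch:
    def __init__(self, name: str, arch_name: str, sm_count: int, warp_size: int) -> None:
        self.name = name
        self.arch_name = arch_name
        self.sm_count = sm_count
        self.warp_size = warp_size

    def write_to(self, writer: Writer) -> None:
        # Emit one "field: value" line per field.
        writer.write(f"name: {self.name}\n")
        writer.write(f"arch_name: {self.arch_name}\n")
        writer.write(f"sm_count: {self.sm_count}\n")
        writer.write(f"warp_size: {self.warp_size}\n")

    def __str__(self) -> str:
        # Reuse write_to so the string form matches the streamed form exactly.
        buffer = io.StringIO()
        self.write_to(buffer)
        return buffer.getvalue()
```

Passing sys.stdout as the writer prints the same text directly, without building an intermediate string.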