
Mojo struct

Info

@register_passable struct Info

Comprehensive information about a GPU architecture.

This struct contains detailed specifications about GPU capabilities, including compute units, memory, thread organization, and performance characteristics.

Fields

  • name (StringLiteral): The model name of the GPU.
  • vendor (Vendor): The vendor/manufacturer of the GPU (e.g., NVIDIA, AMD).
  • api (StringLiteral): The graphics/compute API supported by the GPU (e.g., CUDA, ROCm).
  • arch_name (StringLiteral): The architecture name of the GPU (e.g., sm_80, gfx942).
  • compile_options (StringLiteral): Compiler options specific to this GPU architecture.
  • compute (SIMD[float32, 1]): Compute capability version number for NVIDIA GPUs.
  • version (StringLiteral): Version string of the GPU architecture.
  • sm_count (Int): Number of streaming multiprocessors (SMs) on the GPU.
  • warp_size (Int): Number of threads in a warp/wavefront.
  • threads_per_sm (Int): Maximum number of threads per streaming multiprocessor.
  • threads_per_warp (Int): Number of threads that execute together in a warp/wavefront.
  • warps_per_multiprocessor (Int): Maximum number of warps that can be active on a multiprocessor.
  • threads_per_multiprocessor (Int): Maximum number of threads that can be active on a multiprocessor.
  • thread_blocks_per_multiprocessor (Int): Maximum number of thread blocks that can be active on a multiprocessor.
  • shared_memory_per_multiprocessor (Int): Size of shared memory available per multiprocessor in bytes.
  • register_file_size (Int): Total size of the register file per multiprocessor in bytes.
  • register_allocation_unit_size (Int): Minimum allocation size for registers in bytes.
  • allocation_granularity (StringLiteral): Description of how resources are allocated on the GPU.
  • max_registers_per_thread (Int): Maximum number of registers that can be allocated to a single thread.
  • max_registers_per_block (Int): Maximum number of registers that can be allocated to a thread block.
  • max_blocks_per_multiprocessor (Int): Maximum number of blocks that can be scheduled on a multiprocessor.
  • shared_memory_allocation_unit_size (Int): Minimum allocation size for shared memory in bytes.
  • warp_allocation_granularity (Int): Granularity at which warps are allocated resources.
  • max_thread_block_size (Int): Maximum number of threads allowed in a thread block.
  • flops (Flops): Floating-point operations per second capabilities for different precisions.
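To make the relationships between these fields concrete, here is a Python sketch that mirrors a small subset of them as a frozen dataclass. It is only an illustration, not the Mojo API. The class name GpuInfoSketch, the chosen subset, the two helper methods, and every example value are assumptions made for this example and are not read from the library's tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuInfoSketch:
    """Hypothetical Python mirror of a few Info fields (not the Mojo API)."""

    name: str
    arch_name: str
    compute: float
    sm_count: int
    warp_size: int
    warps_per_multiprocessor: int
    threads_per_multiprocessor: int
    max_thread_block_size: int

    def max_warps_per_block(self) -> int:
        # A block of max_thread_block_size threads occupies this many warps
        # (ceiling division, since a partial warp still uses a whole warp).
        return -(-self.max_thread_block_size // self.warp_size)

    def max_resident_threads(self) -> int:
        # Upper bound on threads resident across all SMs at the same time.
        return self.sm_count * self.threads_per_multiprocessor


# Illustrative values only; they are not taken from the library.
example = GpuInfoSketch(
    name="ExampleGPU",
    arch_name="sm_80",
    compute=8.0,
    sm_count=108,
    warp_size=32,
    warps_per_multiprocessor=64,
    threads_per_multiprocessor=2048,
    max_thread_block_size=1024,
)

print(example.max_warps_per_block())   # 32
print(example.max_resident_threads())  # 221184
```

For these illustrative values, warps_per_multiprocessor multiplied by warp_size equals threads_per_multiprocessor; that is a property of the chosen numbers, not a documented invariant of Info.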

Implemented traits

AnyType, Copyable, ExplicitlyCopyable, Movable, UnknownDestructibility, Writable

Methods

__lt__

__lt__(self, other: Self) -> Bool

Returns whether this GPU has a lower compute capability than another GPU.

Args:

  • other (Self): Another GPU Info instance to compare against.

Returns:

True if this GPU has lower compute capability, False otherwise.

__le__

__le__(self, other: Self) -> Bool

Returns whether this GPU's compute capability is lower than or equal to another GPU's.

Args:

  • other (Self): Another GPU Info instance to compare against.

Returns:

True if this GPU's compute capability is lower than or equal to the other's, False otherwise.

__eq__

__eq__(self, other: Self) -> Bool

Checks if two GPU Info instances represent the same GPU model.

Args:

  • other (Self): Another GPU Info instance to compare against.

Returns:

True if both instances represent the same GPU model.

__ne__

__ne__(self, other: Self) -> Bool

Checks if two GPU Info instances represent different GPU models.

Args:

  • other (Self): Another GPU Info instance to compare against.

Returns:

True if instances represent different GPU models.
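The equality operators are described as comparing GPU models rather than every field. The Python sketch below shows one way to express that, assuming the model is identified by the name field alone; the class GpuModel and that assumption are illustrative and not confirmed by this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True, eq=False)
class GpuModel:
    name: str
    compute: float

    def __eq__(self, other: object) -> bool:
        # Assumption: two instances describe the same GPU model iff names match.
        if not isinstance(other, GpuModel):
            return NotImplemented
        return self.name == other.name

    def __ne__(self, other: object) -> bool:
        # Mirror __ne__ as the negation of model equality.
        return not self == other

    def __hash__(self) -> int:
        # Keep hashing consistent with name-based equality.
        return hash(self.name)


a = GpuModel("ExampleGPU", 8.0)
b = GpuModel("ExampleGPU", 8.0)
c = GpuModel("OtherGPU", 8.0)
print(a == b)  # True
print(a != c)  # True
```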

__gt__

__gt__(self, other: Self) -> Bool

Returns whether this GPU has a higher compute capability than another GPU.

Args:

  • other (Self): Another GPU Info instance to compare against.

Returns:

True if this GPU has higher compute capability, False otherwise.

__ge__

__ge__(self, other: Self) -> Bool

Returns whether this GPU's compute capability is higher than or equal to another GPU's.

Args:

  • other (Self): Another GPU Info instance to compare against.

Returns:

True if this GPU's compute capability is higher than or equal to the other's, False otherwise.
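The four ordering operators compare compute capability. A typical use is gating a code path on a minimum capability or sorting a list of devices. The Python sketch below assumes the ordering looks only at a numeric compute field; the class GpuCaps and the device names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuCaps:
    name: str
    compute: float  # compute capability, e.g. 8.0

    # Assumption: ordering compares only the compute capability.
    def __lt__(self, other: "GpuCaps") -> bool:
        return self.compute < other.compute

    def __le__(self, other: "GpuCaps") -> bool:
        return self.compute <= other.compute

    def __gt__(self, other: "GpuCaps") -> bool:
        return self.compute > other.compute

    def __ge__(self, other: "GpuCaps") -> bool:
        return self.compute >= other.compute


gpus = [GpuCaps("B", 9.0), GpuCaps("A", 7.5), GpuCaps("C", 8.0)]
print([g.name for g in sorted(gpus)])  # ['A', 'C', 'B']

baseline = GpuCaps("baseline", 8.0)
print([g.name for g in gpus if g >= baseline])  # ['B', 'C']
```

If equality is model-based, as the __eq__ description above states, then two different models with the same compute capability are neither less than nor greater than each other, yet they are not equal. The ordering ranks capability; it does not define a total order on models.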

__is__

__is__(self, other: Self) -> Bool

Identity comparison operator for GPU Info instances.

Args:

  • other (Self): Another GPU Info instance to compare against.

Returns:

True if both instances represent the same GPU model.

__isnot__

__isnot__(self, other: Self) -> Bool

Negative identity comparison operator for GPU Info instances.

Args:

  • other (Self): Another GPU Info instance to compare against.

Returns:

True if instances represent different GPU models.

target

target[index_bit_width: Int = 64](self) -> target

Gets the MLIR target configuration for this GPU.

Parameters:

  • index_bit_width (Int): The bit width for indices (default: 64).

Returns:

MLIR target configuration for the GPU.

from_target

static from_target[target: target]() -> Self

Creates an Info instance from an MLIR target.

Parameters:

  • target (target): MLIR target configuration.

Returns:

GPU info corresponding to the target.

from_name

static from_name[name: StringLiteral]() -> Self

Creates an Info instance from a GPU architecture name.

Parameters:

  • name (StringLiteral): GPU architecture name (e.g., "sm_80", "gfx942").

Returns:

GPU info corresponding to the architecture name.
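In Mojo, from_name takes the architecture name as a compile-time parameter, as the square brackets in its signature show. The Python sketch below models only the general idea of a registry keyed by architecture name, as a runtime dictionary. The ArchInfo class, the registry contents, and the error raised for unknown names are assumptions for illustration; the warp sizes (32 for NVIDIA, 64 for a gfx942 wavefront) are the usual values for those architectures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchInfo:
    name: str
    arch_name: str
    warp_size: int


# Hypothetical registry; the model names are placeholders.
_REGISTRY = {
    info.arch_name: info
    for info in (
        ArchInfo(name="ExampleNvidiaGPU", arch_name="sm_80", warp_size=32),
        ArchInfo(name="ExampleAmdGPU", arch_name="gfx942", warp_size=64),
    )
}


def from_name(arch_name: str) -> ArchInfo:
    """Return the registry entry for an architecture name such as "sm_80"."""
    try:
        return _REGISTRY[arch_name]
    except KeyError:
        known = ", ".join(sorted(_REGISTRY))
        raise ValueError(
            f"unknown GPU architecture {arch_name!r}; known: {known}"
        ) from None


print(from_name("gfx942").warp_size)  # 64
```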

occupancy

occupancy(self, *, threads_per_block: Int, registers_per_thread: Int) -> SIMD[float64, 1]

Calculates the theoretical occupancy for a given thread-block size and per-thread register count.

Occupancy represents the ratio of active warps to the maximum possible warps on a streaming multiprocessor.

Note: TODO (KERN-795): Also compute occupancy based on shared memory usage and thread block size, and use the minimum of the resulting values.

Args:

  • threads_per_block (Int): Number of threads in each block.
  • registers_per_thread (Int): Number of registers used by each thread.

Returns:

Occupancy as a ratio between 0.0 and 1.0.
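This page does not give the exact formula, so the following is a simplified Python sketch of register-limited occupancy in the style of NVIDIA's CUDA Occupancy Calculator, not the Mojo implementation. It rounds each warp's register allocation up to the allocation unit, rounds the number of warps that fit down to the warp allocation granularity, takes the tighter of the warp/block limit and the register limit, and, matching the note above, ignores shared memory. Parameter names mirror the Info fields, but the default values are illustrative assumptions. In this sketch the register file and allocation unit are counted in registers, following the occupancy-calculator convention.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up."""
    return -(-a // b)


def occupancy(
    *,
    threads_per_block: int,
    registers_per_thread: int,
    warp_size: int = 32,
    warps_per_multiprocessor: int = 64,
    max_blocks_per_multiprocessor: int = 32,
    max_thread_block_size: int = 1024,
    register_file_size: int = 65536,  # counted in registers in this sketch
    register_allocation_unit_size: int = 256,
    warp_allocation_granularity: int = 4,
    max_registers_per_thread: int = 255,
) -> float:
    """Register-limited theoretical occupancy (shared memory is ignored)."""
    if not 0 < threads_per_block <= max_thread_block_size:
        return 0.0
    if not 0 < registers_per_thread <= max_registers_per_thread:
        return 0.0

    warps_per_block = ceil_div(threads_per_block, warp_size)

    # Limit 1: resident warps and blocks per multiprocessor.
    blocks_by_warps = min(
        max_blocks_per_multiprocessor, warps_per_multiprocessor // warps_per_block
    )

    # Limit 2: registers. Round each warp's allocation up to the allocation
    # unit, then round the warps that fit down to the warp granularity.
    registers_per_warp = (
        ceil_div(registers_per_thread * warp_size, register_allocation_unit_size)
        * register_allocation_unit_size
    )
    warps_by_registers = register_file_size // registers_per_warp
    warps_by_registers -= warps_by_registers % warp_allocation_granularity
    blocks_by_registers = warps_by_registers // warps_per_block

    active_warps = min(blocks_by_warps, blocks_by_registers) * warps_per_block
    return active_warps / warps_per_multiprocessor


print(occupancy(threads_per_block=256, registers_per_thread=32))  # 1.0
print(occupancy(threads_per_block=256, registers_per_thread=64))  # 0.5
print(occupancy(threads_per_block=96, registers_per_thread=40))   # 0.75
```

Under these assumed limits, a 1024-thread block that uses 255 registers per thread needs more registers than the file holds, so the sketch reports 0.0.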

write_to

write_to[W: Writer](self, mut writer: W)

Writes GPU information to a writer.

Outputs all GPU specifications and capabilities to the provided writer in a human-readable format.

Parameters:

  • W (Writer): The type of writer to use for output. Must implement the Writer trait.

Args:

  • writer (W): A Writer instance to output the GPU information.

__str__

__str__(self) -> String

Returns a string representation of the GPU information.

Converts all GPU specifications and capabilities to a human-readable string format.

Returns:

String containing all GPU information.
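write_to and __str__ produce the same human-readable information. A common way to keep the two in sync is to write the fields once to any writer and build the string by writing into an in-memory buffer. The Python sketch below shows that pattern. The GpuSummary class, the Writer protocol, and the "key: value" layout are assumptions, and this page does not specify the actual output format of Info.

```python
import io
import sys
from typing import Protocol


class Writer(Protocol):
    def write(self, s: str) -> object: ...


class GpuSummary:
    def __init__(self, name: str, arch_name: str, sm_count: int, warp_size: int) -> None:
        self.name = name
        self.arch_name = arch_name
        self.sm_count = sm_count
        self.warp_size = warp_size

    def write_to(self, writer: Writer) -> None:
        # Stream one "key: value" line per field (layout is an assumption).
        writer.write(f"name: {self.name}\n")
        writer.write(f"arch_name: {self.arch_name}\n")
        writer.write(f"sm_count: {self.sm_count}\n")
        writer.write(f"warp_size: {self.warp_size}\n")

    def __str__(self) -> str:
        # Reuse write_to with an in-memory buffer so both paths stay in sync.
        buffer = io.StringIO()
        self.write_to(buffer)
        return buffer.getvalue()


summary = GpuSummary("ExampleGPU", "sm_80", 108, 32)
summary.write_to(sys.stdout)  # streams directly to standard output
text = str(summary)           # same content, collected into a string
```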