Mojo struct

DeviceFunction

struct DeviceFunction[func_type: AnyTrivialRegType, //, func: func_type, *, target: target = _get_gpu_target[::StringLiteral](), _ptxas_info_verbose: Bool = False]

Represents a compiled device function for GPU execution.

This struct encapsulates a compiled GPU function that can be launched on a device. It handles the compilation, loading, and resource management of device functions.

Example:

```mojo
from gpu.host import DeviceContext, DeviceFunction

fn my_kernel(x: Int, y: Int):
    # Kernel implementation
    pass

var ctx = DeviceContext()
var kernel = ctx.compile_function[my_kernel]()
ctx.enqueue_function(kernel, grid_dim=(1,1,1), block_dim=(32,1,1))
```

Parameters

  • func_type (AnyTrivialRegType): The type of the function to compile.
  • func (func_type): The function to compile for GPU execution.
  • target (target): The target architecture for compilation. Defaults to the current GPU target.
  • _ptxas_info_verbose (Bool): Whether to enable verbose PTX assembly output. Defaults to False.

Implemented traits

AnyType, UnknownDestructibility

Methods

__copyinit__

__copyinit__(out self, existing: Self)

Creates a copy of an existing DeviceFunction.

This increases the reference count of the underlying device function handle.

Args:

  • existing (Self): The DeviceFunction to copy from.

__moveinit__

__moveinit__(out self, owned existing: Self)

Moves an existing DeviceFunction into this one.

Args:

  • existing (Self): The DeviceFunction to move from.

__del__

__del__(owned self)

Releases resources associated with this DeviceFunction.

This decrements the reference count of the underlying device function handle.
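
The copy, move, and destroy behavior described above can be sketched as follows. This is a minimal illustration rather than a definitive usage pattern: `noop_kernel` is a hypothetical kernel with no arguments, and the comments about reference counts restate the descriptions above rather than anything observable from the code.

```mojo
from gpu.host import DeviceContext

# Hypothetical kernel used only for illustration.
fn noop_kernel():
    pass

def main():
    var ctx = DeviceContext()
    var kernel = ctx.compile_function[noop_kernel]()

    # Copy: invokes __copyinit__, so both values share the same
    # underlying handle and its reference count goes up.
    var shared = kernel

    # Move: invokes __moveinit__ and transfers ownership.
    # `shared` must not be used after this line.
    var moved = shared^

    # The original value is still valid and can be launched.
    ctx.enqueue_function(kernel, grid_dim=(1,1,1), block_dim=(1,1,1))
    ctx.synchronize()

    # `kernel` and `moved` are destroyed after their last use;
    # each __del__ decrements the reference count.
```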

dump_rep

dump_rep[dump_asm: Variant[Bool, Path, fn() capturing -> Path] = __init__[::CollectionElement](False), dump_llvm: Variant[Bool, Path, fn() capturing -> Path] = __init__[::CollectionElement](False), _dump_sass: Variant[Bool, Path, fn() capturing -> Path] = __init__[::CollectionElement](False)](self)

Dumps various representations of the compiled device function.

This method dumps the assembly, LLVM IR, and/or SASS code for the compiled device function based on the provided parameters. The output can be directed to stdout or written to files.

Note: When a path contains '%', the '%' is replaced with the module name, which helps disambiguate dumps from multiple kernels.

Parameters:

  • dump_asm (Variant[Bool, Path, fn() capturing -> Path]): Controls dumping of assembly code. Can be a boolean, a file path, or a function returning a file path.
  • dump_llvm (Variant[Bool, Path, fn() capturing -> Path]): Controls dumping of LLVM IR. Can be a boolean, a file path, or a function returning a file path.
  • _dump_sass (Variant[Bool, Path, fn() capturing -> Path]): Controls dumping of SASS code (internal use). Can be a boolean, a file path, or a function returning a file path.

Raises:

If any file operations fail during the dumping process.
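
A call to `dump_rep` might look like the sketch below. It assumes that passing `True` for a parameter sends that representation to stdout, which the description above suggests but does not state outright. `noop_kernel` is a hypothetical kernel used only for illustration.

```mojo
from gpu.host import DeviceContext

# Hypothetical kernel used only for illustration.
fn noop_kernel():
    pass

def main():
    var ctx = DeviceContext()
    var kernel = ctx.compile_function[noop_kernel]()

    # Dump both the assembly and the LLVM IR.
    # Assumption: a boolean True directs the output to stdout.
    kernel.dump_rep[dump_asm=True, dump_llvm=True]()
```

Because `dump_rep` can raise, the sketch uses `def main()`, which lets the error propagate; inside an `fn`, the call would need `raises` or a `try` block.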

get_attribute

get_attribute(self, attr: Attribute) -> Int

Retrieves a specific attribute value from the compiled device function.

This method queries the device function for information about its resource requirements, execution capabilities, or other properties defined by the specified attribute.

Example:

```mojo
from gpu.host import Attribute, DeviceFunction

var device_function = DeviceFunction(...)

# Get the maximum number of threads per block for this function
var max_threads = device_function.get_attribute(Attribute.MAX_THREADS_PER_BLOCK)
```

Args:

  • attr (Attribute): The attribute to query, defined in the Attribute enum.

Returns:

The integer value of the requested attribute.

Raises:

If the attribute query fails or the attribute is not supported.