Mojo function

ldg

ldg[dtype: DType, //, width: Int = 1, *, alignment: Int = alignof[::AnyType,__mlir_type.!kgen.target]()](x: UnsafePointer[SIMD[dtype, 1]]) -> SIMD[dtype, width]

Load data from global memory through the non-coherent cache.

This function performs a hardware-accelerated global memory load through the GPU's non-coherent (read-only) cache, equivalent to CUDA's __ldg intrinsic. It is intended for data that is only read, never written, while the kernel runs.

Note:

Uses invariant loads, which indicate that the memory won't change during kernel execution.
Particularly beneficial for read-only texture-like access patterns.
May improve performance on memory-bound kernels.

Parameters:

dtype (DType): The data type to load (must be numeric).
width (Int): The SIMD vector width for vectorized loads. Defaults to 1.
alignment (Int): Memory alignment in bytes. Defaults to the natural alignment of the SIMD vector dtype.

Args:

x (UnsafePointer): Pointer to the global memory location to load from.

Returns:

SIMD: SIMD vector containing the loaded data.
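
Example:

The following is a minimal sketch of how ldg might be called from a GPU kernel, not a definitive usage pattern. It assumes that ldg is importable from gpu.intrinsics and global_idx from gpu, which may differ between Mojo releases. The kernel names (double_kernel, double_kernel_x4) and their arguments are hypothetical, and the host-side launch code is omitted.

```mojo
from gpu import global_idx
from gpu.intrinsics import ldg
from memory import UnsafePointer


# Hypothetical kernel: one thread per element.
fn double_kernel(
    src: UnsafePointer[Float32],  # input, only read during the kernel
    dst: UnsafePointer[Float32],  # output buffer
    n: Int,
):
    var i = Int(global_idx.x)
    if i < n:
        # Scalar load (width defaults to 1) through the non-coherent cache.
        dst[i] = ldg(src + i) * 2.0


# Hypothetical kernel: one thread per group of four contiguous elements.
fn double_kernel_x4(
    src: UnsafePointer[Float32],
    dst: UnsafePointer[Float32],
    n: Int,
):
    var base = Int(global_idx.x) * 4
    # Only full groups are handled here; a tail of fewer than four
    # elements would need a separate scalar path.
    if base + 4 <= n:
        # Vectorized load of four float32 values. This assumes that
        # src + base satisfies the default alignment requirement.
        var v = ldg[width=4](src + base)
        (dst + base).store(v * 2.0)
```

Because ldg uses invariant loads, src must not be written by any thread while these kernels run. That is why the output goes to a separate dst buffer rather than back into src.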