
Mojo function

conv3d

conv3d(input: Symbol, filter: Symbol, stride: Tuple[Int, Int, Int] = (1, 1, 1), dilation: Tuple[Int, Int, Int] = (1, 1, 1), padding: Tuple[Int, Int, Int, Int, Int, Int] = (0, 0, 0, 0, 0, 0), groups: Int = 1) -> Symbol

Computes the 3-D convolution product of the input with the given filter, strides, dilations, paddings, and groups.

The op supports 3-D convolution, with the following layout assumptions:

  • input has NDHWC layout, i.e., (batch_size, depth, height, width, in_channels)
  • filter has layout RSCF, i.e., (depth, height, width, in_channels / num_groups, out_channels)
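As a rough illustration of the two layouts above, the following plain-Python sketch builds the shape tuples that an input and a filter would have. The helper name `conv3d_shapes` and its parameters are hypothetical and not part of the API; the sketch also assumes both channel counts are divisible by the number of groups.

```python
def conv3d_shapes(batch, depth, height, width, in_channels,
                  kernel, out_channels, groups=1):
    """Return (input_shape, filter_shape) for the layouts listed above.

    Hypothetical helper for illustration only; it is not part of the op.
    """
    if in_channels % groups or out_channels % groups:
        raise ValueError("channel counts must be divisible by groups")
    kd, kh, kw = kernel
    # Input layout: (batch_size, depth, height, width, in_channels)
    input_shape = (batch, depth, height, width, in_channels)
    # Filter layout: (depth, height, width, in_channels / num_groups, out_channels)
    filter_shape = (kd, kh, kw, in_channels // groups, out_channels)
    return input_shape, filter_shape


input_shape, filter_shape = conv3d_shapes(
    batch=2, depth=8, height=16, width=16, in_channels=4,
    kernel=(3, 3, 3), out_channels=6, groups=2,
)
print(input_shape)   # (2, 8, 16, 16, 4)
print(filter_shape)  # (3, 3, 3, 2, 6)
```

With groups set to 2, the fourth filter dimension is half the number of input channels, while the last dimension still holds all output channels.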

The padding values are expected to take the form (pad_dim1_before, pad_dim1_after, pad_dim2_before, pad_dim2_after, ...) and specify how many zeros are added before and after each spatial dimension of input. In 2-D convolution, dim1 represents H and dim2 represents W. In Python-like syntax, padding a 2x3 spatial input with [0, 1, 2, 1] would yield:

input = [
[1, 2, 3],
[4, 5, 6]
]
# Shape is 2x3

padded_input = [
[0, 0, 1, 2, 3, 0],
[0, 0, 4, 5, 6, 0],
[0, 0, 0, 0, 0, 0]
]
# Shape is 3x6

This op currently only supports strides and padding on the input.
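The padding convention described above can be sketched in plain Python. This is only an illustration of the (before, after) ordering, not how the op is implemented; the helper names `pad_2d` and `padded_spatial_shape` are hypothetical. The 3-D call assumes the three pairs apply to D, H, and W in that order, following the NDHWC layout.

```python
def pad_2d(rows, padding):
    """Zero-pad a 2-D list given (h_before, h_after, w_before, w_after)."""
    top, bottom, left, right = padding
    width = len(rows[0]) + left + right
    padded = [[0] * width for _ in range(top)]
    for row in rows:
        padded.append([0] * left + list(row) + [0] * right)
    padded.extend([0] * width for _ in range(bottom))
    return padded


def padded_spatial_shape(spatial, padding):
    """Spatial sizes after padding; padding holds one (before, after) pair per dim."""
    if len(padding) != 2 * len(spatial):
        raise ValueError("expected two padding values per spatial dimension")
    return tuple(
        size + padding[2 * i] + padding[2 * i + 1]
        for i, size in enumerate(spatial)
    )


print(pad_2d([[1, 2, 3], [4, 5, 6]], (0, 1, 2, 1)))
# [[0, 0, 1, 2, 3, 0], [0, 0, 4, 5, 6, 0], [0, 0, 0, 0, 0, 0]]

# Assumed 3-D ordering: (d_before, d_after, h_before, h_after, w_before, w_after)
print(padded_spatial_shape((4, 5, 6), (1, 1, 0, 2, 3, 0)))  # (6, 7, 9)
```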

Args:

  • input (Symbol): An NDHWC input tensor to perform the convolution upon.
  • filter (Symbol): The convolution filter in RSCF layout: (depth, height, width, in_channels / num_groups, out_channels).
  • stride (Tuple[Int, Int, Int]): The stride of the convolution operation.
  • dilation (Tuple[Int, Int, Int]): The spacing between the kernel points.
  • padding (Tuple[Int, Int, Int, Int, Int, Int]): The amount of padding applied to the input.
  • groups (Int): When greater than 1, divides the convolution into multiple parallel convolutions. The number of input and output channels must both be divisible by the number of groups.
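To see how these arguments interact, the sketch below estimates the output shape using the standard convolution size formula. The source does not state this formula, so treating it as the op's behavior is an assumption; the helper names are hypothetical, and the dilation term is included only for completeness, since the note above says the op currently supports only strides and padding.

```python
def conv_output_size(size, kernel, stride=1, dilation=1,
                     pad_before=0, pad_after=0):
    """Standard convolution output size along one spatial dimension."""
    effective_kernel = dilation * (kernel - 1) + 1
    return (size + pad_before + pad_after - effective_kernel) // stride + 1


def conv3d_output_shape(input_shape, filter_shape, stride=(1, 1, 1),
                        dilation=(1, 1, 1), padding=(0, 0, 0, 0, 0, 0)):
    """Estimate the NDHWC output shape (assumed formula, not from the source)."""
    n, d, h, w, _ = input_shape
    kd, kh, kw, _, out_channels = filter_shape
    spatial = [
        conv_output_size(size, k, stride[i], dilation[i],
                         padding[2 * i], padding[2 * i + 1])
        for i, (size, k) in enumerate(zip((d, h, w), (kd, kh, kw)))
    ]
    return (n, *spatial, out_channels)


print(conv3d_output_shape((2, 8, 16, 16, 4), (3, 3, 3, 4, 6)))
# (2, 6, 14, 14, 6)
print(conv3d_output_shape((2, 8, 16, 16, 4), (3, 3, 3, 4, 6),
                          stride=(1, 2, 2), padding=(1, 1, 1, 1, 1, 1)))
# (2, 8, 8, 8, 6)
```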

Returns:

A symbolic tensor value with the convolution applied.