CUDA Sorting¶
Accelerate provides routines for sorting arrays on CUDA GPUs.
Sorting Large Arrays¶
The accelerate.cuda.sorting.RadixSort class is recommended for
sorting large (approx. more than 1 million items) arrays of numeric types.
-
class
accelerate.cuda.sorting.RadixSort(maxcount, dtype, descending=False, stream=0)¶ Provides radix sort and radix select.
The algorithm implemented here is best for large arrays (
N > 1e6) due to the latency introduced by its use of multiple kernel launches. It is recommended to usesegmented_sortinstead for batches of smaller arrays.Parameters: - maxcount (int) – Maximum number of items to sort
- dtype (numpy.dtype) – The element type to sort
- descending (bool) – Sort in descending order?
- stream – The CUDA stream to run the kernels in
-
argselect(k, keys, begin_bit=0, end_bit=None)¶ Similar to
RadixSort.selectbut returns the new sorted indices.Parameters: - keys (numpy.ndarray) – Keys to sort inplace
- begin_bit (int) – The first bit to sort
- end_bit (int) – Optional. The last bit to sort
Returns: The indices indicating the new ordering as an array on the CUDA device or on the host.
-
argsort(keys, begin_bit=0, end_bit=None)¶ Similar to
RadixSort.sortbut returns the new sorted indices.Parameters: - keys (numpy.ndarray) – Keys to sort inplace
- begin_bit (int) – The first bit to sort
- end_bit (int) – Optional. The last bit to sort
Returns: The indices indicating the new ordering as an array on the CUDA device or on the host.
-
close()¶ Explicitly release internal resources
Called automatically when the object is deleted.
-
init_arg(size)¶ Initialize an empty CUDA ndarray of uint32 with ascending integers starting from zero
Parameters: size (int) – Number of elements for the output array Returns: An array with values [0, 1, 2, ...m size - 1 ]
-
select(k, keys, vals=None, begin_bit=0, end_bit=None)¶ Perform a inplace k-select on
keys.Memory transfer is performed automatically.
Parameters: - keys (numpy.ndarray) – Keys to sort inplace
- vals (numpy.ndarray) – Optional. Additional values to be reordered along the sort.
It is modified in place. Only the
uint32dtype is supported in this version. - begin_bit (int) – The first bit to sort
- end_bit (int) – Optional. The last bit to sort
-
sort(keys, vals=None, begin_bit=0, end_bit=None)¶ Perform a inplace sort on
keys. Memory transfer is performed automatically.Parameters: - keys (numpy.ndarray) – Keys to sort inplace
- vals (numpy.ndarray) – Optional. Additional values to be reordered along the sort.
It is modified in place. Only the
uint32dtype is supported in this version. - begin_bit (int) – The first bit to sort
- end_bit (int) – Optional. The last bit to sort
Sorting Many Small Arrays¶
Using accelerate.cuda.sorting.RadixSort on small (approx. less than
1 million items) arrays has significant overhead due to multiple kernel
launches.
A better alternative is to use accelerate.cuda.sorting.segmented_sort()-which launches a single kernel for sorting a batch of many small arrays.
-
accelerate.cuda.sorting.segmented_sort(keys, vals, segments, stream=0)¶ Performs an inplace sort on small segments (N < 1e6).
Parameters: - keys (numpy.ndarray) – Keys to sort inplace.
- vals (numpy.ndarray) – Values to be reordered inplace along the sort. Only the
uint32dtype is supported in this implementation. - segments (numpy.ndarray) – Segment separation locations in ascending order.
e.g.
array([3, 6, 8])for segments ofkeys[:3],keys[3:6],keys[6:8],keys[8:]. - stream – Optional. A cuda stream in which the kernels are executed.