LoopManagers
Documentation for LoopManagers.
LoopManagers.LoopManagers
LoopManagers.MainThread
LoopManagers.MultiThread
LoopManagers.PlainCPU
LoopManagers.SingleCPU
LoopManagers.VectorizedCPU
LoopManagers.KernelAbstractions_GPU
LoopManagers.distribute
LoopManagers.LoopManagers — Module

Module LoopManagers provides computing managers to pass to functions using the performance portability module ManagedLoops. It implements the API functions defined by ManagedLoops for the provided managers. Currently supported are SIMD and/or multithreaded execution on the CPU. Offloading to GPU via CUDA and oneAPI is experimental.

Additional iteration/offloading strategies (e.g. cache-friendly iteration) can be implemented by defining new manager types and implementing specialized versions of ManagedLoops.offload.
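For illustration, here is a minimal sketch (not taken from the package docs) of how a kernel written against the ManagedLoops API can run under any of the managers below. The kernel name axpy! is made up, and the calling convention (ManagedLoops.offload invoking fun(range, args...), with range split across threads by threaded managers) is an assumption to check against the ManagedLoops documentation.

using LoopManagers, ManagedLoops

# Kernel written against the assumed convention: it receives the (sub)range
# to process, followed by the remaining arguments passed to offload.
function axpy!(range, y, a, x)
    @inbounds @simd for i in range
        y[i] += a * x[i]
    end
end

x, y = randn(10^6), zeros(10^6)
for mgr in (LoopManagers.PlainCPU(), LoopManagers.VectorizedCPU(), LoopManagers.MultiThread())
    ManagedLoops.offload(axpy!, mgr, eachindex(x), y, 2.0, x)
end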
LoopManagers.MainThread — Type

manager = MainThread(cpu_manager=PlainCPU(), nthreads=Threads.nthreads())

Returns a multithread manager derived from cpu_manager, initially in sequential mode. In this mode, manager behaves exactly like cpu_manager. When manager is passed to ManagedLoops.parallel, nthreads threads are spawned. The manager passed to the threads works in parallel mode. In this mode, manager behaves like cpu_manager, except that the outer loop is distributed among the threads. Furthermore, ManagedLoops.barrier and ManagedLoops.share allow synchronisation and data sharing across threads.
main_mgr = MainThread()
LoopManagers.parallel(main_mgr) do thread_mgr
    x = LoopManagers.share(thread_mgr) do master_mgr
        randn()
    end
    println("Thread $(Threads.threadid()) has drawn $x.")
end
LoopManagers.MultiThread — Type

manager = MultiThread(b=PlainCPU(), nt=Threads.nthreads())

Returns a multithread manager derived from the CPU manager b, following a fork-join pattern. When manager is passed to ManagedLoops.offload, nt threads are spawned (fork), each working on a subset of indices. Execution continues only after all threads have finished (join), so that barrier is not needed between two uses of offload; it does nothing.
It is highly recommended to pin the Julia threads to specific cores. The simplest way is probably to set JULIA_EXCLUSIVE=1 before launching Julia. See also Julia Discourse.
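Below is a hedged sketch of the fork-join behaviour, reusing the assumed offload calling convention from the module-level example above; scale! is a made-up kernel.

using LoopManagers, ManagedLoops

mgr = LoopManagers.MultiThread()   # nt defaults to Threads.nthreads()

function scale!(range, y, a)
    @inbounds for i in range       # each thread receives a subset of the range
        y[i] *= a
    end
end

y = ones(10^6)
ManagedLoops.offload(scale!, mgr, eachindex(y), y, 3.0)
# offload returns only after all threads have joined, so no barrier is needed here.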
LoopManagers.PlainCPU — Type

manager = PlainCPU()

Manager for sequential execution on the CPU. LLVM will try to vectorize loops marked with @simd. This works mostly for simple loops and arithmetic computations. For Julia-side vectorization, especially of mathematical functions, see VectorizedCPU.
LoopManagers.SingleCPU — Type

abstract type SingleCPU <: HostManager end

Parent type for managers executing on a single core. Derived types should specialize distribute or offload_single and leave offload as it is.
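For illustration only, a hypothetical derived manager; the module in which offload_single lives and its exact signature are assumptions to verify against the LoopManagers source before use.

using LoopManagers

struct ReversedCPU <: LoopManagers.SingleCPU end   # hypothetical manager type

# Assumed signature, mirroring offload: fun(range, args...) on a single core.
function LoopManagers.offload_single(fun, ::ReversedCPU, range, args...)
    fun(reverse(range), args...)   # e.g. traverse the outer range backwards
end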
LoopManagers.VectorizedCPU — Type

manager = VectorizedCPU()

Returns a manager for executing loops with optional explicit SIMD vectorization. Only inner loops marked with @vec will use explicit vectorization. If this causes errors, use @simd instead of @vec. Vectorization of loops marked with @simd is left to the Julia/LLVM compiler, as with PlainCPU. ManagedLoops.no_simd(::VectorizedCPU) returns a PlainCPU.
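A sketch of @vec in action, assuming the @loops/@vec macros from ManagedLoops (the first, underscore argument receives the manager); explicit SIMD versions of math functions such as exp may additionally require a package like SIMDMathFunctions.

using LoopManagers
using ManagedLoops: @loops, @vec

@loops function vexp!(_, y, x)
    let range = eachindex(x)
        @vec for i in range        # explicitly vectorized under VectorizedCPU
            y[i] = exp(x[i])
        end
    end
end

x = randn(Float32, 1024); y = similar(x)
vexp!(LoopManagers.VectorizedCPU(), y, x)   # explicit SIMD
vexp!(LoopManagers.PlainCPU(), y, x)        # falls back to compiler-driven vectorization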
LoopManagers.KernelAbstractions_GPU — Function

gpu = KernelAbstractions_GPU(gpu::KernelAbstractions.GPU, ArrayType)

# examples
gpu = KernelAbstractions_GPU(CUDABackend(), CuArray)
gpu = KernelAbstractions_GPU(ROCBackend(), ROCArray)
gpu = KernelAbstractions_GPU(oneBackend(), oneArray)

Returns a manager that offloads computations to a KernelAbstractions GPU backend. The returned manager will call ArrayType(data) when it needs to transfer data to the device.
While KernelAbstractions_GPU is always available, implementations of offload are available only if the module KernelAbstractions is loaded by the main program or its dependencies.
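A hedged usage sketch with the CUDA backend; the kernel and the offload calling convention are the same assumptions as in the CPU sketches above.

using KernelAbstractions, CUDA   # loading KernelAbstractions enables the offload implementations
using LoopManagers, ManagedLoops

gpu = LoopManagers.KernelAbstractions_GPU(CUDABackend(), CuArray)

function scale!(range, y, a)
    @inbounds for i in range
        y[i] *= a
    end
end

y = CuArray(ones(Float32, 1024))
ManagedLoops.offload(scale!, gpu, eachindex(y), y, 2f0)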
LoopManagers.distribute — Method

Divide work among vectorized CPU threads.