LoopManagers
Documentation for LoopManagers.
LoopManagers.LoopManagers
LoopManagers.MainThread
LoopManagers.MultiThread
LoopManagers.PlainCPU
LoopManagers.SingleCPU
LoopManagers.VectorizedCPU
LoopManagers.KernelAbstractions_GPU
LoopManagers.distribute
LoopManagers.LoopManagers — Module

Module LoopManagers provides computing managers to pass to functions using the performance portability module ManagedLoops. It implements the API functions defined by ManagedLoops for the provided managers. Currently supported are SIMD and/or multithreaded execution on the CPU. Offloading to GPU via CUDA and oneAPI is experimental.

Additional iteration/offloading strategies (e.g. cache-friendly iteration) can be implemented by defining new manager types and implementing specialized versions of ManagedLoops.offload.
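For illustration, here is a minimal sketch (not taken from the package docs) of how a kernel written against the ManagedLoops API can run under any of the managers below. The kernel name axpy! is made up, and the calling convention (ManagedLoops.offload invoking fun(range, args...), with range split across threads by threaded managers) is an assumption to check against the ManagedLoops documentation.

using LoopManagers, ManagedLoops

# Kernel written against the assumed convention: it receives the (sub)range
# to process, followed by the remaining arguments passed to offload.
function axpy!(range, y, a, x)
    @inbounds @simd for i in range
        y[i] += a * x[i]
    end
end

x, y = randn(10^6), zeros(10^6)
for mgr in (LoopManagers.PlainCPU(), LoopManagers.VectorizedCPU(), LoopManagers.MultiThread())
    ManagedLoops.offload(axpy!, mgr, eachindex(x), y, 2.0, x)
end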
LoopManagers.MainThread — Type

manager = MainThread(cpu_manager=PlainCPU(), nthreads=Threads.nthreads())

Returns a multithread manager derived from cpu_manager, initially in sequential mode. In this mode, manager behaves exactly like cpu_manager. When manager is passed to ManagedLoops.parallel, nthreads threads are spawned. The manager passed to the threads works in parallel mode. In this mode, manager behaves like cpu_manager, except that the outer loop is distributed among the threads. Furthermore, ManagedLoops.barrier and ManagedLoops.share allow synchronisation and data sharing across threads.
main_mgr = MainThread()
LoopManagers.parallel(main_mgr) do thread_mgr
    x = LoopManagers.share(thread_mgr) do master_mgr
        randn()
    end
    println("Thread $(Threads.threadid()) has drawn $x.")
end
LoopManagers.MultiThread — Type

manager = MultiThread(b=PlainCPU(), nt=Threads.nthreads())

Returns a multithread manager derived from the CPU manager b, following a fork-join pattern. When manager is passed to ManagedLoops.offload, nt threads are spawned (fork), each working on a subset of indices. Execution continues only after all threads have finished (join), so that barrier is not needed between two uses of offload; it does nothing.
It is highly recommended to pin the Julia threads to specific cores. The simplest way is probably to set JULIA_EXCLUSIVE=1 before launching Julia. See also Julia Discourse.
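Below is a hedged sketch of the fork-join behaviour, reusing the assumed offload calling convention from the module-level example above; scale! is a made-up kernel.

using LoopManagers, ManagedLoops

mgr = LoopManagers.MultiThread()   # nt defaults to Threads.nthreads()

function scale!(range, y, a)
    @inbounds for i in range       # each thread receives a subset of the range
        y[i] *= a
    end
end

y = ones(10^6)
ManagedLoops.offload(scale!, mgr, eachindex(y), y, 3.0)
# offload returns only after all threads have joined, so no barrier is needed here.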
LoopManagers.PlainCPU — Type

manager = PlainCPU()

Manager for sequential execution on the CPU. LLVM will try to vectorize loops marked with @simd. This works mostly for simple loops and arithmetic computations. For Julia-side vectorization, especially of mathematical functions, see VectorizedCPU.
LoopManagers.SingleCPU — Type

abstract type SingleCPU <: HostManager end

Parent type for managers executing on a single core. Derived types should specialize distribute or offload_single and leave offload as it is.
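For illustration only, a hypothetical derived manager; the module in which offload_single lives and its exact signature are assumptions to verify against the LoopManagers source before use.

using LoopManagers

struct ReversedCPU <: LoopManagers.SingleCPU end   # hypothetical manager type

# Assumed signature, mirroring offload: fun(range, args...) on a single core.
function LoopManagers.offload_single(fun, ::ReversedCPU, range, args...)
    fun(reverse(range), args...)   # e.g. traverse the outer range backwards
end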
LoopManagers.VectorizedCPU — Type

manager = VectorizedCPU()

Returns a manager for executing loops with optional explicit SIMD vectorization. Only inner loops marked with @vec will use explicit vectorization. If this causes errors, use @simd instead of @vec. Vectorization of loops marked with @simd is left to the Julia/LLVM compiler, as with PlainCPU. ManagedLoops.no_simd(::VectorizedCPU) returns a PlainCPU.
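A sketch of @vec in action, assuming the @loops/@vec macros from ManagedLoops (the first, underscore argument receives the manager); explicit SIMD versions of math functions such as exp may additionally require a package like SIMDMathFunctions.

using LoopManagers
using ManagedLoops: @loops, @vec

@loops function vexp!(_, y, x)
    let range = eachindex(x)
        @vec for i in range        # explicitly vectorized under VectorizedCPU
            y[i] = exp(x[i])
        end
    end
end

x = randn(Float32, 1024); y = similar(x)
vexp!(LoopManagers.VectorizedCPU(), y, x)   # explicit SIMD
vexp!(LoopManagers.PlainCPU(), y, x)        # falls back to compiler-driven vectorization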
LoopManagers.KernelAbstractions_GPU — Function

gpu = KernelAbstractions_GPU(gpu::KernelAbstractions.GPU, ArrayType)

# examples
gpu = KernelAbstractions_GPU(CUDABackend(), CuArray)
gpu = KernelAbstractions_GPU(ROCBackend(), ROCArray)
gpu = KernelAbstractions_GPU(oneBackend(), oneArray)

Returns a manager that offloads computations to a KernelAbstractions GPU backend. The returned manager will call ArrayType(data) when it needs to transfer data to the device.
While KernelAbstractions_GPU is always available, implementations of offload are available only if the module KernelAbstractions is loaded by the main program or its dependencies.
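A hedged usage sketch with the CUDA backend; the kernel and the offload calling convention are the same assumptions as in the CPU sketches above.

using KernelAbstractions, CUDA   # loading KernelAbstractions enables the offload implementations
using LoopManagers, ManagedLoops

gpu = LoopManagers.KernelAbstractions_GPU(CUDABackend(), CuArray)

function scale!(range, y, a)
    @inbounds for i in range
        y[i] *= a
    end
end

y = CuArray(ones(Float32, 1024))
ManagedLoops.offload(scale!, gpu, eachindex(y), y, 2f0)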
LoopManagers.distribute — Method

Divide work among vectorized CPU threads.