OpenCL: CUDA

Sunday, September 29, 2013

Dynamic parallelism in OpenCL 2.0

Provisional specifications of OpenCL 2.0 were released a few months ago. One of the most interesting new features is support for dynamic parallelism. In the CUDA world it has existed for about a year, but still only on the most expensive devices with compute capability 3.5 (GTX Titan and GTX 780, both based on the GK110 chip). On the AMD side it is a slightly different story. AMD hasn't said anything about dynamic parallelism, but on the other hand they introduced GCN 2.0, which might support it. In addition, they introduced Mantle, a new GPU API which promises up to 9 times more draw calls than comparable APIs (OpenGL, DirectX). This might hint that draw calls could be issued from the GPU itself.

How will dynamic parallelism be used? It is very simple: kernels enqueue other kernels to a device queue.
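As a minimal sketch of what that looks like, assuming the OpenCL C 2.0 built-ins `get_default_queue`, `ndrange_1D` and `enqueue_kernel` with block syntax (the provisional specification may differ in details), a parent kernel can launch a child grid as below. The names `parent`, `scale_element` and the `status` buffer are hypothetical. The host is assumed to create a default on-device queue (`CL_QUEUE_ON_DEVICE | CL_QUEUE_ON_DEVICE_DEFAULT`, which requires `CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE`) and to build the program with `-cl-std=CL2.0`.

```opencl
// Hypothetical helper: the work done by each work-item of the child grid.
// It is a plain function called from the block passed to enqueue_kernel.
void scale_element(global float *data, float factor)
{
    size_t i = get_global_id(0);   // index within the CHILD NDRange
    data[i] *= factor;
}

// Hypothetical parent kernel that launches n child work-items.
kernel void parent(global float *data, uint n, float factor,
                   global int *status)
{
    // Only one work-item enqueues; otherwise every work-item of the
    // parent would launch its own copy of the child grid.
    if (get_global_id(0) != 0)
        return;

    queue_t q = get_default_queue();   // default device queue set up by the host
    ndrange_t range = ndrange_1D(n);   // n work-items, local size left to the runtime

    // CLK_ENQUEUE_FLAGS_WAIT_KERNEL: the child starts only after the parent
    // kernel has finished. The block captures data and factor by value.
    status[0] = enqueue_kernel(q, CLK_ENQUEUE_FLAGS_WAIT_KERNEL, range,
                               ^{ scale_element(data, factor); });
    // status[0] == CLK_SUCCESS if the child was enqueued.
}
```

Letting a single work-item enqueue keeps the example simple; a kernel could also enqueue many small child grids, one per work-item or work-group, which is where dynamic parallelism is supposed to pay off for irregular workloads.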

Wednesday, April 10, 2013

OpenCL and Blender (Cycles)

It seems that OpenCL is not a priority for the Blender community (Blender 2.66a). The Cycles engine works quite nicely with CUDA, but to turn on OpenCL support you first need to set the CYCLES_OPENCL_TEST environment variable. Once that is done you might think everything will work as it should, but it doesn't. When I tried to render something, I got the following compile errors:

"/tmp/OCLpiZAxQ.cl", line 27089: error: expected a ")"
        int shader, int object, int prim, float u, float v, float t, float time, int segment = ~0)
                                                                                             ^

"/tmp/OCLpiZAxQ.cl", line 27226: error: too few arguments in function call
        shader_setup_from_sample(kg, sd, P, Ng, I, shader, object, prim, u, v, 0.0f, TIME_INVALID);
                                                                                                 ^

"/tmp/OCLpiZAxQ.cl", line 28436: error: too few arguments in function call
                        shader_setup_from_sample(kg, &sd, ls->P, ls->Ng, I, ls->shader, ls->object, ls->prim, u, v, t, time);
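All three errors appear to come from a single construct: `int segment = ~0` is a C++ default argument. OpenCL C 1.x is based on C99, which has no default arguments, so the compiler stops at the `=` and then reports "too few arguments" at every call that relied on the default (both calls above end at `time`, while the declaration continues with `int segment`). A minimal sketch of a C99-compatible fix, using a hypothetical, simplified `setup_sample()` in place of the real `shader_setup_from_sample()` and a hypothetical `SEGMENT_NONE` macro for the old default value:

```opencl
// Hypothetical name for the value the removed C++ default (= ~0) supplied.
#define SEGMENT_NONE (~0)

// Hypothetical, simplified stand-in for shader_setup_from_sample().
// No "int segment = ~0" here: OpenCL C (C99) has no default arguments,
// so every caller must pass segment explicitly.
int setup_sample(int prim, float u, float v, float time, int segment)
{
    // Placeholder body: report which segment was requested, or -1 for none.
    return (segment == SEGMENT_NONE) ? -1 : prim * 16 + segment;
}

kernel void example(global int *out)
{
    int i = (int)get_global_id(0);

    // Former call site that relied on the default: pass it explicitly.
    int a = setup_sample(i, 0.5f, 0.5f, 0.0f, SEGMENT_NONE);

    // Call site that selects a specific segment.
    int b = setup_sample(i, 0.5f, 0.5f, 1.0f, 3);

    out[2 * i]     = a;
    out[2 * i + 1] = b;
}
```

Code shared between a C++ backend (such as CUDA) and OpenCL C has to stay within the common subset, so default arguments, overloading and references all need a C-style replacement like this one.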

They say at http://wiki.blender.org/index.php/Dev:2.6/Source/Render/Cycles/OpenCL that OpenCL drivers are not mature enough. But according to http://www.luxrender.net/luxmark/ this is not the case: LuxRender has a quite stable OpenCL renderer which can even work in GPU+CPU mode.

The problem I see with the Cycles renderer is that it uses one kernel that is too big. This goes against the basic concept of GPU computing. Why? Register pressure is not uniform across the kernel (yes, I know, registers can also be spilled to global memory), yet the whole kernel is launched with the register allocation of its most demanding section, so other sections of the kernel run suboptimally. Such problems might be partly solved with dynamic parallelism, but what about backward compatibility? And please don't forget that GPUs excel at the SIMD (SIMT) paradigm. Should we spend GPU registers on raw arithmetic power, or rather on making development easier?
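One way to picture the alternative to a single large kernel: split the work into stages that are compiled as separate kernels, so each gets its own register allocation, and pass intermediate results through global memory. This is a minimal, hypothetical sketch (the kernel names, buffers, placeholder hit test and placeholder shading are invented for illustration and are not taken from Cycles). It only needs OpenCL 1.x, which keeps backward compatibility: the host enqueues `intersect_stage` and then `shade_stage` on an in-order queue. With OpenCL 2.0, the first stage could instead enqueue the second on the device.

```opencl
// Stage 1 (hypothetical): cheap, low register pressure. Finds which
// primitive each ray hits and stores it in global memory instead of
// keeping it in registers for a later shading section.
kernel void intersect_stage(global const float4 *ray_dir,
                            global int *hit_prim, uint n)
{
    size_t i = get_global_id(0);
    if (i >= n)
        return;
    // Placeholder test standing in for real BVH traversal.
    hit_prim[i] = (ray_dir[i].z < 0.0f) ? (int)i : -1;
}

// Stage 2 (hypothetical): expensive shading, compiled as its own kernel,
// so its register budget does not limit occupancy of stage 1.
kernel void shade_stage(global const int *hit_prim,
                        global float4 *color, uint n)
{
    size_t i = get_global_id(0);
    if (i >= n)
        return;
    // Placeholder shading: constant color for hits, black for misses.
    color[i] = (hit_prim[i] >= 0) ? (float4)(1.0f, 0.5f, 0.2f, 1.0f)
                                  : (float4)(0.0f);
}
```

The cost of this design is the extra global-memory traffic between stages, which is the trade-off against register pressure in the monolithic kernel.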