This site contains OpenCL notes, tutorials, benchmarks, news.

Tuesday, April 9, 2013

Calling kernels with many parameters

Suppose we have an OpenCL kernel with 10 parameters. Before we can launch the kernel, we need to call clSetKernelArg 10 times:
clSetKernelArg(kernel, 0, sizeof(cl_mem), &deviceMemory0);
clSetKernelArg(kernel, 1, sizeof(cl_mem), &deviceMemory1);
clSetKernelArg(kernel, 2, sizeof(cl_mem), &deviceMemory2);
clSetKernelArg(kernel, 3, sizeof(cl_mem), &deviceMemory3);
clSetKernelArg(kernel, 4, sizeof(cl_mem), &deviceMemory4);
clSetKernelArg(kernel, 5, sizeof(cl_mem), &deviceMemory5);
clSetKernelArg(kernel, 6, sizeof(cl_mem), &deviceMemory6);
clSetKernelArg(kernel, 7, sizeof(cl_mem), &deviceMemory7);
clSetKernelArg(kernel, 8, sizeof(cl_mem), &deviceMemory8);
clSetKernelArg(kernel, 9, sizeof(cl_mem), &deviceMemory9);

This is not a very elegant solution. The official C++ binding for OpenCL, available at http://www.khronos.org/registry/cl/, solves most of these problems. The first option is simply to use the C++ binding:
kernel.setArg(0,deviceMemory0);
kernel.setArg(1,deviceMemory1);
kernel.setArg(2,deviceMemory2);
kernel.setArg(3,deviceMemory3);
kernel.setArg(4,deviceMemory4);
kernel.setArg(5,deviceMemory5);
kernel.setArg(6,deviceMemory6);
kernel.setArg(7,deviceMemory7);
kernel.setArg(8,deviceMemory8);
kernel.setArg(9,deviceMemory9);





As we can see, there is no need to specify the size of each parameter (it is deduced automatically from the argument type). Here kernel is an object of class cl::Kernel, and deviceMemory0 to deviceMemory9 are objects of class cl::Buffer. One problem persists with this solution, though: we still need to call setArg 10 times. The C++ binding has a solution for this too (it's not official yet ;) ). It is called KernelFunctor (KernelFunctorGlobal in OpenCL C++ binding 1.2):
cl::KernelFunctor kernelFunctor(kernel, queue, offset, globalSize, localSize);
kernelFunctor(deviceMemory0, deviceMemory1, deviceMemory2, deviceMemory3, deviceMemory4,
 deviceMemory5, deviceMemory6, deviceMemory7, deviceMemory8, deviceMemory9);

This is quite short now. First we create the kernel functor, then we use operator() to set the kernel's parameters and launch it. If we look inside cl.hpp, we see many definitions of operator(), with the number of parameters ranging from 0 to 15. This repetition can be avoided with variadic templates in C++11:
// Terminating overload: no parameters are left, so there is nothing to do.
inline void setKernelParametersFrom(cl::Kernel &, cl_uint){}

// Sets the first parameter at position index, then recurses on the rest.
template<typename T, typename... Args>
inline void setKernelParametersFrom(cl::Kernel &kernel, cl_uint index, const T &firstParameter, const Args& ...restOfParameters){
    kernel.setArg(index, firstParameter);
    setKernelParametersFrom(kernel, index + 1, restOfParameters...);
}

// Entry point: parameter numbering starts at 0.
template<typename... Args>
inline void setKernelParameters(cl::Kernel &kernel, const Args& ...args){
    setKernelParametersFrom(kernel, 0, args...);
}
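To set all the parameters of our kernel, we now need only a single line:
setKernelParameters(kernel, deviceMemory0, deviceMemory1, deviceMemory2, deviceMemory3, deviceMemory4,
 deviceMemory5, deviceMemory6, deviceMemory7, deviceMemory8, deviceMemory9);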
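To see which index each argument ends up with, here is a minimal sketch of the same recursion that compiles without OpenCL. FakeKernel is a hypothetical stand-in for cl::Kernel that only records its setArg calls, and the helper name setKernelParametersFrom and the function recordedIndices are likewise assumptions made for this illustration, not part of the binding.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for cl::Kernel: it records the index and the size
// of every argument passed to setArg, so no OpenCL runtime is needed.
struct FakeKernel {
    std::vector<std::pair<unsigned, std::size_t>> calls;

    template <typename T>
    void setArg(unsigned index, const T&) {
        calls.emplace_back(index, sizeof(T));
    }
};

// Terminating overload: no parameters left, nothing to do.
template <typename Kernel>
inline void setKernelParametersFrom(Kernel&, unsigned) {}

// Sets the first parameter at position index, then recurses on the rest.
template <typename Kernel, typename T, typename... Args>
inline void setKernelParametersFrom(Kernel& kernel, unsigned index,
                                    const T& first, const Args&... rest) {
    kernel.setArg(index, first);
    setKernelParametersFrom(kernel, index + 1, rest...);
}

// Entry point: numbering starts at 0.
template <typename Kernel, typename... Args>
inline void setKernelParameters(Kernel& kernel, const Args&... args) {
    setKernelParametersFrom(kernel, 0u, args...);
}

// Calls the helper with three arguments of different types and returns
// the indices that setArg received, in call order.
std::vector<unsigned> recordedIndices() {
    FakeKernel kernel;
    setKernelParameters(kernel, 1.5f, 42, 'x');
    assert(kernel.calls.size() == 3);
    std::vector<unsigned> indices;
    for (const auto& call : kernel.calls) indices.push_back(call.first);
    return indices;
}
```

Each level of the recursion peels off one argument and passes index + 1 along, so the arguments are numbered in the order they were written. The compiler instantiates one function per remaining argument count, and these are normally inlined away.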
I don't know exactly why the OpenCL C++ binding doesn't use this approach. One possible reason is backward compatibility with older compilers (C++11 is still quite new).
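As a side note, the recursion is not the only way to walk a parameter pack in C++11. The sketch below is a variation, not something taken from cl.hpp: it expands the pack inside a braced array initializer, whose elements are evaluated left to right, so a running index can be incremented as it goes. RecordingKernel, setKernelParametersFlat and flatIndices are hypothetical names used only so the example runs without OpenCL.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for cl::Kernel; it records the index of each call.
struct RecordingKernel {
    std::vector<unsigned> indices;

    template <typename T>
    int setArg(unsigned index, const T&) {
        indices.push_back(index);
        return 0;  // plays the role of a success code
    }
};

// Non-recursive variant: the pack is expanded inside a braced initializer,
// which guarantees left-to-right evaluation of its elements.
template <typename Kernel, typename... Args>
inline void setKernelParametersFlat(Kernel& kernel, const Args&... args) {
    unsigned index = 0;
    // The leading 0 keeps the array non-empty when no arguments are given.
    int expand[] = {0, (static_cast<void>(kernel.setArg(index++, args)), 0)...};
    static_cast<void>(expand);  // only the side effects matter
    static_cast<void>(index);   // avoids an unused warning for empty packs
}

// Calls the helper with four arguments and returns the recorded indices.
std::vector<unsigned> flatIndices() {
    RecordingKernel kernel;
    setKernelParametersFlat(kernel, 1.5f, 42, 'x', 2.0);
    assert(kernel.indices.size() == 4);
    return kernel.indices;
}
```

Both versions need a C++11 compiler, so this does not help with the compatibility concern; it only trades the recursive overload pair for a single function.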
