kernel void AtomicSum(global int* sum){
    atomic_add(sum, 1);
}
Let's test this kernel by running 1024x1024x128 threads:
int sum = 0;
// The buffer holds a single int counter (allocate sizeof(int), not sizeof(float)).
cl::Buffer bufferSum = cl::Buffer(context, CL_MEM_READ_WRITE, 1 * sizeof(int));
queue.enqueueWriteBuffer(bufferSum, CL_TRUE, 0, 1 * sizeof(int), &sum);
cl::Kernel kernel = cl::Kernel(program, "AtomicSum");
kernel.setArg(0, bufferSum);
// One work-item per increment; the work-group size is left to the implementation.
queue.enqueueNDRangeKernel(kernel, cl::NullRange, cl::NDRange(1024*1024*128), cl::NullRange);
queue.finish();
queue.enqueueReadBuffer(bufferSum, CL_TRUE, 0, 1 * sizeof(int), &sum);
std::cout << "Sum: " << sum << "\n";
Expected sum is: 134217728.
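Before looking at timings, note that the post does not show how the runs were measured. A minimal sketch of one way to take a wall-clock measurement in host code follows. The name timeSeconds is a hypothetical helper, not part of the OpenCL C++ bindings, and it assumes that the work passed in blocks until the device has finished (for example by ending with queue.finish()).

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper (not from the original post): runs `work` once and
// returns the elapsed wall-clock time in seconds.
template <typename Work>
double timeSeconds(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Work>(work)();  // must block until the work is really done
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

In the host code above, the lambda passed to timeSeconds would wrap the enqueueNDRangeKernel call together with queue.finish(), so that the kernel's execution is included in the measured interval.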
Our test machine uses the OpenCL implementation from AMD. The CPU is an Intel(R) Core(TM) i5 CPU M 430 @ 2.27GHz and the GPU is an AMD Mobility Radeon HD 5470. How much time should this code take on the CPU, and how much on the GPU? We usually expect operations on the GPU to be much faster than on the CPU. Are they really faster? Our test returned the following results:
- CPU: 1.809s
- GPU: 3.262s
This can be quite unexpected. Is it possible to speed the whole thing up? The short answer is yes. OpenCL supports the use of local (on-chip) memory, which is much faster than global memory. Let's change the previous AtomicSum kernel:
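kernel void AtomicSum(global int* sum){
    local int tmpSum[1];
    // The first work-item clears the work-group's local counter.
    if(get_local_id(0)==0){
        tmpSum[0]=0;
    }
    barrier(CLK_LOCAL_MEM_FENCE);
    // Every work-item increments the counter in local memory.
    atomic_add(&tmpSum[0],1);
    barrier(CLK_LOCAL_MEM_FENCE);
    // The last work-item adds the group total to the global sum.
    if(get_local_id(0)==(get_local_size(0)-1)){
        atomic_add(sum,tmpSum[0]);
    }
}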
This kernel does the atomic add at the level of work-groups by using local memory. At the end, one thread of each work-group (the last one) does a single atomic add on global memory. This approach lowers the number of accesses to global memory. It looks promising, and so do the results:
- CPU: 0.815s
- GPU: 0.24s
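To make the reduction in global memory traffic concrete, the sketch below counts how many atomic operations reach global memory in each variant. Both function names are hypothetical helpers, and the group size of 256 used in the note afterwards is only an illustrative assumption: the host code passes cl::NullRange as the local size, so the actual work-group size was chosen by the OpenCL implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of atomic operations that reach global memory
// when every work-item adds to the global counter directly.
std::int64_t globalAtomicsNaive(std::int64_t workItems) {
    return workItems;
}

// Hypothetical helper: number of global atomics when each work-group first
// accumulates in local memory and then issues one global atomic_add.
std::int64_t globalAtomicsGrouped(std::int64_t workItems, std::int64_t groupSize) {
    assert(groupSize > 0 && workItems % groupSize == 0);
    return workItems / groupSize;
}
```

For 1024*1024*128 work-items, globalAtomicsNaive gives 134217728 global atomics, while globalAtomicsGrouped with an assumed group size of 256 gives 524288. The number of local atomics is still 134217728 in total, but they are spread over many independent local counters.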
The speedup on the GPU is now more than 10x. The CPU result is not bad either. Overall this is quite a nice speedup. Can we make it even faster? Let's assume that atomic operations on local memory also have a significant cost. This cost can be lowered by using more local memory, so that threads do their atomic adds at different memory locations:
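kernel void AtomicSum(global int* sum){
    local int tmpSum[4];
    // The first four work-items clear the four local counters.
    if(get_local_id(0)<4){
        tmpSum[get_local_id(0)]=0;
    }
    barrier(CLK_LOCAL_MEM_FENCE);
    // Spread the increments over four addresses to reduce contention.
    atomic_add(&tmpSum[get_global_id(0)%4],1);
    barrier(CLK_LOCAL_MEM_FENCE);
    // The last work-item adds the combined group total to the global sum.
    if(get_local_id(0)==(get_local_size(0)-1)){
        atomic_add(sum,tmpSum[0]+tmpSum[1]+tmpSum[2]+tmpSum[3]);
    }
}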
Again we got a nice speedup on the GPU, but it is not the four-fold improvement one might expect from four counters, and the CPU got slightly slower:
- CPU: 0.858s
- GPU: 0.173s
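The effect of the four counters can be illustrated with a small sequential emulation of a single work-group. This is a sketch under stated assumptions, not OpenCL code: stripedCounts and groupTotal are hypothetical helpers, and the emulation only counts how many increments land on each address. It says nothing about timing.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Sequential emulation of one work-group (hypothetical helper, no OpenCL
// involved): work-item `id` increments counter `id % slots`, mirroring
// tmpSum[get_global_id(0) % 4] in the kernel.
std::vector<int> stripedCounts(int groupSize, int slots) {
    assert(groupSize > 0 && slots > 0);
    std::vector<int> counters(slots, 0);
    for (int id = 0; id < groupSize; ++id) {
        ++counters[id % slots];
    }
    return counters;
}

// Total contributed by the group: what the last work-item adds to the global sum.
int groupTotal(const std::vector<int>& counters) {
    return std::accumulate(counters.begin(), counters.end(), 0);
}
```

With an assumed group size of 256 and four slots, each counter receives 64 increments instead of one counter receiving all 256, and the group total is unchanged. The clearing of the counters, the two barriers and the final global atomic_add are the same as before, which may be one reason the gain is smaller than four-fold.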
We found out that atomics cost quite some time. It's recommended to avoid atomics on global memory. Atomics on local memory are better, but they are not always the best solution either. This applies especially to GPUs, as they can run many more threads in parallel than CPUs. Global atomics on CPUs don't have such a big impact on performance. This means that the same code can run even faster on a CPU than on a GPU.
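The speedups discussed in this post follow directly from the reported timings. The sketch below shows the arithmetic; speedup is a hypothetical helper name, and the inputs are the measured values listed above.

```cpp
#include <cassert>

// Speedup of an optimized run relative to a baseline run (both in seconds).
// Values above 1 mean the optimized run is faster.
double speedup(double baselineSeconds, double optimizedSeconds) {
    assert(baselineSeconds > 0.0 && optimizedSeconds > 0.0);
    return baselineSeconds / optimizedSeconds;
}
```

Moving the atomics to local memory gives speedup(3.262, 0.24), about 13.6, on the GPU and speedup(1.809, 0.815), about 2.2, on the CPU. Going from one to four local counters gives speedup(0.24, 0.173), about 1.39, on the GPU and speedup(0.815, 0.858), about 0.95, on the CPU, which is a slight slowdown.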
Interesting post. It would be helpful if you could discuss the hardware architecture, because atomic performance varies strongly across devices. Also using 32 or 64 bit implementations has a strong effect on performance.