71 Replies Latest reply: Jul 15, 2013 1:25 AM by ash

How to implement cl_khr_icd?

ash Newbie

Hi,

 

I want to run my application on both an Intel CPU and an NVIDIA GPU. From other posts I understood that I need two SDKs for this configuration, so I chose the AMD SDK and the NVIDIA SDK. Apart from the "FATAL: Module fglrx not found" error, when I run my application it finds both devices.

But how can I dynamically load the right library? I have heard of the cl_khr_icd extension, but I can't work out how to use it. Can anybody help, please?

Best regards,

 

Jacq

  • Re: How to implement cl_khr_icd?
    himanshu.gautam Master

    End users need not worry about ICD. You can link to one OpenCL runtime and that will load the other runtimes and daisy-chain them transparently.

    When you query the platforms -- you should get both platforms listed. Then, everything is fine.

    Just select your platform, create the context and get going...

    Regards

    Himanshu , Bruhaspati

    --------------------------------

    The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. Links to third party sites are for convenience only, and no endorsement is implied

    • Re: How to implement cl_khr_icd?
      ash Newbie

      Hi,

       

      Thanks for your quick reply. Sorry, I didn't explain my case well. The application is for clients (I'm doing an internship in a company), so I don't know in advance whether the client has installed both SDKs. How can I find out which one is installed? Say the client has only an NVIDIA GPU: then I should load the libOpenCL.so from NVIDIA, but how will the program know? Also, when I look for platforms with the NVIDIA library, it doesn't see my CPU; only the GPU is found, whereas the AMD SDK can see both.

      My question might be dumb so please excuse my ignorance, but it's kind of confusing for me.

      • Re: How to implement cl_khr_icd?
        himanshu.gautam Master

        Hi,

         

        Just link your app against libOpenCL.so on your local machine and ship the app.

         

        When your client runs the app, it will try to load the "OpenCL" library. Whichever libOpenCL.so (NVIDIA, AMD, Intel, etc.) is found in LD_LIBRARY_PATH or in the standard system search path will be loaded. That library, by virtue of the ICD mechanism, then loads all the other installed platforms transparently. Your app can query the platforms and get going.

         

        AMD as a company ships both AMD CPUs as well as AMD Radeon GPUs. AMD's OpenCL SDK will support both CPU and GPU devices. However, companies like NVIDIA who sell only GPUs will expose only the GPU device. There is nothing wrong with it.

         

        Your app should find out which platforms are installed, whether their devices are GPUs or CPUs, how many compute units (CUs) they have, and so on, and then decide which devices to work on.

        Note that an OpenCL context can only be formed from devices of a single platform.

        If you intend to work on multiple platforms, you need to create a separate context for each one and partition your problem manually among the platforms.
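
The selection step described here can be sketched in plain C. This is only a model under stated assumptions: the `DeviceInfo` struct, the `DEV_CPU`/`DEV_GPU` flags and both functions are hypothetical names, not OpenCL API. In a real program the fields would presumably be filled from `clGetDeviceIDs` and `clGetDeviceInfo` (`CL_DEVICE_TYPE`, `CL_DEVICE_MAX_COMPUTE_UNITS`) for each platform returned by `clGetPlatformIDs`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for cl_device_type bits; real code would use
 * CL_DEVICE_TYPE_CPU / CL_DEVICE_TYPE_GPU from CL/cl.h. */
enum { DEV_CPU = 1 << 1, DEV_GPU = 1 << 2 };

/* Hypothetical record of what the app learned about one device. */
typedef struct {
    int platform_index;      /* platform the device belongs to */
    unsigned type;           /* DEV_CPU or DEV_GPU */
    unsigned compute_units;  /* as reported by the runtime */
} DeviceInfo;

/* Return the index of the device of the wanted type with the most compute
 * units, or -1 if there is no device of that type. */
int pick_device(const DeviceInfo *devs, size_t n, unsigned wanted_type)
{
    assert(devs != NULL || n == 0);
    int best = -1;
    for (size_t i = 0; i < n; ++i) {
        if (!(devs[i].type & wanted_type))
            continue;
        if (best < 0 || devs[i].compute_units > devs[best].compute_units)
            best = (int)i;
    }
    return best;
}

/* Prefer a GPU and fall back to a CPU, so the same binary still works on
 * a client machine that has only one vendor's runtime installed. */
int pick_gpu_or_cpu(const DeviceInfo *devs, size_t n)
{
    int idx = pick_device(devs, n, DEV_GPU);
    return idx >= 0 ? idx : pick_device(devs, n, DEV_CPU);
}
```

The `platform_index` field is kept because the context for the chosen device has to be created on that device's own platform.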

         

        Hope this is clear.

        Regards

        Himanshu , Bruhaspati


        • Re: How to implement cl_khr_icd?
          himanshu.gautam Master

          Just a quick addition:

          Test your app before you ship it to the client.

          Also, please ask the client to make sure that the relevant OpenCL libraries are in LD_LIBRARY_PATH or in the system search path. Otherwise they will hit a "failed to load shared libraries" error.

          This is a classic "redistribution" problem. Take your app, run it on a different machine, and make sure it runs once all the libraries are found.

          Regards

          Himanshu , Bruhaspati


          • Re: How to implement cl_khr_icd?
            ash Newbie

            Thanks a lot for your reply, it's much clearer now. I still have one doubt: if linking against one OpenCL library finds all the other platforms, then I don't understand why the AMD samples stop working when I put the NVIDIA library in the LIBRARY_PATH. Is it because of the OpenCL version (1.2 vs 1.1)? What should I do to make them work again with this setup?

            • Re: How to implement cl_khr_icd?
              himanshu.gautam Master

              Mixing NVIDIA and AMD platforms should technically work. Can you tell which sample did not work?

               

              Mixing 1.1 and 1.2 is a problem if you are using 1.2 APIs. For example, "clinfo" might seg-fault if you do that, and some samples may not work.

               

              You can do 2 things here:

              1. Run a simple sample that does not use 1.2 API

              2. Write your own application to query and list the platforms that you have.

              Regards

              Himanshu , Bruhaspati


              • Re: How to implement cl_khr_icd?
                ash Newbie

                Yes, it's true. clinfo gave me an error when I first linked with the AMD SDK, and with NVIDIA it now prints:

                clinfo: relocation error: clinfo: symbol clRetainDevice, version OPENCL_1.2 not defined in file libOpenCL.so.1 with link time reference

                As for the samples, you're right:

                - Some don't run because of the OpenCL 1.2 version, such as:

                GaussianNoise, DeviceFission, HDRToneMapping, ImageOverlap, MatrixMulDouble, SimpleImage, SobelFilterImage, TransferOverlapCPP.

                - Some don't run because they can't find libGLEW, such as:

                FluidSimulation, MandelBrot, NBody, SimpleGL, NoiseGL.

                - And finally, the samples that require a GPU don't run, such as:

                ImageBandWidth, BufferBandWidth, SimpleMultiDevice.

                The others run just fine, except for this error: FATAL: Module fglrx not found.

                I know that someone already explained on the forum that this will probably be taken care of in the next release:

                But where is it? I'd like to hide it, since it's kind of frightening for a client, I suppose.

                Thanks again for your precious help, I think I'm making progress.

                • Re: How to implement cl_khr_icd?
                  himanshu.gautam Master

                  Hi,

                  That's a good amount of detail.

                  I don't understand why "BufferBandwidth" cannot work. If it sees an AMD GPU, it should work.

                  So, you have an x86 CPU + NVIDIA GPU combo?

                  Regarding the fatal error, you are right. It will be fixed in a subsequent driver release.

                  In the meantime, you can wrap the app in a shell script such as

                  ./yourApp 2>/dev/null

                  This assumes the library prints the message to stderr, and it will also discard every other message sent to stderr.

                  Or, maybe, the following might help, since it merges stderr into stdout and then drops only the matching line:

                  ./yourApp 2>&1 | grep -v -i "FATAL: Module fglrx"

                  Regards

                  Himanshu , Bruhaspati


                  • Re: How to implement cl_khr_icd?
                    ash Newbie

                    Hi again,

                     

                    For BufferBandwidth, the error message is:

                    Platform found : Advanced Micro Devices, Inc.

                    This sample requires a GPU to be present in order to execute

                    And it just stops there. I haven't yet looked deep into the code, but maybe I'll find some clues as to why it doesn't find my GPU.

                    To be precise, my combo is an Intel Xeon E5430 (64-bit) and an NVIDIA GTX 650. I also added a second NVIDIA GPU, and from the command line I can switch my app to the device I want, which is really nice. I couldn't get your command line to hide the error message, though: I get "ambiguous redirection", maybe because my program takes an argument.

                    I'll look into that; at least it gives me some ideas.

                    • Re: How to implement cl_khr_icd?
                      himanshu.gautam Master

                      Hi,

                      It is quite possible that BufferBandwidth locates the AMD platform and then tries to create a context with AMD devices only. You need to look at the place where the "context" is created and check which platform it uses.

                      I will check the code sometime later. Meanwhile, if you go through the code, you can find it out yourself.

                      Regards

                      Himanshu , Bruhaspati


                      • Re: How to implement cl_khr_icd?
                        ash Newbie

                        Hi,

                        I looked into the code and I now understand why it didn't work.

                        • First, it looks for an AMD platform with:

                        [code] if (!strcmp(platformName, "Advanced Micro Devices, Inc.")) [/code]

                        • And then it tries to get the second device in the device list. But since I only have one device for AMD, this call fails.

                        [code] ret = clGetDeviceIDs( platform, devs[1], 128, devices, &num_devices );
                            if((ret == CL_DEVICE_NOT_FOUND) || (num_devices == 0))
                            {
                                fprintf( stderr, "This sample requires a GPU to be present in order to execute");
                                exit(0);
                            }
                        [/code]

                        • Re: How to implement cl_khr_icd?
                          himanshu.gautam Master

                          Jacq Jay wrote:

                           

                          Hi,

                          I looked into the code and I now understand why it didn't work.

                          • First, it looks for an AMD platform with:

                          [code] if (!strcmp(platformName, "Advanced Micro Devices, Inc.")) [/code]

                          APP SDK samples choose the AMD platform (if present) or the default platform (platforms[0]). So the sample should run on NVIDIA hardware.

                          And then it tries to get the second device in the device list. But since I only have one device for AMD, this call fails.

                          [code] ret = clGetDeviceIDs( platform, devs[1], 128, devices, &num_devices );
                              if((ret == CL_DEVICE_NOT_FOUND) || (num_devices == 0))
                              {
                                  fprintf( stderr, "This sample requires a GPU to be present in order to execute");
                                  exit(0);
                              }
                          [/code]

                          devs is not a device number, just an array of cl_device_type values. devs[1] means CL_DEVICE_TYPE_GPU.

                          So at least these are not the real reasons for the failure. I will try to reproduce it if I get an NVIDIA GPU at my disposal.

                          Regards

                          Himanshu , Bruhaspati


                          • Re: How to implement cl_khr_icd?
                            ash Newbie

                            Yes, you're right, sorry, my mistake.

                            But since it first looks for an AMD platform and then looks for a GPU device on that platform, it's normal that it fails, isn't it?

                            • Re: How to implement cl_khr_icd?
                              himanshu.gautam Master

                              Well, you have the AMD APP SDK installed, so the AMD platform is selected (and it only contains the Intel CPU as a device).

                              Most of the samples can still be run on another platform using the "-p 1" command-line option, but IIRC that option is not available for the BufferBandwidth sample.

                              I guess for now you can just edit the code and search for NVIDIA's platform vendor string instead.
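
A minimal sketch of that edit, assuming the vendor strings have already been read with `clGetPlatformInfo(..., CL_PLATFORM_VENDOR, ...)` into an array. The function name `pick_platform` is hypothetical, and the fallback to index 0 mirrors the default-platform behaviour described earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the index of the first platform whose vendor string contains
 * `preferred`; fall back to platform 0 when none matches.
 * `vendors` is assumed to hold the CL_PLATFORM_VENDOR strings in the
 * order in which clGetPlatformIDs returned the platforms. */
size_t pick_platform(const char *const vendors[], size_t n,
                     const char *preferred)
{
    assert(n > 0 && vendors != NULL && preferred != NULL);
    for (size_t i = 0; i < n; ++i) {
        if (strstr(vendors[i], preferred) != NULL)
            return i;
    }
    return 0; /* default platform */
}
```

A substring match is used rather than an exact `strcmp` so that small differences in the reported vendor name do not break the lookup; the exact strings a given runtime reports should be checked on the target machine.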

                              Regards

                              Himanshu , Bruhaspati


                  • Re: How to implement cl_khr_icd?
                    ash Newbie

                    Hi again,

                    I ran into another problem which, in my opinion, is still about compatibility. To make development easier I wanted to use the C++ wrapper, and so include cl.hpp instead of cl.h. But I get many errors of this type:

                    test.cpp:(.text+0xd50): undefined reference to `clReleaseDevice'

                    I think the problem comes from the conflict between OpenCL 1.2 from AMD and OpenCL 1.1 from NVIDIA, but I don't know how to solve it. Should I keep using cl.h like before, then?

                     

                    Best regards,

                    Jacq

                    • Re: How to implement cl_khr_icd?
                      himanshu.gautam Master

                      This is a known problem. It will be fixed by the Khronos Group, I believe.

                      Also, it is better if you use an AMD GPU for development.

                      Regards

                      Himanshu , Bruhaspati


                      • Re: How to implement cl_khr_icd?
                        ash Newbie

                        OK, thanks a lot for your reply. Unfortunately, working with AMD won't be possible since everybody here already works on NVIDIA GPUs. So for the moment, if I want to keep the same setup, I can't use the C++ wrapper, right?

                        • Re: How to implement cl_khr_icd?
                          LeeHowes Apprentice

                          The latest C++ wrapper from the Khronos site should fix this issue. It is an annoyance with the ICD design that it cannot cope with missing functions in the underlying platform. We work around this by versioning individual devices directly in the reference counting wrapper in cl.hpp.
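
The per-device versioning idea can be sketched as follows. This is an illustration, not the code in cl.hpp: both function names are hypothetical, and the input is assumed to be the string returned by `clGetDeviceInfo` for `CL_DEVICE_VERSION`, which the OpenCL specification lays out as "OpenCL <major>.<minor> <vendor-specific information>".

```c
#include <assert.h>
#include <stdio.h>

/* Parse a CL_DEVICE_VERSION-style string, e.g. "OpenCL 1.1 CUDA".
 * Returns 1 on success, 0 if the string does not match the layout. */
int parse_cl_version(const char *s, int *major, int *minor)
{
    assert(s != NULL && major != NULL && minor != NULL);
    return sscanf(s, "OpenCL %d.%d", major, minor) == 2;
}

/* clRetainDevice/clReleaseDevice were introduced in OpenCL 1.2, so a
 * wrapper should only forward to them for devices reporting >= 1.2. */
int device_has_retain_release(const char *device_version)
{
    int major = 0, minor = 0;
    if (!parse_cl_version(device_version, &major, &minor))
        return 0; /* unknown format: stay on the safe side */
    return major > 1 || (major == 1 && minor >= 2);
}
```

A runtime check like this only decides whether a 1.2 call is made for a given device. It does not help when the libOpenCL.so being linked lacks the 1.2 symbols altogether, which is what the `undefined reference` error above indicates.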

                          Lee Howes
                          Advanced Micro Devices Inc.


                          • Re: How to implement cl_khr_icd?
                            ash Newbie

                            Hi,

                             

                            I copied Khronos' cl.hpp from their site (in the OpenCL 1.2 specification section) into both CL folders (AMD and NVIDIA).

                            Well, it seems that including the .hpp works fine for the moment, at least with my little program.

                            Thanks for your advice.

                             

                            Best regards,

                            Jacq

                          • Re: How to implement cl_khr_icd?
                            ash Newbie

                            Hi hi,

                            Sorry to bother you, but I'm really stuck with my program. I wanted to use the sum reduction kernel with OpenCL. The strange thing is that it gives the proper result when I compute on the GPU, whereas on the CPU it's completely wrong and even gives a corrupted memory error. I have probably missed something important, but I can't figure out what, since previous programs (doing vector addition with no shared data) were working fine on both CPU and GPU. Could you give me some clues? I can post the code if needed.

                             

                            Best regards,

                             

                            Jacq

                            • Re: How to implement cl_khr_icd?
                              LeeHowes Apprentice

                              The most important thing is to check your synchronization primitives. You may have places where you forgot a barrier and luckily the code worked on the GPU because 64 work items are packed into a single vector thread and run synchronously. When you run it on the CPU it instead serializes that set of work items so the side effects will be seen in a different order. Check every point where you write to memory that is shared by the work items in a work group and see if you are making sure all other work items wait for that data to have been written.
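
The effect described here can be reproduced with a small host-side model in plain C. It is a sketch under simplifying assumptions, not a description of how any real runtime schedules work items: `reduce_with_barriers` models all work items completing one reduction step before the next begins, while `reduce_serialized_no_barrier` models a forgotten barrier on a runtime that runs each work item to completion in turn. Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Tree reduction with barrier semantics: every work item finishes step s
 * before any work item starts the next step. */
float reduce_with_barriers(float *data, size_t n)
{
    assert(n > 0 && (n & (n - 1)) == 0); /* power-of-two group size */
    for (size_t s = n / 2; s > 0; s >>= 1) {
        for (size_t tid = 0; tid < s; ++tid)
            data[tid] += data[tid + s];
        /* barrier(CLK_LOCAL_MEM_FENCE) would sit here in the kernel */
    }
    return data[0];
}

/* Same arithmetic with the barrier forgotten, on a model runtime that
 * runs each work item through all its steps before the next one starts. */
float reduce_serialized_no_barrier(float *data, size_t n)
{
    assert(n > 0 && (n & (n - 1)) == 0);
    for (size_t tid = 0; tid < n; ++tid) {
        for (size_t s = n / 2; s > 0; s >>= 1) {
            if (tid < s)
                data[tid] += data[tid + s];
        }
    }
    return data[0];
}
```

For eight inputs that are all 1.0f, the first function returns 8 and the second returns 4, because work item 0 consumes partial sums that the other work items have not produced yet.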


                              Lee

                              Lee Howes
                              Advanced Micro Devices Inc.


                              • Re: How to implement cl_khr_icd?
                                ash Newbie

                                Hi,

                                 

                                I checked and I put barrier(CLK_LOCAL_MEM_FENCE) where it was needed, following the sample example.

                                Here is the kernel code :

                                __kernel void OCLIntegrityTest_kernel(__global float *a_g_idata, __global float *a_g_odata)
                                {
                                    __local float ocl_test_sdata[64];

                                    // perform first level of reduction,
                                    // reading from global memory, writing to shared memory
                                    const unsigned int tid = get_local_id(0);
                                    const unsigned int i = get_group_id(0)*(get_local_size(0)*2) + get_local_id(0);
                                    ocl_test_sdata[tid] = log(exp(sqrt(a_g_idata[i])))  +  log(exp(sqrt(a_g_idata[i+get_local_size(0)]))) ;
                                    barrier(CLK_LOCAL_MEM_FENCE);

                                    // do reduction in shared mem
                                    for(unsigned int s=get_local_size(0)/2; s>0; s>>=1)
                                    {
                                        if (tid < s)
                                        {
                                            ocl_test_sdata[tid] += ocl_test_sdata[tid + s];
                                        }
                                        barrier(CLK_LOCAL_MEM_FENCE);
                                    }
                                    // write result for this block to global mem
                                    if (tid == 0)
                                        a_g_odata[get_group_id(0)] = ocl_test_sdata[0];
                                }

                                 

                                On the host, I create the OpenCL context and everything else that is needed. Finally, I sum the elements of the output array to get the final result. I think I have a problem with dimensions, most probably in the enqueueNDRangeKernel call, or maybe when reading the output array (the size might be wrong too). Here is what I used:

                                cl::NDRange global(OCLINTEGRITY_NUMS);
                                cl::NDRange local(OCLINTEGRITY_WORK_ITEMS);
                                t_err = m_command_queue.enqueueNDRangeKernel(m_kernel, 0, global, local, NULL, NULL);
                                m_command_queue.enqueueReadBuffer(m_output_buffer, CL_TRUE, 0, OCLINTEGRITY_WORK_GROUPS * sizeof(float), m_h_output, NULL, NULL);

                                where OCLINTEGRITY_NUMS = 1024 (the size of the input array)
                                and OCLINTEGRITY_WORK_GROUPS = OCLINTEGRITY_NUMS/(OCLINTEGRITY_WORK_ITEMS*2).
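
Reading the posted kernel and launch code side by side, the sizes may not agree: each work item loads two input elements, yet the global size equals the full input length, so with 64 work items per group the launch would create 16 groups while only 8 partial sums are read back. The sketch below works through that arithmetic in plain C. `LaunchDims`, `reduction_dims` and `max_index_read` are hypothetical helpers, and the conclusion rests only on the snippet as posted.

```c
#include <assert.h>
#include <stddef.h>

/* Launch geometry for a reduction kernel in which every work item loads
 * `elems_per_item` input elements (2 in the posted kernel). */
typedef struct {
    size_t global_size;  /* total work items to enqueue */
    size_t num_groups;   /* partial sums written by the kernel */
} LaunchDims;

LaunchDims reduction_dims(size_t n, size_t local_size, size_t elems_per_item)
{
    assert(local_size > 0 && elems_per_item > 0);
    assert(n % (local_size * elems_per_item) == 0); /* no tail handling */
    LaunchDims d;
    d.global_size = n / elems_per_item;
    d.num_groups = d.global_size / local_size;
    return d;
}

/* Highest input index touched by the posted indexing scheme
 * i = group * (local * 2) + lid, then i + local, for a given global size. */
size_t max_index_read(size_t global_size, size_t local_size)
{
    assert(local_size > 0 && global_size >= local_size);
    size_t last_group = global_size / local_size - 1;
    size_t i = last_group * (local_size * 2) + (local_size - 1);
    return i + local_size;
}
```

With 1024 inputs and 64 work items per group, `reduction_dims(1024, 64, 2)` gives a global size of 512 and 8 groups, and `max_index_read(512, 64)` is 1023, the last valid element. A global size of 1024 would instead reach index 2047, well past the end of the input buffer, so this is worth checking alongside any other cause.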

                                 

                                I'm still searching for the answer, but if anybody finds it obvious, please give me a little help.

                                 

                                Best regards,

                                 

                                Jacq

                                • Re: How to implement cl_khr_icd?
                                  himanshu.gautam Master

                                  Your code looks fine. I hope the number of work items per group (OCLINTEGRITY_WORK_ITEMS) is 64.

                                  Can you confirm that?

                                  Regards

                                  Himanshu , Bruhaspati


                                  • Re: How to implement cl_khr_icd?
                                    ash Newbie

                                    Yes, it's 64. Then I really don't know where the problem comes from.

                                    • Re: How to implement cl_khr_icd?
                                      himanshu.gautam Master

                                      Hi,

                                      Can you please upload a zip file so that I can reproduce the problem here?

                                      If we find this to be a problem with the CPU compiler or runtime, we will work to fix it.

                                      Also, please give the configuration of your CPU.

                                      Is it from AMD or Intel? The model number, number of cores, etc. will help.

                                      Regards

                                      Himanshu , Bruhaspati


                                      • Re: How to implement cl_khr_icd?
                                        ash Newbie

                                        Hi,

                                         

                                        I'm trying to use gDEBugger to find some clues.

                                        Is it possible to send you the code in private?

                                        You need the sources and the system configuration, right?

                                         

                                        Best regards,

                                         

                                        Jacq

                                        • Re: How to implement cl_khr_icd?
                                          ash Newbie

                                          Hi again,

                                          I may have a clue as to why it's not working, but I'm still confused. I have a class storing some variables like this:

                                          cl::Device m_device;
                                          cl::Platform m_platform;
                                          cl::Kernel m_kernel;
                                          cl::CommandQueue m_command_queue;
                                          cl::Buffer m_output_buffer;

                                          so that I can use them in different functions. In a first function, "initOCL", I initialise these members this way:

                                          m_command_queue = cl::CommandQueue ( context, m_device, 0, &err );

                                          Then I reuse these variables in another function, "runTest" (where I call enqueueNDRangeKernel and enqueueReadBuffer).

                                          When doing this I get memory corruption on AMD, while it runs fine on NVIDIA, as I said before.

                                          The strange thing is that when I call enqueueNDRangeKernel and enqueueReadBuffer in the first function (everything in the same place), it works fine on both platforms.

                                          Can that be the problem? I find it very weird. What am I doing wrong? It seems like a silly mistake, but I can't see what I did wrong. I hope somebody will be able to help.

                                           

                                          Best regards,

                                          Jacq

                                          • Re: How to implement cl_khr_icd?
                                            nou Expert

                                            There is (or was) a pitfall in the C++ bindings: you could create a cl::Program without passing a proper OpenCL context, in which case it grabs some default context. That caused a really weird error for me. So check each C++ binding call against its C counterpart to make sure you pass all the needed parameters. It is okay if the C++ binding has a default value for some parameters.

                                        • Re: How to implement cl_khr_icd?
                                          himanshu.gautam Master

                                          Please post the code as a zipped attachment.

                                          Regards

                                          Himanshu , Bruhaspati


                                          • Re: How to implement cl_khr_icd?
                                            ash Newbie

                                            Hi everybody,

                                             

                                            I fixed the problem. It was indeed a really silly mistake: the input memory buffer and even the program were destroyed before the call to enqueueNDRangeKernel that launches the kernel. Weird that it didn't disturb NVIDIA, though. Sorry for the bother, it was just a beginner's mistake, and thanks for your help.

                                             

                                            Best regards,

                                            Jacq

                                            • Re: How to implement cl_khr_icd?
                                              himanshu.gautam Master

                                              Good to know you fixed the problem, and thanks for letting us know. Good luck!

                                              Other implementations might keep a hidden reference count and so probably did not destroy those objects, but that's just a guess.
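
That guess can be illustrated with a toy reference-counting model in plain C. Nothing here is OpenCL API: `Obj`, `KernelModel` and the functions are hypothetical, and the `runtime_retains` flag stands for the speculated hidden reference. The model assumes that setting a kernel argument stores the handle without necessarily retaining the object, so portable code should keep its buffers and program alive until the launch has completed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a reference-counted runtime object such as a
 * buffer; not an OpenCL type. */
typedef struct {
    int refcount;
    int alive;   /* 1 while the runtime still owns the storage */
} Obj;

void obj_init(Obj *o)   { assert(o != NULL); o->refcount = 1; o->alive = 1; }
void obj_retain(Obj *o) { assert(o != NULL && o->alive); o->refcount++; }
void obj_release(Obj *o)
{
    assert(o != NULL && o->refcount > 0);
    if (--o->refcount == 0)
        o->alive = 0;    /* a real runtime would free the object here */
}

/* Model of a kernel argument slot that stores the handle only. */
typedef struct { Obj *arg; } KernelModel;

/* Store the handle; optionally take a hidden reference, as some
 * implementations might. */
void kernel_set_arg(KernelModel *k, Obj *o, int runtime_retains)
{
    assert(k != NULL && o != NULL);
    k->arg = o;
    if (runtime_retains)
        obj_retain(o);
}

/* A launch is only valid while the argument object is still alive. */
int kernel_can_launch(const KernelModel *k)
{
    assert(k != NULL);
    return k->arg != NULL && k->arg->alive;
}
```

In this model, releasing the caller's only reference before the launch leaves the kernel with a dead handle unless the runtime took its own reference, which would match code that fails on one platform and appears to work on another.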

                                              Regards

                                              Himanshu , Bruhaspati
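The guess above about a hidden reference count can be illustrated with a plain C++ analogy. This is only a sketch and does not use OpenCL at all: `FakeBuffer`, `NonRetainingArg`, `RetainingArg` and `usable_at_launch_after_host_release` are hypothetical names invented for the illustration. It shows why destroying the host-side handle before the launch can go unnoticed on a runtime that happens to keep its own reference, yet fail on one that does not.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for a device buffer; this is not an OpenCL type.
struct FakeBuffer {
    std::vector<float> data;
};

// An argument slot that only observes the buffer and does not keep it alive.
struct NonRetainingArg {
    std::weak_ptr<FakeBuffer> buffer;
    bool usable() const { return !buffer.expired(); }
};

// An argument slot that holds its own reference (the "hidden reference count" case).
struct RetainingArg {
    std::shared_ptr<FakeBuffer> buffer;
    bool usable() const { return buffer != nullptr; }
};

// Binds a buffer to the argument slot, lets the host handle go out of scope,
// and reports whether the buffer still exists at "launch" time.
template <typename Arg>
bool usable_at_launch_after_host_release() {
    Arg arg;
    {
        auto host_handle = std::make_shared<FakeBuffer>();
        host_handle->data.assign(1024, 0.0f);
        arg.buffer = host_handle;
        assert(arg.usable());  // alive while the host handle exists
    }  // host handle destroyed here, before the "launch"
    return arg.usable();
}
```

Calling `usable_at_launch_after_host_release<NonRetainingArg>()` reports that the buffer is gone, while the `RetainingArg` variant still sees it. Either way, the portable fix is the one found above: keep the buffer and program objects alive until the kernel has been enqueued and has finished.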

                                              • Re: How to implement cl_khr_icd?
                                                ash Newbie

                                                Yeah, thank you Himanshu ^^

                                                I looked over the forum for information on GDB and found some tutorials, but I'm facing a problem. I can launch gdb easily, put a breakpoint at clEnqueueNDRangeKernel, run, and then put a breakpoint on my kernel function (this seems to work).

                                                But when I continue, the program doesn't break at the kernel function and gives me this warning instead:

                                                warning temporarily disabling breakpoints for unloaded shared library

                                                I spent some time looking for an explanation but I'm still stuck. Do you have any idea, by chance?

                                                 

                                                Regards,

                                                Jacq

                                                • Re: How to implement cl_khr_icd?
                                                  himanshu.gautam Master

                                                  GDB? Use CodeXL. That's the preferred tool recommended by AMD.

                                                  It has a GUI, it rocks, and it works on Linux as well.

                                                  Regards

                                                  Himanshu , Bruhaspati

                                                  • Re: How to implement cl_khr_icd?
                                                    ash Newbie

                                                    Yeah, it looks really nice, but I don't have an AMD GPU.

                                                    Well, maybe I could try it on the CPU.

                                                    • Re: How to implement cl_khr_icd?
                                                      ash Newbie

                                                      Could you tell me whether CodeXL can debug kernels even if I don't have AMD hardware? From the product page it seems to me that it only supports AMD hardware. Can you confirm?

                                                      If not, do you know where the GDB problem I described above comes from?

                                                       

                                                      Best regards,

                                                      Jacq

                                                      • Re: How to implement cl_khr_icd?
                                                        ash Newbie

                                                        Hi all,

                                                        I installed CodeXL and I'm really disappointed. I can't do any debugging: it asks for an AMD GPU even though I want to run on the CPU. I can't even watch the variables passed in the buffer. The CodeXL documentation mentioned any x64 CPU, so why is it blocking? It doesn't really rock for me.

                                                         

                                                        Best regards,

                                                        Jacq

                                                        • Re: How to implement cl_khr_icd?
                                                          himanshu.gautam Master

                                                          It would help if you could give more information about your problem. Can you try this link for running GDB: http://samritmaity.wordpress.com/2009/11/20/debugging-opencl-program-with-gdb/

                                                          Also, I would recommend starting a discussion in the CodeXL forum category to report this. Please mention the version you are using and the exact steps you performed.

                                                          Regards

                                                          Himanshu , Bruhaspati

                                                          • Re: How to implement cl_khr_icd?
                                                            ash Newbie

                                                            Hi,

                                                            OK, thanks for the link; I was looking at that one as well. I'll try the CodeXL forum.

                                                            By the way, I found an old GPU card: an ATI Radeon X850 XT. From my research it does not seem to support OpenCL. Can you confirm?

                                                            Best regards,

                                                            Jacq

                                                            • Re: How to implement cl_khr_icd?
                                                              himanshu.gautam Master

                                                              By the way, I found an old GPU card: an ATI Radeon X850 XT. From my research it does not seem to support OpenCL. Can you confirm?

                                                              The card does not support OpenCL. Please get at least an HD 5xxx card, although the new GCN architecture (certainly recommended) is only available in the 77xx, 78xx and 79xx series.

                                                              Regards

                                                              Himanshu , Bruhaspati

                                                              • Re: How to implement cl_khr_icd?
                                                                ash Newbie

                                                                OK, I'll see if I can get one, but it might be hard.

                                                                I wanted to start a new discussion on the CodeXL forum but I didn't succeed. I got an error message saying I didn't have the right to do that. Is that normal?

                                                                • Re: How to implement cl_khr_icd?
                                                                  himanshu.gautam Master

                                                                  I would suggest waiting 1-2 days. If you have submitted the issue once, it should go live on the forum. Because of heavy spam, strict moderation was put in place a few weeks back. Otherwise, you can create a thread here.

                                                                  Regards

                                                                  Himanshu , Bruhaspati

                                                                  • Re: How to implement cl_khr_icd?
                                                                    ash Newbie

                                                                    Hi again,

                                                                    I created a topic on the CodeXL forum and it seems that an AMD GPU is compulsory for kernel debugging with CodeXL. I got an AMD GPU (HD 6450) to test CodeXL and I'm kind of stuck. By the way, the drivers I installed are Catalyst 13.4.

                                                                    First of all, while running my program on this AMD GPU I get strange results:

                                                                    If I remove the printf calls in the kernel I get wrong results, but if I leave them in, it seems to work fine since I get the correct result.

                                                                    But to use CodeXL I have to remove printf. I can launch the kernel and set some breakpoints, but when I look at my variables in the buffer, the value is the same everywhere and completely wrong: something like 1501213 (instead of 0 < x < 1023).

                                                                    Is there some problem with float? I'm really lost; I don't understand these errors at all. I hope someone can help.

                                                                     

                                                                    Best regards,

                                                                    ash

                                                                    • Re: How to implement cl_khr_icd?
                                                                      himanshu.gautam Master

                                                                      I am kind of surprised the code works when printf is enabled. This looks like a bug in your code.

                                                                      Can you post your code so that we can take a look?

                                                                      Regards

                                                                      Himanshu , Bruhaspati

                                                                      • Re: How to implement cl_khr_icd?
                                                                        ash Newbie

                                                                        Hi,

                                                                        Yeah, it seems like a bug in my code again. I'll ask whether I can post it. But I had to give back the AMD GPU, and after the AMD drivers were uninstalled, the ICD file in /etc/OpenCL/vendors was missing. So I put it back, but when launching the samples or some of my programs, my CPU is still not seen as an OpenCL device anymore. What should I do? Can you please help? All the samples only see the GPU; I don't even get the message "AMD GPU not found falling back to CPU" like before.

                                                                         

                                                                        Best regards,

                                                                        ash
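For readers who hit the same symptom: on Linux the ICD loader discovers platforms through the `.icd` files in /etc/OpenCL/vendors, and each file names the vendor library to load. Below is a minimal C++17 sketch for checking which entries are present. `list_icd_entries` is a hypothetical helper name, not part of any SDK, and the sketch only reads the files; it does not verify that the named library can actually be loaded.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

// Returns (file name, first line) for every *.icd file found in `dir`.
// The first line of a vendor file is expected to name the vendor's OpenCL library.
// A missing or unreadable directory yields an empty list.
std::vector<std::pair<std::string, std::string>> list_icd_entries(const fs::path& dir) {
    std::vector<std::pair<std::string, std::string>> entries;
    std::error_code ec;
    if (!fs::is_directory(dir, ec)) {
        return entries;
    }
    for (const auto& entry : fs::directory_iterator(dir, ec)) {
        if (entry.path().extension() != ".icd") {
            continue;  // the loader only looks at *.icd files
        }
        std::ifstream in(entry.path());
        std::string library;
        std::getline(in, library);
        entries.emplace_back(entry.path().filename().string(), library);
    }
    std::sort(entries.begin(), entries.end());
    return entries;
}
```

Calling `list_icd_entries("/etc/OpenCL/vendors")` and printing each pair shows which vendor runtimes the loader will try. If the entry for the CPU runtime is absent, or names a library that no longer exists, the CPU platform will not be listed, which would be consistent with the symptom described above.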

                                                                        • Re: How to implement cl_khr_icd?
                                                                          ash Newbie

                                                                          Hi,

                                                                           

                                                                          I don't know what the problem was, but reinstalling the AMD SDK fixed it.

                                                                          You're right, there really is something wrong somewhere: I tested my code on another GPU (NVIDIA Quadro 290) and I also get wrong results for the reduction.

                                                                          What I don't understand is that the power of OpenCL is supposed to be that the same code runs on any compatible device. Does it have something to do with dimensioning, like the local and global parameters? I tried different values, which froze my GPU, and I had to reboot.

                                                                          If the array to reduce has 1024 elements, then the global argument passed to enqueueNDRangeKernel should be of size 1024.

                                                                          For the local argument, which is also the size of the local array shared between the work-items of a work-group, I gave 64.

                                                                          I get the same correct results on the NVIDIA GTX 650 GPU and the Intel Xeon E5430 CPU, but a wrong result on the small GPU. Looking at the output, the sum is much smaller, about 2.6 times smaller actually.

                                                                          It doesn't crash with this setup, but obviously there is something wrong. I checked for errors after each step (context, program, command queue, enqueueWriteBuffer, etc.) and I don't see where the problem is.

                                                                           

                                                                          Best regards,

                                                                          ash
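One way to narrow down a device-dependent reduction result is to compare the device output against a CPU reference that mirrors the launch geometry described above. The sketch below is plain C++ with no OpenCL calls and rests on two assumptions that the post does not confirm: each work-item loads exactly one element, and the input is the ramp 0, 1, ..., 1023. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Number of work-groups launched for a 1-D range: global size / local size.
std::size_t work_group_count(std::size_t global_size, std::size_t local_size) {
    assert(local_size != 0 && global_size % local_size == 0);
    return global_size / local_size;
}

// CPU reference: one partial sum per work-group, assuming one element per work-item.
std::vector<float> reference_partial_sums(const std::vector<float>& input,
                                          std::size_t local_size) {
    const std::size_t groups = work_group_count(input.size(), local_size);
    std::vector<float> partial(groups, 0.0f);
    for (std::size_t g = 0; g < groups; ++g) {
        for (std::size_t i = 0; i < local_size; ++i) {
            partial[g] += input[g * local_size + i];
        }
    }
    return partial;
}

// Final sum over the per-group partial sums, as the host would compute it.
float reference_total(const std::vector<float>& partial) {
    return std::accumulate(partial.begin(), partial.end(), 0.0f);
}

// Test input 0, 1, ..., n-1 (assumed here; exact in float for n = 1024).
std::vector<float> make_ramp(std::size_t n) {
    std::vector<float> v(n);
    std::iota(v.begin(), v.end(), 0.0f);
    return v;
}
```

With a global size of 1024 and a local size of 64, `work_group_count` gives 16 work-groups, and `reference_total(reference_partial_sums(make_ramp(1024), 64))` gives 523776. Under the one-element-per-work-item assumption, the output buffer then needs one float per work-group (16 here). Independently of that assumption, the size given for a local-memory kernel argument is a byte count, so a local array of 64 floats needs 64 * sizeof(float) bytes. A mismatch in either place is undefined behaviour that can appear to work on some devices and fail on others.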

                                                                          • Re: How to implement cl_khr_icd?
                                                                            himanshu.gautam Master

                                                                            Please post the code here. The Advanced editor is working now.

                                                                            Regards

                                                                            Himanshu , Bruhaspati

                                                                            • Re: How to implement cl_khr_icd?
                                                                              ash Newbie

                                                                              Hi,

                                                                              The advanced editor is not working ("can't find the page") and I'm not able to upload the file either. What can I do?

                                                                               

                                                                              Best regards,

                                                                              ash

                                                                              • Re: Re: How to implement cl_khr_icd?
                                                                                ash Newbie

                                                                                The advanced editor is not working correctly; I had to reply to my own post to get the page. And I can't upload a file either.

                                                                                If you could test on an AMD GPU, it would be a great help. When I enabled the printf calls (#define DEBUG_OCL) it worked fine, but without any printf (with #define DEBUG_OCL commented out) it gave wrong results and didn't pass my test.

                                                                                 

                                                                                HOST CODE:

                                                                                #include <iostream>
                                                                                #include <CL/cl.hpp>
                                                                                #include <iomanip>
                                                                                #include <cmath>
                                                                                #include <fstream>
                                                                                #include <iterator>
                                                                                #include <string>
                                                                                #include <vector>
                                                                                #define DEBUG_OCL // /!\ This option is only available for AMD Platform
                                                                                #define OCLINTEGRITY_NUMS 1024
                                                                                #define OCLINTEGRITY_WORK_ITEMS 64
                                                                                #define OCLINTEGRITY_WORK_GROUPS (OCLINTEGRITY_NUMS/(OCLINTEGRITY_WORK_ITEMS*2))
                                                                                
                                                                                
                                                                                int main(int argc, char* argv[])
                                                                                {
                                                                                    float* m_h_input = new float[OCLINTEGRITY_NUMS];
                                                                                    float* m_h_output = new float[OCLINTEGRITY_WORK_GROUPS];
                                                                                    cl_int err;
                                                                                
                                                                                    // Init input array
                                                                                    for( int i = 0; i < OCLINTEGRITY_NUMS; i++ )
                                                                                    {
                                                                                        m_h_input[i] = i;
                                                                                    }
                                                                                
                                                                                    /*
                                                                                     *
                                                                                     * CHANGE HERE FOR THE TYPE OF DEVICE YOU WANT TO USE
                                                                                     *
                                                                                     */
                                                                                    cl_device_type type = CL_DEVICE_TYPE_GPU;
                                                                                    std::string platform_name = "AMD";
                                                                                
                                                                                    std::vector<cl::Platform> platforms;
                                                                                    std::vector<cl::Device> devices;
                                                                                    cl::Platform::get(&platforms);
                                                                                    cl::Platform platform;
                                                                                
                                                                                    //Look for specified platform
                                                                                    for(size_t i=0; i <platforms.size(); ++i)
                                                                                    {
                                                                                        std::string val;
                                                                                        platforms[i].getInfo(CL_PLATFORM_NAME, &val);
                                                                                        if(val.find(platform_name) != std::string::npos)
                                                                                        {
                                                                                            std::cout<<"Platform name found "<<val<<std::endl;
                                                                                            platform = platforms[i];
                                                                                        }
                                                                                    }
                                                                                
                                                                                    if(platform.getDevices(type,&devices)!= CL_SUCCESS)
                                                                                    {
                                                                                        std::cerr<<"Error: No device found !"<<std::endl;
                                                                                        return -1;
                                                                                    }
                                                                                
                                                                                    cl::Device m_device = devices[0];
                                                                                
                                                                                    std::string val;
                                                                                    if(m_device.getInfo(CL_DEVICE_NAME, &val) != CL_SUCCESS)
                                                                                    {
                                                                                        std::cerr<<"Error: Can't get device name"<<std::endl;
                                                                                        return -1;
                                                                                    }
                                                                                    std::cout<<"--> Chosen Device name: "<<val<<std::endl;
                                                                                
                                                                                    // Read source file
                                                                                    std::ifstream sourceFile("kernel.cl");
                                                                                    std::string sourceCode(
                                                                                    std::istreambuf_iterator<char>(sourceFile),(std::istreambuf_iterator<char>()));
                                                                                    cl::Program::Sources source(1, std::make_pair(sourceCode.c_str(), sourceCode.length()+1));
                                                                                
                                                                                    // Create an OpenCL context
                                                                                    cl::Context context(devices, NULL, NULL, NULL, &err);
                                                                                    if (err != CL_SUCCESS)
                                                                                    {
                                                                                        std::cout << "Error: Can't create context" << std::endl;
                                                                                        return -1;
                                                                                    }
                                                                                
                                                                                    // Create a command queue
                                                                                     cl::CommandQueue command_queue(context, m_device, 0, &err);
                                                                                    if (err != CL_SUCCESS)
                                                                                    {
                                                                                        std::cout << "Error: Failed to create commandQueue " << err << "\n";
                                                                                        return -1;
                                                                                    }
                                                                                    std::string options="";
                                                                                
                                                                                #ifdef DEBUG_OCL
                                                                                #    warning "DEBUG MODE :  make sure you use AMD platform"
                                                                                    options += "-g -DDEBUG_AMD";
                                                                                #endif
                                                                                
                                                                                    // Build program
                                                                                    cl::Program program(context, source, &err);
                                                                                    err = program.build(devices, options.c_str());
                                                                                    if (err != CL_SUCCESS)
                                                                                    {
                                                                                        std::cerr << "Error : Failed to build program " << std::endl;
                                                                                        std::cerr << program.getBuildInfo<CL_PROGRAM_BUILD_LOG>(m_device)<< std::endl;
                                                                                        return -1;
                                                                                    }
                                                                                
                                                                                    // Create memory buffers on the device for input and output values
                                                                                    cl::Buffer input_buffer(context, CL_MEM_READ_ONLY, OCLINTEGRITY_NUMS * sizeof(float), NULL, &err);
                                                                                    cl::Buffer output_buffer(context, CL_MEM_WRITE_ONLY, OCLINTEGRITY_WORK_GROUPS * sizeof(float), NULL, &err);
                                                                                    if (err != CL_SUCCESS)
                                                                                    {
                                                                                        std::cerr << "Error: Failed to create memory buffers " << err << "\n";
                                                                                        return -1;
                                                                                    }
                                                                                
                                                                                    // Copy input to memory buffer
                                                                                    err = command_queue.enqueueWriteBuffer(input_buffer, CL_TRUE, 0, OCLINTEGRITY_NUMS * sizeof(float), m_h_input, NULL, NULL);
                                                                                    if (err != CL_SUCCESS)
                                                                                    {
                                                                                        std::cerr << "Error: Failed to copy to buffer " << err << "\n";
                                                                                        return false;
                                                                                    }
                                                                                
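                                                                                    // Create the kernel and set its arguments; OR the results so no failure is overwritten
                                                                                    cl::Kernel kernel(program, "reduce_kernel", &err);
                                                                                    err |= kernel.setArg(0, input_buffer);
                                                                                    err |= kernel.setArg(1, output_buffer);
                                                                                    // cl::Local takes a size in bytes: one float of local memory per work-item
                                                                                    err |= kernel.setArg(2, cl::Local(OCLINTEGRITY_WORK_ITEMS * sizeof(float)));
                                                                                    if (err != CL_SUCCESS)
                                                                                    {
                                                                                        std::cerr << "Error: Failed to create kernel or set its arguments " << err << "\n";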
                                                                                        return false;
                                                                                    }
                                                                                
                                                                                    // Execute the OpenCL kernel on the list
                                                                                    cl::NDRange global(OCLINTEGRITY_NUMS);
                                                                                    cl::NDRange local(OCLINTEGRITY_WORK_ITEMS);
                                                                                
                                                                                    err = command_queue.enqueueNDRangeKernel(kernel, cl::NullRange, global, local, NULL, NULL); // Run the kernel with no global offset
                                                                                    if(err!=CL_SUCCESS)
                                                                                    {
                                                                                        std::cerr << "Error: Failed to execute kernel " << err << "\n";
                                                                                        return false;
                                                                                    }
                                                                                    //Copy data from buffer to host memory
                                                                                    err = command_queue.enqueueReadBuffer(output_buffer, CL_TRUE, 0, OCLINTEGRITY_WORK_GROUPS * sizeof(float), m_h_output, NULL, NULL);
                                                                                    if(err!=CL_SUCCESS)
                                                                                    {
                                                                                        std::cerr << "Error: Failed to read buffer " << err << "\n";
                                                                                        return false;
                                                                                    }
                                                                                    err = command_queue.finish();
                                                                                
                                                                                    //Sum blocks
                                                                                    double gpu_sum = 0.0;
                                                                                    for (unsigned int i = 0; i < OCLINTEGRITY_WORK_GROUPS; ++i)
                                                                                    {
                                                                                        gpu_sum += m_h_output[i];
                                                                                        std::cout << m_h_output[i] << std::endl;
                                                                                    }
                                                                                
                                                                                    std::cout<<"parallel sum "<<std::setprecision(6)<<gpu_sum<<std::endl;
                                                                                
                                                                                    //Compute on CPU
                                                                                    float reference = 0.f;
                                                                                    for (int i = 0; i < OCLINTEGRITY_NUMS; ++i)
                                                                                        reference += log(exp(sqrt(i)));
                                                                                
                                                                                    // Compare CPU - OpenCL Device
                                                                                    const float err_sum = fabsf(gpu_sum - reference);
                                                                                    if (err_sum < 10e-1f)
                                                                                    {
                                                                                        std::cout << "SUCCESS!\n";
                                                                                
                                                                                    }
                                                                                    else
                                                                                    {
                                                                                        std::cout << "ERROR : " << err_sum << std::endl;
                                                                                    }
                                                                                
                                                                                    //Release memory
                                                                                    delete[] m_h_input;
                                                                                    delete[] m_h_output;
                                                                                
                                                                                    return true; // the error paths above return false
                                                                                }
                                                                                

                                                                                 

                                                                                KERNEL:

                                                                                __kernel void reduce_kernel(__global float *a_g_idata, __global float *a_g_odata, __local float* ocl_test_sdata)
                                                                                {
                                                                                    // perform first level of reduction,
                                                                                    // reading from global memory, writing to shared memory
                                                                                    const unsigned int tid = get_local_id(0);
                                                                                    const unsigned int i = get_group_id(0)*(get_local_size(0)*2) + get_local_id(0);
                                                                                   
                                                                                
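                                                                                    if(i+get_local_size(0)<1024)
                                                                                    {
                                                                                        ocl_test_sdata[tid] = log(exp(sqrt(a_g_idata[i])))  +  log(exp(sqrt(a_g_idata[i+get_local_size(0)]))) ;
                                                                                #ifdef DEBUG_AMD
                                                                                        printf("---KERNEL input[%u] = %f \n",i, a_g_idata[i]);
                                                                                #endif
                                                                                    }
                                                                                    else
                                                                                    {
                                                                                        // out-of-range work-items contribute zero, so the reduction below never reads uninitialized local memory
                                                                                        ocl_test_sdata[tid] = 0.0f;
                                                                                    }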
                                                                                    
                                                                                    barrier(CLK_LOCAL_MEM_FENCE);
                                                                                    
                                                                                    // do reduction in shared mem   
                                                                                    for(unsigned int s=get_local_size(0)/2; s>0; s>>=1) 
                                                                                    {
                                                                                        if (tid < s) 
                                                                                        {
                                                                                            ocl_test_sdata[tid] += ocl_test_sdata[tid + s];
                                                                                        }
                                                                                        barrier(CLK_LOCAL_MEM_FENCE);
                                                                                    }
                                                                                
                                                                                    // write result for this block to global mem 
                                                                                    if (tid == 0)
                                                                                    { 
                                                                                        a_g_odata[get_group_id(0)] = ocl_test_sdata[0];
                                                                                #ifdef DEBUG_AMD
                                                                                        printf("output[%u] : %f\n",(uint)get_group_id(0),a_g_odata[get_group_id(0)]);
                                                                                #endif
                                                                                
                                                                                    }
                                                                                    
                                                                                    
                                                                                }
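For anyone following along without a GPU, the kernel's scheme (two inputs per work-item, then a tree reduction in local memory) can be emulated on the host. This is only a sketch in plain C++, not part of the posted program: `emulate_reduce`, `make_ramp` and `sum_partials` are hypothetical names, the input is assumed to be the ramp 0, 1, ..., n-1 (which is what the CPU reference loop suggests), and `log(exp(sqrt(x)))` is replaced by the mathematically equivalent `sqrt(x)`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Host-side emulation of the reduction scheme (no OpenCL involved).
// Assumes local_size is a power of two and input.size() is a multiple of
// 2 * local_size, so only the groups that do useful work are emulated.
std::vector<float> emulate_reduce(const std::vector<float>& input, std::size_t local_size)
{
    assert(local_size > 0 && (local_size & (local_size - 1)) == 0);
    assert(input.size() % (2 * local_size) == 0);
    const std::size_t groups = input.size() / (2 * local_size);
    std::vector<float> partial(groups);
    std::vector<float> local(local_size); // stands in for the __local array

    for (std::size_t g = 0; g < groups; ++g) {
        // First level: global memory -> local memory, two elements per work-item.
        for (std::size_t tid = 0; tid < local_size; ++tid) {
            const std::size_t i = g * (local_size * 2) + tid;
            local[tid] = std::sqrt(input[i]) + std::sqrt(input[i + local_size]);
        }
        // Tree reduction; each pass of the outer loop corresponds to one barrier.
        for (std::size_t s = local_size / 2; s > 0; s >>= 1)
            for (std::size_t tid = 0; tid < s; ++tid)
                local[tid] += local[tid + s];
        partial[g] = local[0]; // what work-item 0 writes to a_g_odata[group]
    }
    return partial;
}

// Assumed input for this sketch: the values 0, 1, ..., n-1.
std::vector<float> make_ramp(std::size_t n)
{
    std::vector<float> v(n);
    for (std::size_t k = 0; k < n; ++k)
        v[k] = static_cast<float>(k);
    return v;
}

// Final host-side step: add the per-group partial sums in double precision.
double sum_partials(const std::vector<float>& partial)
{
    double total = 0.0;
    for (std::size_t g = 0; g < partial.size(); ++g)
        total += partial[g];
    return total;
}
```

Calling `emulate_reduce(make_ramp(1024), 64)` yields one partial sum per work-group that actually processes data, and `sum_partials` of that vector can be compared against a sequential sum with the same kind of tolerance the host code uses.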
                                                                                

                                                                                 

                                                                                          

                                                                                Best regards,

                                                                                ash

                                                                                • Re: How to implement cl_khr_icd?
                                                                                  himanshu.gautam Master

                                                                                  A few comments:

                                                                                  1. I do not see the need to launch 1024 work-items to reduce 1024 elements and then use a condition inside the kernel that immediately disables half of them. Why not launch only 512 work-items?

                                                                                  2. Use get_global_id(0). The get_group_id() approach may be correct, but it is very confusing (with that factor of 2 inside it).

                                                                                   

                                                                                  Here is a rewrite of just that small section of the kernel.

                                                                                  Global size: 512, local size: 64

                                                                                   

                                                                                   

                                                                                  int gid = get_global_id(0);
                                                                                  int lid = get_local_id(0);
                                                                                  int grp_id = get_group_id(0);
                                                                                  int grp_size = get_local_size(0); // work-group size; OpenCL C has no get_group_size()
                                                                                  if(gid < 512)
                                                                                  {
                                                                                      // 3 versions with varying access patterns. Enable exactly one. Check before using; not tested.
                                                                                      //ocl_test_sdata[lid] = log(exp(sqrt(a_g_idata[gid])))  +  log(exp(sqrt(a_g_idata[gid + get_global_size(0)]))) ;
                                                                                      //ocl_test_sdata[lid] = log(exp(sqrt(a_g_idata[2 * gid])))  +  log(exp(sqrt(a_g_idata[2 * gid + 1]))) ;
                                                                                      //ocl_test_sdata[lid] = log(exp(sqrt(a_g_idata[(2 * grp_id) * grp_size + lid])))  +
                                                                                      //                      log(exp(sqrt(a_g_idata[(2 * grp_id + 1) * grp_size + lid]))) ;
                                                                                  }
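Since the three access patterns above are marked as untested, one cheap check is to emulate the work-item IDs on the host and count how often each input element is read. The sketch below is plain C++ with no OpenCL dependency; `Pattern`, `count_reads` and `reads_each_once` are hypothetical names introduced for illustration, and the IDs are derived as for a 1-D launch with no global offset.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Which of the three index formulas from the post to emulate.
enum class Pattern { SplitHalves, Interleaved, PerGroupBlocks };

// Emulate a 1-D launch and count how many times each input element is read.
// Every work-item reads two elements, so there are 2 * global_size inputs.
std::vector<int> count_reads(Pattern p, std::size_t global_size, std::size_t local_size)
{
    assert(local_size > 0 && global_size % local_size == 0);
    std::vector<int> reads(2 * global_size, 0);
    for (std::size_t gid = 0; gid < global_size; ++gid) {
        const std::size_t lid = gid % local_size;    // get_local_id(0)
        const std::size_t grp_id = gid / local_size; // get_group_id(0)
        std::size_t first = 0, second = 0;
        switch (p) {
        case Pattern::SplitHalves:    // gid and gid + get_global_size(0)
            first = gid;
            second = gid + global_size;
            break;
        case Pattern::Interleaved:    // 2 * gid and 2 * gid + 1
            first = 2 * gid;
            second = 2 * gid + 1;
            break;
        case Pattern::PerGroupBlocks: // two adjacent blocks of local_size per group
            first = (2 * grp_id) * local_size + lid;
            second = (2 * grp_id + 1) * local_size + lid;
            break;
        }
        ++reads[first];
        ++reads[second];
    }
    return reads;
}

// True when every element is read exactly once.
bool reads_each_once(const std::vector<int>& reads)
{
    for (std::size_t k = 0; k < reads.size(); ++k)
        if (reads[k] != 1)
            return false;
    return true;
}
```

For the launch suggested here, `reads_each_once(count_reads(Pattern::PerGroupBlocks, 512, 64))` (and likewise for the other two patterns) checks that all 1024 inputs are covered without overlap. The patterns differ only in memory access order, not in the set of elements touched.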

                                                                                  Regards

                                                                                  Himanshu , Bruhaspati

                                                                                  • Re: How to implement cl_khr_icd?
                                                                                    ash Newbie

                                                                                    Hi,

                                                                                    Thanks for the comments, I'll try that. I think I mixed up the local size that we pass to the enqueueNDRangeKernel function with the total number of elements to be computed. I thought they were the same, but from what you said they are not.

                                                                                    Another question: were you able to test my code on an AMD GPU to see if the test passes even with printf disabled?

                                                                                    I'd be reassured to know that my code runs correctly on both NVIDIA and AMD GPUs.

                                                                                    Also, could you please tell me how to post code as a zipped attachment?

                                                                                    Have a nice day.

                                                                                     

                                                                                    Best regards,

                                                                                    ash

                                                                                    • Re: How to implement cl_khr_icd?
                                                                                      himanshu.gautam Master


                                                                                      Your code returns SUCCESS with or without printf. Here is the output with debug disabled.

                                                                                       

                                                                                      C:\Users\cas\Desktop\reduce>host.exe
                                                                                      Platform name found AMD Accelerated Parallel Processing
                                                                                      --> Choosen Device name: Capeverde
                                                                                      959.575
                                                                                      1762.89
                                                                                      2284.09
                                                                                      2705.42
                                                                                      3069.08
                                                                                      3393.84
                                                                                      3690.06
                                                                                      3964.17
                                                                                      parallel sum 21829.1
                                                                                      SUCCESS!

                                                                                      Regards

                                                                                      Himanshu , Bruhaspati

                                                                                      • Re: How to implement cl_khr_icd?
                                                                                        ash Newbie

                                                                                        Good to know, thanks a lot!

                                                                                        Then maybe the problem came from the AMD GPU I have. I'll try to test on another one later if possible.

                                                                                        I'm now porting a CUDA application to OpenCL and I've run into some problems. I don't know if you're familiar with CUDA, but I'm having difficulty "translating" tex3D and textures to OpenCL. I read about cl::Image, so I think I should use that to pass the data to the kernel, but it's not very clear to me.

                                                                                        • Re: How to implement cl_khr_icd?
                                                                                          himanshu.gautam Master

                                                                                          You are right. Look into cl::Image. You can also check out some of the APP SDK samples (although most of them are written without the OpenCL C++ wrapper); SimpleImage and MatrixMulImage, to name a few.

                                                                                          Regards

                                                                                          Himanshu , Bruhaspati

                                                                                          • Re: How to implement cl_khr_icd?
                                                                                            ash Newbie

                                                                                            Hi,

                                                                                            I have a small question about cl::Image3D. When you call enqueueWriteImage, it asks for an origin and a region.

                                                                                            If I want to transfer the whole image, the region should be defined as (width, height, depth), shouldn't it?

                                                                                            • Re: How to implement cl_khr_icd?
                                                                                              himanshu.gautam Master

                                                                                              region defines the (width, height, depth) in pixels of the 2D or 3D rectangle being read or written. If image is a 2D image object, the depth value given by region[2] must be 1.

                                                                                              (From the Khronos C++ wrapper documentation.)
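To make the quoted definition concrete, here is a minimal sketch of the origin and region that cover an entire image. It is plain C++ so it compiles without OpenCL headers: `Size3`, `ImageRect`, `whole_image` and `pixel_count` are hypothetical names, and `std::array` merely stands in for the wrapper's three-component size type.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Stand-in for a three-component size (x, y, z); an assumption of this sketch.
typedef std::array<std::size_t, 3> Size3;

struct ImageRect {
    Size3 origin; // first pixel of the rectangle
    Size3 region; // (width, height, depth) of the rectangle, in pixels
};

// Origin and region that cover a whole image; pass depth = 1 for a 2D image.
ImageRect whole_image(std::size_t width, std::size_t height, std::size_t depth)
{
    assert(width > 0 && height > 0 && depth > 0);
    ImageRect r;
    r.origin = Size3{{0, 0, 0}};
    r.region = Size3{{width, height, depth}};
    return r;
}

// Number of pixels a transfer of this rectangle touches.
std::size_t pixel_count(const ImageRect& r)
{
    return r.region[0] * r.region[1] * r.region[2];
}
```

With a real image object, the same three numbers would be copied into the origin and region arguments of the read or write call. The host array behind the transfer then needs room for at least `pixel_count` pixels.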

                                                                                              Regards

                                                                                              Himanshu , Bruhaspati

                                                                                              • Re: How to implement cl_khr_icd?
                                                                                                ash Newbie

                                                                                                OK, then it should be fine. Sorry for the bother.

                                                                                                I have (yet) another question: the CUDA code that I'm porting uses a cudaPitchedPtr. I read the specs, and when you create a 3D image you can pass a row_pitch, which should be the equivalent of h_ptr.pitch.

                                                                                                But what about xsize and ysize? They seem related to slice_pitch, but I'm not sure. Also, I really don't know what to pass as host_ptr when I construct the 3D image. I think I should allocate an array the size of the image, which means 3 dimensions, but it seems that in CUDA they allocate a 3D array. I hope you can help, I'm kind of lost.

                                                                                                CUDA:

                                                                                                    cudaPitchedPtr h_ptr;
                                                                                                    h_ptr.pitch = volume_size.width*sizeof(float);
                                                                                                    h_ptr.xsize = volume_size.width;
                                                                                                    h_ptr.ysize = volume_size.height;

                                                                                                OpenCL:

                                                                                                    cl::Image3D(context, CL_MEM_READ_ONLY, fmt, width, height, depth,
                                                                                                              row_pitch,   // = width*sizeof(float), to match h_ptr.pitch?
                                                                                                              slice_pitch, // ?
                                                                                                              host_ptr);   // ?

                                                                                                    

                                                                                                I hope I'm not too far off, but some help would be very welcome.

                                                                                                 

                                                                                                best regards,

                                                                                                ash

                                                                                                • Re: How to implement cl_khr_icd?
                                                                                                  nou Expert

                                                                                                  You can pass both pitch parameters as 0; OpenCL will then compute the proper values automatically as row_pitch = width * sizeof(pixel type) and slice_pitch = height * row_pitch.
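The two formulas above can be written out as a small host-side sketch. It is plain C++ without OpenCL headers: `Pitches`, `packed_pitches`, `make_host_volume` and `voxel_index` are hypothetical helpers, and one float per pixel is an assumption (a single-channel float image format).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Pitches {
    std::size_t row_pitch;   // bytes from one row to the next
    std::size_t slice_pitch; // bytes from one 2D slice to the next
};

// Pitches of a tightly packed image, i.e. the values described for pitch = 0.
Pitches packed_pitches(std::size_t width, std::size_t height, std::size_t bytes_per_pixel)
{
    Pitches p;
    p.row_pitch = width * bytes_per_pixel;
    p.slice_pitch = height * p.row_pitch;
    return p;
}

// Host storage for a packed volume: one flat, contiguous array rather than a
// nested 3D array. Assumes one float per pixel.
std::vector<float> make_host_volume(std::size_t width, std::size_t height, std::size_t depth)
{
    return std::vector<float>(width * height * depth, 0.0f);
}

// Element index (not byte offset) of voxel (x, y, z) in that flat array.
std::size_t voxel_index(std::size_t x, std::size_t y, std::size_t z,
                        std::size_t width, std::size_t height)
{
    assert(x < width && y < height);
    return (z * height + y) * width + x;
}
```

The pointer handed over as host_ptr would then be the vector's `data()`. Whether the image copies from it or uses it in place depends on the memory flags chosen, which this sketch leaves out.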

                                                                                                  • Re: How to implement cl_khr_icd?
                                                                                                    ash Newbie

                                                                                                    Then I "only" have to allocate memory for the host pointer?

                                                                                                    So if I have a 3D image, I have to allocate memory for a 3D array? Sorry if my question is dumb, but I haven't really understood yet.

                                                                                                    • Re: How to implement cl_khr_icd?
                                                                                                      nou Expert

                                                                                                      Yes? I am not sure what exactly you are asking. What other memory do you want to allocate?

                                                                                                      • Re: How to implement cl_khr_icd?
                                                                                                        ash Newbie

                                                                                                        Hi,

                                                                                                        No, it's okay, I was just confused. It's the same as with a buffer object when you use the flag CL_MEM_ALLOC_HOST_PTR.

                                                                                                        May I ask if you know of any good resources that could help me write a kernel that applies Gaussian smoothing to a 3D image?

                                                                                                         

                                                                                                        Best regards,

                                                                                                        ash

                                                                                                        • Re: How to implement cl_khr_icd?
                                                                                                          ash Newbie

                                                                                                          Hi,

                                                                                                          A very small question: in a for loop where I call my kernel, if I change the value of some argument, do I have to set the argument again with the setArg function, or is it done automatically?

                                                                                                           

                                                                                                          Best regards,

                                                                                                          ash

                                                                                                          • Re: How to implement cl_khr_icd?
                                                                                                            nou Expert

                                                                                                            What do you mean by reset? A kernel remembers each argument until it is changed via clSetKernelArg(). You can change just one argument and enqueue the kernel: it will run with the new value, and the other arguments will keep their old values.

                                                                                                            • Re: How to implement cl_khr_icd?
                                                                                                              ash Newbie

                                                                                                              OK, thanks. Yes, what I meant by reset was giving the argument a new value.

                                                                                                              Have a nice day.

                                                                                                              • Re: How to implement cl_khr_icd?
                                                                                                                ash Newbie

                                                                                                                Hi everybody,

                                                                                                                I'm really down. I had to change the include path from /usr/local/cuda/include (the NVIDIA folder) to /opt/AMDAPP/include (the AMD folder). The thing is, both cl.hpp files are exactly the same (I copied the latest version from the Khronos registry), so why do I get errors when I point the include path to the AMD folder?

                                                                                                                I hope somebody can help, because I have absolutely no clue.

                                                                                                                 

                                                                                                                 

                                                                                                                Regards,

                                                                                                                ash

                                                                                                                • Re: How to implement cl_khr_icd?
                                                                                                                  nou Expert

                                                                                                                  cl.hpp includes cl.h and other headers, so check those too. And what errors do you get?

                                                                                                                • Re: How to implement cl_khr_icd?
                                                                                                                  himanshu.gautam Master

                                                                                                                  What is the error that you get? Without specifying the error, we really cannot help you out here -- as much as we want to.

                                                                                                                  Regards

                                                                                                                  Himanshu , Bruhaspati


                                                                                                                  • Re: How to implement cl_khr_icd?
                                                                                                                    ash Newbie

                                                                                                                    Hi,

                                                                                                                     

                                                                                                                    Previously, with a test program, I had good results on NVIDIA, but the same code gave memory leaks or wrong results on the AMD CPU. Then I found out that some objects were deallocated before the enqueueNDRangeKernel call, and I corrected it. After that, my code worked fine on both the AMD CPU and the NVIDIA GPU. Now, and I really don't know why, my code no longer runs on NVIDIA: I get the message "memory corrupted free", some libgcc detected error, and it crashes.

                                                                                                                    I'll try to run my code on another computer and tell you what I find. It doesn't seem to come from the code anymore, or at least I hope so. It is probably a library or system linking problem, because other people seem able to run the same code on an NVIDIA GPU and an AMD CPU without any problem.

                                                                                                                    The difference from the code above is that I used #define __CL_ENABLE_EXCEPTIONS (for error handling) and that my test function takes a reference to a device as an argument. Is that the wrong thing to do?

                                                                                                                    I'm investigating further.

                                                                                                                     

                                                                                                                    If somebody has any ideas, please share them. Meanwhile, I'll keep testing and come back later.

                                                                                                                     

                                                                                                                    Regards,

                                                                                                                    ash

                                                                                                                    • Re: How to implement cl_khr_icd?
                                                                                                                      himanshu.gautam Master

                                                                                                                      If you think that __CL_ENABLE_EXCEPTIONS is not working, try this simple program at http://www.thebigblob.com/using-the-cpp-bindings-for-opencl/

                                                                                                                      Can you run it properly?

                                                                                                                      Regards

                                                                                                                      Himanshu , Bruhaspati


                                                                                                                      • Re: How to implement cl_khr_icd?
                                                                                                                        ash Newbie

                                                                                                                        Hi,

                                                                                                                        No, it's not coming from __CL_ENABLE_EXCEPTIONS. I have other test programs that use this macro and work fine.

                                                                                                                        My code ran fine on another computer, so I think it's a library linking problem. I have to focus on something else for now, so I'll come back to this after I finish.

                                                                                                                        Thanks for your help.

                                                                                                                         

                                                                                                                        Best regards,

                                                                                                                        ash

                                                                                                                        • Re: How to implement cl_khr_icd?
                                                                                                                          ash Newbie

                                                                                                                          Hi everybody,

                                                                                                                          It's been a long time. I've been getting on with my work and leaving the AMD problem aside for now. I have some questions about convolution: I want to use an FFT implementation for a convolution. Since I still work on an NVIDIA device, I read here that it's better to use Apple's clFFT. What library or implementation would you recommend for working on an NVIDIA GPU with the C++ OpenCL bindings?

                                                                                                                          Regards,

                                                                                                                           

                                                                                                                          ash
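
This does not answer which FFT library to pick, but whichever one is used, an FFT-based convolution follows the same steps: zero-pad both signals, transform, multiply pointwise, and transform back. Below is a minimal 1D CPU sketch of those steps using a textbook recursive radix-2 FFT. The names fft and fftConvolve are made up for illustration and are not the API of clFFT or any other library.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Recursive radix-2 FFT, in place. a.size() must be a power of two.
// invert = true computes the inverse transform WITHOUT the 1/n scaling.
void fft(std::vector<cd>& a, bool invert) {
    const std::size_t n = a.size();
    assert(n != 0 && (n & (n - 1)) == 0);
    if (n == 1) return;
    std::vector<cd> even(n / 2), odd(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) {
        even[i] = a[2 * i];
        odd[i] = a[2 * i + 1];
    }
    fft(even, invert);
    fft(odd, invert);
    const double pi = std::acos(-1.0);
    const double angle = 2.0 * pi / static_cast<double>(n) * (invert ? 1.0 : -1.0);
    for (std::size_t k = 0; k < n / 2; ++k) {
        const cd t = std::polar(1.0, angle * static_cast<double>(k)) * odd[k];
        a[k] = even[k] + t;
        a[k + n / 2] = even[k] - t;
    }
}

// Linear convolution of x and h via the convolution theorem.
// Both inputs are zero-padded to a power of two >= x.size() + h.size() - 1,
// so the circular convolution computed by the FFT equals the linear one.
std::vector<double> fftConvolve(const std::vector<double>& x,
                                const std::vector<double>& h) {
    assert(!x.empty() && !h.empty());
    const std::size_t outLen = x.size() + h.size() - 1;
    std::size_t n = 1;
    while (n < outLen) n *= 2;
    std::vector<cd> fx(n), fh(n);  // value-initialized to zero
    for (std::size_t i = 0; i < x.size(); ++i) fx[i] = x[i];
    for (std::size_t i = 0; i < h.size(); ++i) fh[i] = h[i];
    fft(fx, false);
    fft(fh, false);
    for (std::size_t i = 0; i < n; ++i) fx[i] *= fh[i];  // pointwise product
    fft(fx, true);
    std::vector<double> out(outLen);
    for (std::size_t i = 0; i < outLen; ++i)
        out[i] = fx[i].real() / static_cast<double>(n);  // apply the 1/n scaling
    return out;
}
```

The zero-padding step is the part that is easy to get wrong: without it, the FFT product gives a circular convolution, and the ends of the result wrap around. For a 3D image the same idea applies along each dimension.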

                            • Re: How to implement cl_khr_icd?
                              himanshu.gautam Master

                              Please post the code as a zipped attachment if the issue is still not resolved.

                              Regards

                              Himanshu , Bruhaspati

