12 Replies. Latest reply: Apr 28, 2013 10:09 PM by himanshu.gautam

Multiple contexts parallel allocating or writing to memory of a single device

chevydevil Newbie

Hello, I have a program that uses OpenMP to schedule work in parallel on a single OpenCL device, i.e. a GPU. Right now this is done with multiple contexts, each of which has its own queues and buffers. The program stops after some iteration steps. I mean it just stops, without exiting, without a segmentation fault, or anything else. Could it be that allocation from multiple contexts is not thread-safe? Do I have to use one context and one queue per thread (which is what I plan to do in the future anyway)? By the way, this only happens on a GPU device. CPU devices work fine.

 

Thx in advance.

  • Re: Multiple contexts parallel allocating or writing to memory of a single device
    himanshu.gautam Expert

    Multiple threads operating on a single context are supported as of OpenCL 1.1. All OpenCL API calls are thread-safe except "clSetKernelArg". Even with this API, multiple threads can work safely as long as each one uses its own cl_kernel object; they cannot work with the same cl_kernel object at the same time. So allocating one "cl_kernel" object per thread avoids this issue.

    Check Appendix A.2 of the OpenCL spec. So, as long as your platform is OpenCL 1.1 or later, you can use a single context and let all your OpenMP threads work on it.

     

    However, if multiple threads read or write shared "cl_mem" objects across multiple command queues without synchronizing those accesses, this can result in undefined behaviour. Check Appendix A.1 of the OpenCL spec. That will help resolve all your doubts.

     

    Now coming to the issue you are facing,

    I am not sure what you mean by "the program stops" but there is no seg-fault. You may want to first find out up to which point the application runs. Alternatively, please post your sources as a standalone zip file that we can use to reproduce the problem here.
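One simple way to find the point where the application stops is to log a flushed checkpoint before and after each suspicious call. The following is only a sketch; `trace_point` and `TRACE_CALL` are hypothetical names, not part of OpenCL or OpenMP:

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: prints one checkpoint line to stderr and flushes it,
 * so the line is visible even if the program hangs right afterwards. */
static void trace_point(const char *phase, const char *what, int line)
{
    int tid = 0;  /* stays 0 when compiled without OpenMP */
#ifdef _OPENMP
    tid = omp_get_thread_num();
#endif
    fprintf(stderr, "[thread %d] %s %s (line %d)\n", tid, phase, what, line);
    fflush(stderr);
}

/* Evaluates `call`, stores its value in `result`, and logs before and after. */
#define TRACE_CALL(result, call)                   \
    do {                                           \
        trace_point("enter", #call, __LINE__);     \
        (result) = (call);                         \
        trace_point("leave", #call, __LINE__);     \
    } while (0)
```

For example, `TRACE_CALL(err, clReleaseMemObject(buf));` prints an "enter" line that has no matching "leave" line if that call never returns.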

    You also need to specify the following:

    1. Platform: win32 / win64 / lin32 / lin64 or something else?

        Win7, Win Vista or Win8? Similarly for Linux, your distribution.

    2. Version of driver

    3. CPU or GPU Target?

    4. CPU/GPU details of your hardware


    Thanks,

    Regards

    Himanshu, Bruhaspati


  • Re: Multiple contexts parallel allocating or writing to memory of a single device
    chevydevil Newbie

    It has been a while, but my problem still exists. My above responses weren't accurate because the remote session didn't use the GPU and only found the CPU. The classic headless problem. I am now able to access the GPU remotely, but then my "stopping" problem shows up again. I believe a deadlock happens when a memory object is released in the scenario where multiple command queues are used by multiple threads. Here is part of my debug log, taken when the execution stops:

    [debug]#0  0x00007ffff582d420 in sem_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
    [debug]#1  0x00007fffef1f9ba0 in amd::Semaphore::wait() () from /usr/lib/libamdocl64.so
    [debug]#2  0x00007fffef1f6162 in amd::Monitor::finishLock() () from /usr/lib/libamdocl64.so
    [debug]#3  0x00007fffef21f6fc in gpu::Device::ScopedLockVgpus::ScopedLockVgpus(gpu::Device const&) () from /usr/lib/libamdocl64.so
    [debug]#4  0x00007fffef242c3e in gpu::Resource::free() () from /usr/lib/libamdocl64.so
    [debug]#5  0x00007fffef243207 in gpu::Resource::~Resource() () from /usr/lib/libamdocl64.so
    [debug]#6  0x00007fffef22fd3d in gpu::Memory::~Memory() () from /usr/lib/libamdocl64.so
    [debug]#7  0x00007fffef23123f in gpu::Buffer::~Buffer() () from /usr/lib/libamdocl64.so
    [debug]#8  0x00007fffef1e8998 in amd::Memory::~Memory() () from /usr/lib/libamdocl64.so
    [debug]#9  0x00007fffef1e9607 in amd::Buffer::~Buffer() () from /usr/lib/libamdocl64.so
    [debug]#10 0x00007fffef1f41eb in amd::ReferenceCountedObject::release() () from /usr/lib/libamdocl64.so
    [debug]#11 0x00007fffef1c5a37 in clReleaseMemObject () from /usr/lib/libamdocl64.so
    

    I will finally try to reproduce this with a minimal example that focuses on allocating and releasing memory from multiple threads. Hopefully this leads somewhere. It would be nice to solve this so that I can convince my boss to buy some 7990 cards for our computing.
