12 Replies. Latest reply: May 8, 2013 10:05 PM by himanshu.gautam

DMA with AMD A10 APU?

reynmorris Newbie

I have a functioning OpenCL application that uses two command queues so that I can run a kernel and a DMA data transfer concurrently. It works on multiple systems with discrete GPUs (NVIDIA and AMD).


However, when I try to run it on my system with an AMD A10 APU, the kernel locks up and freezes. Is this just not possible with this architecture or is there some kind of exception I need to use?

 

I can provide an example program privately if an AMD developer can help.

 

Thanks!

  • Re: DMA with AMD A10 APU?
    himanshu.gautam Master

    Please attach your test case here and I will try to reproduce it on my end. I can also forward it to the relevant AMD engineering team if the bug is confirmed.

    I would also suggest going through the Transfer Overlap SDK sample for some direction.

    Regards

    Himanshu , Bruhaspati

    --------------------------------

    The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. Links to third party sites are for convenience only, and no endorsement is implied

  • Re: DMA with AMD A10 APU?
    himanshu.gautam Master

    If I recall correctly, CUDA requires multiple streams (within a CUDA context) to overlap DMA with kernel execution.

    However, I think that on AMD you do not really need multiple command queues. Just make sure that the kernel and the buffer copy are enqueued one after the other, that they have no dependency on each other, and that the buffer uses pinned memory. This should suffice.

    Please give me some time to experiment with this, and I will let you know what I find.

    Regards

    Himanshu , Bruhaspati


    • Re: DMA with AMD A10 APU?
      Meteorhead Apprentice

      So do I take it correctly that DMA is only used when pinned memory is used? And do I remember correctly that pinned memory is only used when a buffer is smaller than 32MB and is moved by clEnqueueMapBuffer? I recall reading about this a while back, and if I remember correctly, mapping a buffer returns a pointer to pinned memory if the buffer is small enough. I only ask because I'm writing a prototype of a GPU-cluster-capable physics simulation with MPI, and NVIDIA has RDMA implemented for CUDA (most likely not ported to OpenCL), so my best chance with AMD is using pinned buffers.

       

      Plus, does AMD plan on implementing something similar on the Red side of the force (namely RDMA, over InfiniBand or simply within a host)?

      • Re: DMA with AMD A10 APU?
        reynmorris Newbie

        I really hope there isn't a hard cap that small on the size of pinned memory; I haven't checked. I'm also curious whether there are plans for RDMA in OpenCL, but I'm not very hopeful, as that is probably an architecture-specific thing that NVIDIA is doing (it only appears to be available on newer Tesla models).

         

        Himanshu - Sorry I haven't responded to the main replies here. I've had to move forward with an alternate approach, but I am still curious whether this (concurrent DMA and kernel execution) can be done on APU hardware. If you come up with a very simple example that works on Trinity hardware, I'd very much appreciate seeing it. Thanks for your time.

      • Re: DMA with AMD A10 APU?
        himanshu.gautam Master

        I think the 32MB limit comes from Table 4.2 in the AMD APP Programming Guide. It applies to regular buffers (which are not pinned and are usually stored on the device), and the guide is describing the behaviour of clEnqueueMapBuffer there.

         

        But if you want to use DMA, you have to pin the buffer. Pinning usually happens when you use CL_MEM_USE_HOST_PTR. The runtime then does one of three things: it pins the host application's pages directly, it copies them to a temporary pinned buffer for a one-shot transfer, or it transfers them chunk by chunk using DMA and double-buffering. The runtime decides when the transfer happens (mostly on first use). Until you map that buffer, the OpenCL runtime owns your host pointer. When you map it, you own it and can write to it. When you unmap it, control returns to the OpenCL runtime.
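
        To make the chunk-by-chunk, double-buffered option mentioned above more concrete, here is a minimal host-only sketch in plain C. It is not OpenCL code and not what the AMD runtime actually does internally: the function name staged_copy, the STAGE_BYTES chunk size, and the use of memcpy as a stand-in for the DMA engine are all assumptions made for illustration. Everything runs serially, so it only shows the bookkeeping of alternating between two staging buffers, not real overlap.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical staging-chunk size; a real runtime picks its own value. */
#define STAGE_BYTES 4096

/*
 * Hypothetical helper: copy n bytes from src to dst in chunks, alternating
 * between two fixed staging buffers. The first memcpy models the CPU filling
 * a pinned staging buffer; the second models the DMA engine draining it.
 * Both are ordinary serial memcpy calls here, so nothing truly overlaps.
 * Returns the number of chunks transferred.
 */
size_t staged_copy(void *dst, const void *src, size_t n)
{
    static unsigned char stage[2][STAGE_BYTES];
    const unsigned char *in = src;
    unsigned char *out = dst;
    size_t done = 0, chunks = 0;
    int which = 0;

    while (done < n) {
        size_t left = n - done;
        size_t len = left < STAGE_BYTES ? left : STAGE_BYTES;

        memcpy(stage[which], in + done, len);   /* host fills staging buffer */
        memcpy(out + done, stage[which], len);  /* stand-in for the DMA step */

        done += len;
        chunks++;
        which ^= 1;  /* the next chunk uses the other staging buffer */
    }
    return chunks;
}
```

        Calling staged_copy(dst, src, 10000) with the chunk size above moves the data in three chunks. In a real double-buffered transfer, the point of having two staging buffers is that the host can fill one while the DMA engine is still draining the other.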

         

        When you use CL_MEM_ALLOC_HOST_PTR and zero-copy is supported, pinned memory is allocated. The kernel can read this data directly through a pointer, so data transfer and kernel execution occur together. This is not a great way to overlap them, though: the GPU is too fast and will often stall waiting for data to arrive from system memory across PCIe.

         

        When you use the CL_MEM_USE_PERSISTENT_MEM_AMD flag, the buffer is allocated in GPU memory and the CPU gets a pointer that reads and writes across the PCIe bus. In this case, memcpy and kernel execution can happen together. But the memcpy is programmed I/O (PIO) and cannot be called DMA.

         

        The best way to overlap a transfer with kernel execution is to first allocate a pinned buffer (using CL_MEM_ALLOC_HOST_PTR), map it to get a host pointer, and write your data into it. Then allocate another, normal buffer (which sits on the GPU) and do a clEnqueueWrite* from the pinned buffer to the normal buffer. This transfer is pure DMA.

        It is this DMA that I would like to overlap with kernel execution. I am still investigating whether this is possible.

        Will post an update next week.

        Regards

        Himanshu , Bruhaspati


        • Re: DMA with AMD A10 APU?
          himanshu.gautam Master

          I am still working on creating a sample and will post it as soon as I can. Apologies for the delay.

           

          Thanks,

          Regards

          Himanshu , Bruhaspati


  • Re: DMA with AMD A10 APU?
    himanshu.gautam Master

    Hi,

    Your e-mail is private, so I am not sure how I can contact you to get the repro-case sources.

    I have sent you a friend request. Please accept it, and let us see if that opens the door to private messages.

    We will work with you to resolve your issue.

    Thanks,

    Regards

    Himanshu , Bruhaspati


  • Re: DMA with AMD A10 APU?
    himanshu.gautam Master

    I am working on enabling private messaging. Hopefully, once this is fixed, you can send in your code. We will test it and see why it is crashing.

    Regards

    Himanshu , Bruhaspati


    • Re: DMA with AMD A10 APU?
      himanshu.gautam Master

      Hi,

       

      I am afraid private messaging may not be possible at the moment.

      Can you confirm whether you are still having the issue?

      We would appreciate it if you could post a simple test case that shows the crash/hang.

       

      Thanks,

      Regards

      Himanshu , Bruhaspati


      • Re: DMA with AMD A10 APU?
        himanshu.gautam Master

        Here is sample code that showcases asynchronous DMA on AMD GPUs. It should compile on both Windows and Linux.

        Regards

        Himanshu , Bruhaspati

