7 Replies Latest reply: Feb 14, 2014 8:10 AM by bragadeesh

kernel occupancy of fft_fwd (clAmdFft) only at 33%?

cipoint Newbie

I'm using clAmdFft a lot in my code. About 75% of the execution time is spent in fft_fwd and fft_back. According to CodeXL, the occupancy of these kernels is only 33%, and the limiting factor is the (local) work group size of 64. Is the local work group size algorithm-specific, or can I somehow increase it?
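
For readers unfamiliar with the metric: profilers generally report kernel occupancy as the number of wavefronts that can be resident at once divided by the hardware maximum, where the resident count is capped by whichever resource runs out first. The sketch below shows that calculation only. The function name and every number in it are invented for illustration; they are not the actual limits of Juniper or of the fft_fwd kernel.

```python
def occupancy(max_waves, limits):
    """Return (occupancy fraction, name of the tightest limit).

    max_waves: hardware maximum of wavefronts in flight (assumed value).
    limits:    wavefronts allowed by each resource (assumed values).
    """
    limiting = min(limits, key=limits.get)        # resource that runs out first
    resident = min(limits[limiting], max_waves)   # cannot exceed the hardware cap
    return resident / max_waves, limiting


# Hypothetical numbers, chosen only so that the result lands near 33%.
example_limits = {
    "work_group_size": 8,
    "vector_registers": 12,
    "local_memory": 16,
}
fraction, limiting = occupancy(24, example_limits)
print(f"{fraction:.0%} limited by {limiting}")
```

With a breakdown like this, raising the limit that is currently the tightest only helps until the next-lowest limit takes over.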

 

Soft-/hardware that I'm using:

AMD APP SDK 2.8

CLAMDFFT 1.10

Juniper XT

Driver Packaging Version: 9.012-121219a-151962C-ATI

  • Re: kernel occupancy of fft_fwd (clAmdFft) only at 33%?
    himanshu.gautam Master

    This is a bit strange. You are probably making many kernel calls, and the lower half of the image is for the overall application (or maybe the hot-spot kernel). Can you also attach the profiler counter output?

    To answer your question: I would think only the library developers can fix kernel occupancy issues, but I'm not sure, as I have not used this library so far.

    Regards

    Himanshu , Bruhaspati

    --------------------------------

    The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. Links to third party sites are for convenience only, and no endorsement is implied

  • Re: kernel occupancy of fft_fwd (clAmdFft) only at 33%?
    cipoint Newbie

    I've attached the CSV file to the initial post. Is that what you meant?

     

    PS: I was not able to reply to your post directly (not authorized) ...

    • Re: kernel occupancy of fft_fwd (clAmdFft) only at 33%?
      himanshu.gautam Master

      Thanks for the details. I have asked the library experts for their comments. If you can give more information about which library routine (and with what parameters) shows this behavior, it would help in tracking down the issue.

      Regards

      Himanshu , Bruhaspati


  • Re: kernel occupancy of fft_fwd (clAmdFft) only at 33%?
    bragadeesh Moderator

    Hi,

     

    The workgroup sizes were chosen empirically for maximum performance. The library was tuned to work well on the 59xx cards, and to a large extent that tuning applies to the 5770 card as well. Please note that your GPU family is three generations old, and support for such hardware typically dwindles.

     

    What FFT problems are you running? What transform sizes, etc.?

     

    Also, we recently open-sourced the code. It is called clMath. It is available on github at https://github.com/clMathLibraries/clFFT

    You can now browse through the code if needed.

     

    Thanks

    Bragadeesh Natarajan

    AMD


    • Re: kernel occupancy of fft_fwd (clAmdFft) only at 33%?
      cipoint Newbie

      If I change SIMD_WIDTH=64 to SIMD_WIDTH=128 or any other number (even SIMD_WIDTH=100000 compiles and runs fine) in plan.h, nothing changes in the performance. Moreover, SIMD_WIDTH doesn't appear in any other place in the clFFT source code. So how can I tune this parameter?
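
One way to double-check that a constant such as SIMD_WIDTH really has no other uses is to scan the source tree for whole-word matches. This is a minimal sketch, not part of clFFT: the helper name `find_identifier` and the list of file extensions are assumptions made for this example.

```python
import os
import re


def find_identifier(root, name, extensions=(".h", ".hpp", ".c", ".cpp", ".cl")):
    """Yield (path, line_number, line) for each whole-word use of `name`.

    Only files whose names end in one of `extensions` are scanned.
    """
    pattern = re.compile(r"\b" + re.escape(name) + r"\b")
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            if not filename.endswith(extensions):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if pattern.search(line):
                        yield path, number, line.rstrip()


# Example use (the path is a placeholder for a local checkout):
# for path, number, line in find_identifier("path/to/clFFT", "SIMD_WIDTH"):
#     print(f"{path}:{number}: {line}")
```

If the only hit is the definition itself, changing the value cannot affect the generated kernels, which would match the observation that performance stays the same.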

       

      (By the way, I'm using a HD7970 now. I've also successfully changed the maximum DP FFT size from 2^22 to 2^24.)

      • Re: kernel occupancy of fft_fwd (clAmdFft) only at 33%?
        bragadeesh Moderator

        Hi,

         

        If SIMD_WIDTH is not seen anywhere else in the code, that name is unused and is probably stale code; it should be removed. The only parameter that you can change programmatically is the kernel work group size. The file stockham.generator.cpp has a class/constructor called KernelCoreSpecs where WorkGroupSize is set for a particular transform. You can change this value and experiment with it. Keep in mind that this value applies only to the 1D transforms that any problem eventually gets broken down into.
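
To illustrate the last point, here is a small sketch of how a 2D transform reduces to 1D transforms: run a 1D DFT over every row, then over every column. It uses a naive O(n^2) DFT in plain Python and shows only the mathematical decomposition. It says nothing about how clFFT's generated Stockham kernels are implemented or scheduled, and all function names are made up for this example.

```python
import cmath


def dft_1d(values):
    """Naive O(n^2) forward DFT of a sequence of numbers."""
    n = len(values)
    return [
        sum(values[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def dft_2d_by_rows_and_columns(grid):
    """2D DFT built only from 1D DFTs: every row, then every column."""
    rows_done = [dft_1d(row) for row in grid]
    columns = [dft_1d(list(column)) for column in zip(*rows_done)]
    # Transpose back so the result is indexed [row][column] again.
    return [list(row) for row in zip(*columns)]


def dft_2d_direct(grid):
    """Reference 2D DFT evaluated straight from the definition."""
    rows, cols = len(grid), len(grid[0])
    return [
        [
            sum(
                grid[r][c] * cmath.exp(-2j * cmath.pi * (u * r / rows + v * c / cols))
                for r in range(rows)
                for c in range(cols)
            )
            for v in range(cols)
        ]
        for u in range(rows)
    ]


grid = [[1, 2, 3, 4], [0, -1, 2, 5]]
by_1d = dft_2d_by_rows_and_columns(grid)
direct = dft_2d_direct(grid)
# Largest difference between the two results; it should be rounding noise.
print(max(abs(a - b) for ra, rb in zip(by_1d, direct) for a, b in zip(ra, rb)))
```

Because the multi-dimensional case is composed of 1D passes in this way, a work group size tuned for a given 1D length affects every higher-dimensional problem that contains that length.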

        Bragadeesh Natarajan

        AMD


  • Re: kernel occupancy of fft_fwd (clAmdFft) only at 33%?
    himanshu.gautam Master


    Hi

     

    Please confirm whether your problem got resolved. We are also curious to know about your experience, so please do share it with us.

    Regards

    Himanshu , Bruhaspati

