20 Replies Latest reply: Feb 21, 2013 5:15 AM by yurtesen

OpenCL performance dropped down 12.10 >> 13.1

darkmen Newbie

Hi everyone.

Today I updated the AMD Catalyst drivers to 13.1 and got a 20% performance loss on my HD7970.

Has anyone else had the same experience?

Also, what is the easiest way to roll back to 12.10? Uninstalling 13.1 and reinstalling 12.10 gives the same lower speed (OpenCL still reports the NEW runtime version).

  • Re: OpenCL performance dropped down 12.10 >> 13.1
    Claggy Newbie

    I reported that last week too:

     

    http://devgurus.amd.com/message/1286437#1286437

     

    I had to delete a whole lot of files to be able to reinstall Cat 12.8.

    Since then, an AMD Catalyst Un-install Utility has appeared on the AMD Game Driver download site:

     

    http://sites.amd.com/us/game/downloads/Pages/catalyst-uninstall-utility.aspx

     

    I have not tried it properly yet, except that it didn't work on Vista, and it says it is for Windows 7 only.

     

    Claggy

  • Re: OpenCL performance dropped down 12.10 >> 13.1
    darkhmz Newbie

    Hi!

     

    I have experienced the same issue with Catalyst 13.1. In my case the performance drop was around 39% on my HD5830. I've tested kernel performance with different versions of amdocl.dll and the OpenCL version shipped with Catalyst 13.1 was the worst. According to APP profiler, kernel execution times were ~17.51ms and ~24.38ms (12.10 vs 13.1).
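The two kernel times quoted above can be turned into percentages with a few lines of Python. This is only a sketch: the function names are made up for illustration and do not come from APP Profiler or any AMD tool. It shows that the "around 39%" figure is the increase in kernel time, while the equivalent loss in throughput is smaller.

```python
# Kernel times reported by APP Profiler in the post above (milliseconds).
T_12_10_MS = 17.51
T_13_1_MS = 24.38

def time_increase_pct(old_ms, new_ms):
    """Percentage by which the kernel execution time grew."""
    return (new_ms - old_ms) / old_ms * 100.0

def throughput_drop_pct(old_ms, new_ms):
    """Percentage of throughput (work per unit time) that was lost."""
    return (1.0 - old_ms / new_ms) * 100.0

if __name__ == "__main__":
    print(f"kernel time increase: {time_increase_pct(T_12_10_MS, T_13_1_MS):.1f}%")
    print(f"throughput drop:      {throughput_drop_pct(T_12_10_MS, T_13_1_MS):.1f}%")
```

With these inputs the kernel time grows by about 39.2%, which corresponds to a throughput drop of about 28.2%. When comparing numbers between posts it helps to state which of the two is meant.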

    • Re: OpenCL performance dropped down 12.10 >> 13.1
      himanshu.gautam Master

      Hi,

      I am sorry to hear this.

      If it is not too much to ask, could you please post a simple code sample that shows the performance degradation?

      Thanks,

      Regards

      Himanshu , Bruhaspati

      --------------------------------

      The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. Links to third party sites are for convenience only, and no endorsement is implied

  • Re: OpenCL performance dropped down 12.10 >> 13.1
    darkmen Newbie

    Hi, I have just tried version 13.2 with OCL runtime 1124.2.

    Performance drops even further than with 13.1.

    It all seems to come down to the compiler. I am now comparing the ISA produced by 12.10 and 13.1 (btw, AMD APP KernelAnalyzer crashes on 13.2).

    There seem to be some changes around branches and/or loops.

     

    The source pseudo code:

    for (uint i = 0; i < STEP; i++) {
        if (check_data(...))
            output[0] = i;
    }

     

    12.10 ISA:

      s_mov_b64     exec, s[10:11]
      s_addk_i32    s3, 0x001f
      s_addk_i32    s2, 0x0001
      s_cmp_ge_u32  s2, 0x00002100
      s_cbranch_scc1  label_3CC4
      s_branch      label_0707
      s_getpc_b64   s[10:11]
      s_sub_u32     s10, s10, 0x0000d6e4
      s_subb_u32    s11, s11, 0
      s_setpc_b64   s[10:11]
    label_3CC4:

     

    13.1 ISA:

      s_mov_b64     exec, s[10:11]
      s_addk_i32    s3, 0x001f
      s_addk_i32    s2, 0x0001
      s_cmp_ge_u32  s2, 0x00002100
      s_cbranch_scc0  label_3F7E
      s_getpc_b64   s[10:11]
      s_add_u32     s10, s10, 0x00000038
      s_addc_u32    s11, s11, 0
      s_setpc_b64   s[10:11]
    label_3F7E:
      s_getpc_b64   s[10:11]
      s_sub_u32     s10, s10, 0x0000d19c
      s_subb_u32    s11, s11, 0
      s_setpc_b64   s[10:11]
      s_getpc_b64   s[10:11]
      s_sub_u32     s10, s10, 0x0000d1b0
      s_subb_u32    s11, s11, 0
      s_setpc_b64   s[10:11]

     

    As you can see, the new compiler seems to generate more instructions for the same code.

    • Re: OpenCL performance dropped down 12.10 >> 13.1
      realhet Novice

      Wow, that's funny code...

        s_getpc_b64   s[10:11]
        s_add_u32     s10, s10, 0x00000038
        s_addc_u32    s11, s11, 0
        s_setpc_b64   s[10:11]

      It could be replaced with a single "s_branch 0x000E" (0x000E comes from 0x0038/4; the /4 is because of dword alignment).

      I guess they prepared the compiler to handle loops bigger than 128KB (which can't be encoded in s_branch), so they replaced almost every jump with these 4-cycle far jumps, even when the jump targets are well-known absolute locations within s_branch's reach.

      (Btw: 64KByte does not fit in GCN's 32KByte code cache! You should keep that loop below 32K.)

      Though I think the performance issue is more likely inside the check_data(...) region, not in this rarely executed loop-management code.
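The offset arithmetic in the post above can be sketched in Python. This assumes the usual GCN SOPP encoding, in which s_branch carries a signed 16-bit immediate counted in dwords relative to the next instruction; that detail is not stated in the thread, and the helper name s_branch_simm16 is made up for illustration. Under that assumption, the sketch checks whether a byte offset taken from one of the getpc/setpc sequences would fit in a plain s_branch.

```python
# Assumption (not stated in the thread): s_branch jumps to
# PC_of_next_instruction + simm16 * 4, where simm16 is a signed 16-bit value.
SIMM16_MIN = -(1 << 15)
SIMM16_MAX = (1 << 15) - 1

def s_branch_simm16(byte_offset):
    """Return the 16-bit immediate for a relative jump of byte_offset bytes,
    or None if the target is out of reach and would need a far jump."""
    if byte_offset % 4 != 0:
        raise ValueError("branch targets must be dword aligned")
    dwords = byte_offset // 4
    if not SIMM16_MIN <= dwords <= SIMM16_MAX:
        return None
    return dwords & 0xFFFF  # two's-complement encoding of the dword count

if __name__ == "__main__":
    # Offsets taken from the 13.1 listing: +0x38 forward, -0xd19c backward.
    # 0x40000 is an invented out-of-range offset for comparison.
    for offset in (0x38, -0xd19c, 0x40000):
        imm = s_branch_simm16(offset)
        result = "far jump needed" if imm is None else f"s_branch 0x{imm:04X}"
        print(hex(offset), "->", result)
```

Under this assumption, both offsets from the 13.1 listing are within reach of a plain s_branch; only targets more than roughly 128KB away would need the getpc/setpc sequence.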

      • Re: OpenCL performance dropped down 12.10 >> 13.1
        darkmen Newbie

        Well, I agree: of course this alone will not cause a 20% perf loss.

         

        I can also see some positive changes (at least in theory):

        • Loops are unrolled even more now
        • exec mask instructions are more efficient (I can see even fewer branches in the code):

        12.10 ISA:

          s_mov_b64     s[48:49], exec
          s_andn2_b64   exec, s[48:49], s[46:47]
          s_andn2_b64   s[44:45], s[44:45], exec
          s_cbranch_scc0  label_086E
          s_andn2_b64   exec, s[48:49], exec
          s_mov_b64     exec, s[48:49]
          s_mov_b64     exec, s[44:45]
          s_branch      label_0838
        label_086E:

         

        13.1 ISA:

          s_mov_b64     vcc, exec
          s_andn2_b64   exec, vcc, s[46:47]
          s_andn2_b64   s[44:45], s[44:45], exec
          s_cbranch_scc0  label_0C76
          s_mov_b64     exec, s[44:45]
          s_branch      label_0C42
        label_0C76:

         

        So the question is still open: what makes it slower?

        • Re: OpenCL performance dropped down 12.10 >> 13.1
          himanshu.gautam Master

          Hi everyone,

          From the last few posts, it looks like some optimizations in driver 13.1 have affected a few applications adversely. It would be helpful if someone could help pinpoint the issue. You can point to any SDK sample, or a small testcase, that shows the performance drop just by switching drivers.

          I tried a few SDK samples (MatrixMulImage, BlackScholes and LDSMemoryBandwidth) but did not see any change in performance.

          Regards

          Himanshu , Bruhaspati


          • Re: OpenCL performance dropped down 12.10 >> 13.1
            darkhmz Newbie

            Hi!

             

            Here is a small testcase that shows quite a big (~33% difference in fps) performance drop on my HD5830 just by using different amdocl.dll versions. I've included the two dlls from 12.10 and 13.1 to make the testing easier, and two pictures to show the obvious performance difference on my card. Hope it helps.

             

            http://www.mediafire.com/?nip722foiqoc4v8

            • Re: OpenCL performance dropped down 12.10 >> 13.1
              himanshu.gautam Master

              Thanks darkhmz,

              Will look into the test case and let you know.

              Is this a Windows issue or a Linux one? It would be helpful if you could give more details about your setup.

              Regards

              Himanshu , Bruhaspati


              • Re: OpenCL performance dropped down 12.10 >> 13.1
                darkhmz Newbie

                Hi!

                 

                Win7 x64 + Catalyst 12.10 here...

                • Re: OpenCL performance dropped down 12.10 >> 13.1
                  himanshu.gautam Master

                  Hi darkhmz,

                  I have been working on it. I was able to see the slowdown in kernel execution (from the CodeXL output) using the dlls you provided.

                  But I also set up a fresh system with just the AMD driver installed. When I installed Catalyst 12.10 and ran the executable with your dlls (12.10 & 13.1), I did not see the performance degradation. With Catalyst's own amdocl.dll the fps was consistent as well. Still digging into it.

                  Did you make any progress in narrowing down the issue?

                  Surprisingly, CodeXL still shows the difference in kernel timings (~33%) when run on the fresh machine that has only the driver installed. Would it be possible for you to share some code that I can compile? It is a 32-bit exe on a 64-bit Win7 platform. Do you see a similar performance drop with a 64-bit executable too?

                   


                  Regards

                  Himanshu , Bruhaspati


                  • Re: OpenCL performance dropped down 12.10 >> 13.1
                    darkhmz Newbie

                    Hi Himanshu,

                    I've compiled a 64-bit version and tested again, this time with the amdocl64.dll files, and the performance difference is still there. However, if I change the scene, the difference disappears in some cases. For example, with the following simple plane + bumpy torus scene I didn't see any fps difference.

                     

                    float4 de(float4 p, float4 q)
                    {
                        float dst1 = dfPlane(p, (float4)(0.0f, 1.0f, 0.0f, -1.0f));
                        float dst2 = dfTorus(p, (float2)(2.5f, 0.8f)) - max(perlin(p * 3.0f) * 0.1f, 0.0f);
                        return (float4)(U(dst1, dst2), 0.0f, 0.2f, 0.0f);
                    }

                     

                    I'm going to try a fresh test system and share my code sometime this week, then test again.

          • Re: OpenCL performance dropped down 12.10 >> 13.1
            yurtesen Apprentice

            I have a small program which gets a ~25% performance drop with the 13.1 drivers. Do you have an email address that I can send it to? (It is small, but I would rather not upload it to a public forum unless absolutely necessary.)

            • Re: OpenCL performance dropped down 12.10 >> 13.1
              himanshu.gautam Master

              Hi yurtesen,

              I am afraid your testcase can only be sent via a public medium. I would recommend starting a new thread so that it is easy to track. My apologies for the inconvenience.

              Regards

              Himanshu , Bruhaspati


              • Re: OpenCL performance dropped down 12.10 >> 13.1
                yurtesen Apprentice

                himanshu.gautam wrote:

                 

                Hi yurtesen,

                I am afraid your testcase can only be sent via a public medium. I would recommend starting a new thread so that it is easy to track. My apologies for the inconvenience.

                I would understand this if I were looking for a problem in my program. But it doesn't make much sense, since the problem appears to be in the driver and nobody other than AMD has to see the code. However, I will try to find out whether I am allowed to share the code publicly, and I will get back to you in a new thread if I can.

                • Re: OpenCL performance dropped down 12.10 >> 13.1
                  himanshu.gautam Master

                  Thanks for your support.

                  I had asked for a private message channel, but there appear to be some legal problems with that. I hope you will be able to reproduce your problem with a small testcase that is easy for you to share publicly.

                  Regards

                  Himanshu , Bruhaspati


                  • Re: OpenCL performance dropped down 12.10 >> 13.1
                    yurtesen Apprentice

                    himanshu.gautam wrote:

                     

                    Thanks for your support.

                    I had asked for a private message channel, but there appear to be some legal problems with that. I hope you will be able to reproduce your problem with a small testcase that is easy for you to share publicly.

                    The code itself has no copyright/license restrictions; it is our own experimental research code. It is already a small testcase, we simply do not want it out in public yet. But I will see how flexible we can be on that...
