8 Replies. Latest reply: Jul 18, 2013 3:11 AM by Sayantan Datta

Reduced cache hit when I put a piece of code under loop!!

Sayantan Datta Newbie

Hi,

 

Card:     7970
Catalyst: 13.4
APP:      2.8
OS:       Kubuntu 12.04 x64

 

Code snippet:

// for (i = 0; i < 25; i++)
    encrypt();

 

When I comment out the loop, the cache hit rate (measured with CodeXL 1.1) is 99%. As soon as I uncomment it, the cache hit rate drops to 23% and the kernel execution time increases by 50 times, when it should increase by only 25 times. The function encrypt() is quite large for the I-cache, yet without the loop the cache hit rate is still 99%. With anything more than 1 iteration, the cache hit rate drops to 23% and the execution time grows by about 2x times, where x is the number of iterations.

 

Regards,

Sayantan

  • Re: Reduced cache hit when I put a piece of code under loop!!
    himanshu.gautam Master

    When it is called only once, the function encrypt() might simply get inlined into the kernel. Several optimizations may then reduce the number of variables needed, resulting in high performance. A big function called over multiple iterations is highly unlikely to get inlined, and calling it would then require a lot of variable fetching and stack management.

    Anyway, it is interesting, and I will seek some expert advice.

    Can you try checking the performance once again, using the "-cl-opt-disable" flag when compiling the kernel?

    Regards

    Himanshu , Bruhaspati

    --------------------------------

    The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. Links to third party sites are for convenience only, and no endorsement is implied

    • Re: Reduced cache hit when I put a piece of code under loop!!
      Sayantan Datta Newbie

      The "-cl-opt-disable" flag does improve performance by around 5-6% when the function is looped, but that is definitely not enough to eliminate the 50% performance loss caused by the drop in cache hit rate.

      Thanks for the reply.

       

      Regards,

      Sayantan

      • Re: Reduced cache hit when I put a piece of code under loop!!
        himanshu.gautam Master

        Can you improve the code a bit? I was getting a lot of errors because of the goto statement. Also, you seem to be running the kernel only once. For profiling purposes, run it over, say, 100 iterations and average the results.
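The averaging suggested above can be sketched in plain C. This is only an illustration, not code from the thread: `launch_fn`, `average_launch_seconds` and `fake_launch` are hypothetical names, and the callback is assumed to launch the kernel and block until it has finished (for example, an enqueue followed by `clFinish`).

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: one call launches the kernel and blocks
 * until the kernel has finished executing. */
typedef void (*launch_fn)(void *user_data);

/* Stand-in launcher used for illustration: it only counts its calls.
 * A real launcher would enqueue the kernel and wait for completion. */
static void fake_launch(void *user_data)
{
    int *calls = (int *)user_data;
    (*calls)++;
}

/* Seconds elapsed on a monotonic clock. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Runs `launch` once as an untimed warm-up, then `runs` timed times,
 * and returns the mean wall-clock time per launch in seconds. */
static double average_launch_seconds(launch_fn launch, void *user_data, int runs)
{
    assert(launch != NULL && runs > 0);
    launch(user_data); /* warm-up, excluded from the average */
    double start = now_seconds();
    for (int i = 0; i < runs; i++)
        launch(user_data);
    return (now_seconds() - start) / runs;
}
```

Calling `average_launch_seconds(fake_launch, &calls, 100)` performs 101 launches in total: one warm-up plus 100 timed runs. Wall-clock timing on the host also includes launch overhead, so profiler counters remain the better source for kernel-only time.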

        Regards

        Himanshu , Bruhaspati


        • Re: Reduced cache hit when I put a piece of code under loop!!
          Sayantan Datta Newbie

          Generally, goto statements produce only warnings, which are mostly harmless. Also, the kernel I have attached doesn't have any goto statement. The compiler also seems to automatically inline all the functions, which I really don't want. Is there any way to reduce the code length (ISA length)?

           

          Regards,

          Sayantan

          • Re: Reduced cache hit when I put a piece of code under loop!!
            himanshu.gautam Master

            goto is not that big a problem, but the number of iterations is. Can you report your results after running the kernel for multiple iterations? The cache-hit counter might be buggy (in which case the issue should go to the CodeXL team), but we need to make sure that performance is indeed getting worse. In that case, it becomes an OpenCL compiler/runtime issue.

            Regards

            Himanshu , Bruhaspati


            • Re: Reduced cache hit when I put a piece of code under loop!!
              Sayantan Datta Newbie

              Hi ,

               

              CodeXL seems to be reporting correctly, because the drop in cache hit rate is accompanied by an increase in fetch size and memory unit busy, which can only be explained by increased cache misses. I also ran the kernel 10 times inside a loop on the host side, and the performance counters were almost identical for each kernel call. This looks like a compiler problem to me.

               

              Regards,

              Sayantan

              • Re: Reduced cache hit when I put a piece of code under loop!!
                himanshu.gautam Master

                This code is not going to run well because it’s too large for the instruction cache.  Even the code without the loop takes 35072 bytes, which is too large.  Combine that with the fact that we can only get 4 waves per CU, due to the VGPR usage, and we can’t hide the latency of the I$ fetches.  Perhaps with the user’s particular driver, the code without the loop fits in the I$, but with the driver I am testing, both kernels are too large for the I$.
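The two numbers in the paragraph above (35072 bytes of code, 4 waves per CU) can be put into a rough back-of-the-envelope check. The hardware figures in this sketch are assumptions taken from general descriptions of GCN-class GPUs, not from the thread: a 32 KB instruction cache, 4 SIMDs per CU, a budget of 256 VGPRs per work-item on each SIMD, and at most 10 resident waves per SIMD. Register allocation granularity is ignored, and the function names are hypothetical.

```c
#include <assert.h>

/* Assumed hardware figures for a GCN-class GPU such as the 7970.
 * These are NOT stated in the thread; check them against AMD's documentation. */
enum {
    ICACHE_BYTES       = 32 * 1024, /* assumed instruction cache capacity */
    SIMDS_PER_CU       = 4,         /* assumed SIMD units per compute unit */
    VGPRS_PER_SIMD     = 256,       /* assumed VGPR budget per work-item */
    MAX_WAVES_PER_SIMD = 10         /* assumed cap on resident waves per SIMD */
};

/* Bytes of kernel code that do not fit in the instruction cache (0 if it fits). */
static int icache_overflow_bytes(int code_bytes)
{
    assert(code_bytes >= 0);
    return code_bytes > ICACHE_BYTES ? code_bytes - ICACHE_BYTES : 0;
}

/* Resident wavefronts per CU when each work-item uses `vgprs` registers. */
static int waves_per_cu(int vgprs)
{
    assert(vgprs > 0 && vgprs <= VGPRS_PER_SIMD);
    int per_simd = VGPRS_PER_SIMD / vgprs;
    if (per_simd > MAX_WAVES_PER_SIMD)
        per_simd = MAX_WAVES_PER_SIMD;
    return per_simd * SIMDS_PER_CU;
}
```

Under these assumptions, `icache_overflow_bytes(35072)` gives 2304, so even the loop-free kernel exceeds the cache, and `waves_per_cu` returns 4 for any count above 128 VGPRs, which is consistent with the "4 waves per CU" figure. The kernel's actual VGPR count is not given in the thread.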


                 

                The developer should also be aware that, as far as I can see, adding the loops does nothing to the algorithm since the first thing done in encrypt() is to set out[] to in[] which undoes all the previous computations.  The compiler can’t see this.
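The point about the loop can be illustrated with a small C sketch. The real encrypt() is not shown in the thread, so `encrypt_stub` is a hypothetical stand-in with an invented mixing step; the only property carried over is that it begins by copying in[] to out[].

```c
#include <assert.h>
#include <string.h>

#define BLOCK_WORDS 4

/* Hypothetical stand-in for encrypt(): like the kernel described above, it
 * starts by copying in[] to out[], discarding whatever out[] held before.
 * The mixing step is invented for illustration only. */
static void encrypt_stub(const unsigned in[BLOCK_WORDS], unsigned out[BLOCK_WORDS])
{
    memcpy(out, in, BLOCK_WORDS * sizeof out[0]);
    for (int i = 0; i < BLOCK_WORDS; i++)
        out[i] = (out[i] ^ 0x9E3779B9u) + (unsigned)i;
}

/* Calls encrypt_stub() `iterations` times on the same buffers. */
static void encrypt_looped(const unsigned in[BLOCK_WORDS],
                           unsigned out[BLOCK_WORDS], int iterations)
{
    assert(iterations > 0);
    for (int i = 0; i < iterations; i++)
        encrypt_stub(in, out);
}

/* Returns 1 if looping `iterations` times yields the same output as one call. */
static int loop_is_redundant(const unsigned in[BLOCK_WORDS], int iterations)
{
    unsigned once[BLOCK_WORDS], many[BLOCK_WORDS];
    encrypt_stub(in, once);
    encrypt_looped(in, many, iterations);
    return memcmp(once, many, sizeof once) == 0;
}
```

Because every call overwrites out[] from the unchanged in[], 25 iterations produce exactly the result of one. For the iterations to compound, each round would presumably have to read the previous round's output instead.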


                Courtesy: Jeff Golds

                Regards

                Himanshu , Bruhaspati

