17 Replies. Latest reply: Oct 7, 2013 5:10 AM by himanshu.gautam

Different results with HD 7970 and HD 7750

wayne_static Newbie

Hello,

 

I have written a kernel that performs a dynamic programming routine, specifically targeting the GCN architecture. Recently, I tried to optimize the kernel by getting rid of if-else constructs and replacing them with select. The new kernel works fine on my HD 7970 GPUs, with some improvement in speed, but the strange thing is that the same kernel does not work correctly on the HD 7750 GPUs.

 

By "not working" I mean the following: the output of the kernel is a huge table of values. After each kernel execution I verify it against a sequential implementation on the CPU. The HD 7970 results are always correct, but the results from the HD 7750 are somewhere between 60% and 90% correct. For example, 4,193,984 out of 4,194,304 pass verification.

 

Again, the ONLY thing I did was replace if-else with select in the kernel. Could anyone please shed some light on this strange behavior? I can provide the kernel code if necessary. Many thanks.

  • Re: Different results with HD 7970 and HD 7750
    nou Expert

    It may be a bug in the driver or faulty hardware. The best thing would be to provide a test case.

    • Re: Different results with HD 7970 and HD 7750
      wayne_static Newbie

      Hi nou, thanks for the reply. I am not ruling out your suggestion, but I should mention that this behavior also occurs on NVIDIA hardware, the GeForce 650 and 680 GTX to be precise. I don't know what this means with respect to drivers. Could you please elaborate on what you mean by a test case in this situation? Thanks.

      • Re: Different results with HD 7970 and HD 7750
        himanshu.gautam Master

        As the code is failing on NVIDIA as well as on the 7750, I would guess it is passing on the 7970 by accident. The 7750 and 7970 are both GCN, so it is hard to imagine them giving different results. I would guess you have different drivers installed on the 7750 and 7970 machines. Are they running the same OS, and do they have the latest APP SDK? The latest Catalyst driver is recommended (13.8 beta as of today).

        Before sharing your kernel, I would suggest you check the verification logic and all the places where you used select instead of if-else. You might have some silly bug somewhere ;)

        If nothing rings a bell, feel free to share your kernels here. It is recommended to attach a test case that anyone can download and compile with little hassle. Use the advanced editor for attaching.
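When reviewing those replacements, two properties of OpenCL's `select(a, b, c)` are easy to get wrong: the operand order is the reverse of `c ? a : b`, and the vector form tests only the most significant bit of each component of `c`, not whether it is non-zero. The following is a minimal Python emulation of those semantics, not code from the poster's kernel; the helper names `select_scalar` and `select_vector` are hypothetical.

```python
def select_scalar(a, b, c):
    """Scalar form of select: behaves like `c ? b : a` (note the operand order)."""
    return b if c != 0 else a


def select_vector(a, b, c, bits=32):
    """Vector form of select: component i takes b[i] only if the MSB of c[i] is set."""
    msb = 1 << (bits - 1)
    return [bi if (ci & msb) else ai for ai, bi, ci in zip(a, b, c)]


if __name__ == "__main__":
    # A scalar condition of 1 picks b.
    print(select_scalar(10, 20, 1))
    # Vector comparisons yield -1 (all bits set) for true, which picks b.
    print(select_vector([10, 10], [20, 20], [-1, 0]))
    # A hand-written mask of 1 has its MSB clear, so the vector form picks a.
    print(select_vector([10], [20], [1]))
```

A condition that works in an if statement can therefore pick the opposite operand once it is moved into a vector select, which is worth ruling out before suspecting the driver.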

        Regards

        Himanshu , Bruhaspati

        --------------------------------

        The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. Links to third party sites are for convenience only, and no endorsement is implied

        • Re: Different results with HD 7970 and HD 7750
          wayne_static Newbie

          Thanks for the reply. I agree with you; it is hard to imagine such behavior. At the moment, all machines are running identical drivers, i.e., Catalyst version 13.4 and AMD APP version 1124.2, which comes with the latest SDK version 2.8.1. All machines are also running the same copy of Windows 7 Enterprise 64-bit. I should also mention that the machine with the GeForce 680 GTX has the same OS version and does not use the AMD APP SDK.

           

          I usually work with a single Visual Studio 2012 project and copy it to whichever machine I want to run tests on. All results are integers, so there are no floating-point headaches. Input data is randomized and the output is a table, so verification is simply a matter of looping through the GPU values and comparing them with the sequential CPU results. All of this happens in one execution of the code.
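A verification pass of the kind described above can be sketched as follows. This is a minimal Python illustration under assumed names (`verify`, `gpu_table`, `cpu_table`, `max_report`), not the poster's actual host code, which is a Visual Studio project.

```python
def verify(gpu_table, cpu_table, max_report=10):
    """Compare two integer tables element by element.

    Returns (number of matching entries, indices of the first mismatches).
    """
    if len(gpu_table) != len(cpu_table):
        raise ValueError("tables must have the same length")
    mismatches = [i for i, (g, c) in enumerate(zip(gpu_table, cpu_table)) if g != c]
    passed = len(gpu_table) - len(mismatches)
    return passed, mismatches[:max_report]


if __name__ == "__main__":
    passed, first_bad = verify([1, 2, 3, 4], [1, 2, 0, 4])
    print(passed, "of 4 entries pass; first mismatches at", first_bad)
```

Reporting the indices of the first mismatches, rather than only a pass count, can show whether the failures cluster around particular work-groups or loop iterations.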

           

          Do you suggest I update to the catalyst driver 13.8 beta and try again before providing a test case? Thanks.

          • Re: Different results with HD 7970 and HD 7750
            nou Expert

            Yes, try the latest drivers, as there is a chance that it has already been fixed.

            • Re: Different results with HD 7970 and HD 7750
              wayne_static Newbie

              I have updated the machine with the HD 7750 GPU to catalyst version 13.8 beta2 but it still fails verification. This machine is also equipped with an A10-5800K APU and it also fails on the HD 7660D GPU attached to it.

              • Re: Different results with HD 7970 and HD 7750
                himanshu.gautam Master

                Please provide us with the test case.

                Regards

                Himanshu , Bruhaspati


                • Re: Different results with HD 7970 and HD 7750
                  himanshu.gautam Master

                  Hi, I am not able to access the above link due to internal security reasons. Please give us a direct link or attach the file/project directly to this thread. Don't post any third-party URLs.

                  Regards

                  Himanshu , Bruhaspati


                  • Re: Re: Different results with HD 7970 and HD 7750
                    wayne_static Newbie

                    Apologies I attached a wrong project. Please find attached the original test case I attempted to link to. Many thanks.

                    • Re: Different results with HD 7970 and HD 7750
                      himanshu.gautam Master

                      Hi Wayne,

                       

                      A cursory glance at your code revealed some race conditions in your kernel.

                      A very similar scenario was reported in NVIDIA forums some 5 years back - where everyone thought it was a hardware bug.

                      But it turned out to be a race condition.

                       

                      Here is what I found (there could be others hiding, so I request you to go through your code carefully):

                       

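                      1. dps1_kernel - A "barrier" in the middle of the FOR loop only separates the UPPER half of an iteration from its LOWER half. Nothing stops a fast thread from starting the UPPER half of the next iteration while a slower one is still in the LOWER half of the current one, so the two halves race.

                                                This is a very subtle race that can dodge even trained eyes.

                                                You need another barrier towards the end of the FOR loop.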

                       

                      2. dps1_kernel -- A "barrier" cannot safely be used in the middle of a FOR loop that reads for(x=tid; x<constantN; x += localSize)

                                                  Technically, some threads may not enter the loop (or may run fewer iterations than others), and the "barrier" will then not be reached by every thread in the work-group...

                                                   Unless -- you know for sure that "localSize" divides "constantN" perfectly.

                                                   If it does not, you need to write something like this:

                                                   for (x = 0; x < N; x += localSize)
                                                   {
                                                             if ((x + localId) < N)
                                                             { DO WORK }
                                                             // Fence flag assumed; use CLK_GLOBAL_MEM_FENCE for data shared through global memory
                                                             barrier(CLK_LOCAL_MEM_FENCE);
                                                             if ((x + localId) < N)
                                                             { DO SOME MORE WORK }
                                                             barrier(CLK_LOCAL_MEM_FENCE); // This is important!
                                                   }
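The second point can be checked on the host without a GPU. The following is a minimal Python sketch, not code from the attached project; the function names and the sample sizes (N = 10, localSize = 4) are assumptions chosen for illustration. It counts how many times each thread would execute the loop body, and therefore how many times it would reach a barrier placed inside it.

```python
def strided_trip_count(tid, n, local_size):
    """Iterations of: for (x = tid; x < n; x += local_size)."""
    return len(range(tid, n, local_size))


def uniform_trip_count(n, local_size):
    """Iterations of: for (x = 0; x < n; x += local_size)."""
    return len(range(0, n, local_size))


def barrier_counts_match(n, local_size):
    """True if every thread runs the strided loop the same number of times."""
    counts = {strided_trip_count(tid, n, local_size) for tid in range(local_size)}
    return len(counts) == 1


if __name__ == "__main__":
    n, local_size = 10, 4  # hypothetical sizes; local_size does not divide n
    print([strided_trip_count(t, n, local_size) for t in range(local_size)])
    print(uniform_trip_count(n, local_size))
    print(barrier_counts_match(n, local_size), barrier_counts_match(8, 4))
```

With these sizes the strided loop gives per-thread counts of 3, 3, 2 and 2, so two threads would skip the last barrier, whereas the rewritten loop runs 3 iterations for every thread and guards the work with an if instead.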

                       

                      I have not checked the other kernels. I hope you will be able to refactor your code with this input.

                      If the bug remains, please post here.

                      - Bruhaspati

                      Regards

                      Himanshu , Bruhaspati


                      • Re: Different results with HD 7970 and HD 7750
                        wayne_static Newbie

                        Thanks for the feedback. I will work on these right away and get back to you ASAP.

                      • Re: Different results with HD 7970 and HD 7750
                        wayne_static Newbie

                        Wow! Thanks very much, I am glad and very impressed.

                         

                        I started with your first point and added another barrier at the end of the for-loop and ran the code a couple of times with different input sizes. Things are now looking the way they should and working correctly on both GPUs.
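Why the extra barrier matters can be illustrated with a toy model. The sketch below is a sequential Python simulation under stated assumptions, not the dps1_kernel code: each simulated work-item writes a value to a shared buffer, waits at a barrier, then reads its neighbour's value. The names `work_item`, `run_group` and `expected`, the values written, and the schedule (work-items run one after another, each as far as its next barrier) are all hypothetical.

```python
def work_item(lid, n_items, n_iters, buf, out, trailing_barrier):
    """One simulated work-item; each `yield` stands for a barrier."""
    total = 0
    for it in range(n_iters):
        buf[lid] = it * 100 + lid            # upper half: publish a value
        yield                                # barrier in the middle of the loop
        total += buf[(lid + 1) % n_items]    # lower half: read the neighbour's value
        if trailing_barrier:
            yield                            # barrier at the end of the loop
    out[lid] = total


def run_group(n_items, n_iters, trailing_barrier):
    """Run a work-group under one particular (assumed) schedule.

    Work-items run one after another, each as far as its next barrier;
    the barrier releases once every work-item has arrived.
    """
    buf = [0] * n_items
    out = [0] * n_items
    live = [work_item(lid, n_items, n_iters, buf, out, trailing_barrier)
            for lid in range(n_items)]
    while live:
        arrived = []
        for item in live:
            try:
                next(item)
                arrived.append(item)
            except StopIteration:
                pass                         # this work-item has finished
        live = arrived
    return out


def expected(n_items, n_iters):
    """Race-free reference result."""
    return [sum(it * 100 + (lid + 1) % n_items for it in range(n_iters))
            for lid in range(n_items)]


if __name__ == "__main__":
    print(run_group(2, 2, trailing_barrier=False), expected(2, 2))
    print(run_group(2, 2, trailing_barrier=True), expected(2, 2))
```

Under this schedule, two work-items and two iterations give [102, 200] without the trailing barrier, against a race-free answer of [102, 100]: the first work-item has already overwritten its slot for the next iteration before the second one reads it. Only some entries come out wrong, and which ones depends on the schedule, so a racy kernel can plausibly pass on one device and fail on another.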

                         

                        Regarding your second point, for this implementation localSize must always divide constantN perfectly (the formulation depends on this too), so I guess it is not much of an issue for now. However, I will definitely keep it in mind for future reference.

                         

                        Once again thanks very much for your help.

                        • Re: Different results with HD 7970 and HD 7750
                          himanshu.gautam Master

                          You are welcome! Glad it worked!

                           

                          Thanks for marking it as "Answered"... It helps :-)

                          Regards

                          Himanshu , Bruhaspati


  • This reply has been hidden. This can happen if the message has been hidden by a moderator, or has been reported as abusive.
    • Re: Different results with HD 7970 and HD 7750
      aldep Newbie

      Beautiful benchmarks, but what do they have to do with the original question?

      I have the same problem (between a 7750 and a 7950). Any news on the subject, AMD?

      • Re: Different results with HD 7970 and HD 7750
        wayne_static Newbie

        I don't know whether to feel relief that someone else has encountered a similar situation. However, it would be really helpful if anyone from AMD could give us some update on the situation regarding this issue. Thanks.

        • Re: Different results with HD 7970 and HD 7750
          himanshu.gautam Master

          Hey Wayne,

           

          Sorry about the delay from our side.... We do track all threads and yours is still in Unresolved state.

          So, this will gain our attention anyway....

           

          I just downloaded your package. I will let you know whether I can reproduce here.

          I still need to find out whether I can get hold of a 7970 and a 7750... If I can, I will experiment and check it out...

           

          Thanks for your time,

          The experiments will take some time.. Please bear with us,

          Thanks,

          - Bruhaspati

          Regards

          Himanshu , Bruhaspati


      • Re: Different results with HD 7970 and HD 7750
        himanshu.gautam Master

        The "Beautiful benchmarks" post is nothing but camouflaged spam...

        If you look towards the bottom, there is a link to laptop prices etc....

        Spammers have become very intelligent today... They can beat all these text-mining algorithms.

        Nowadays we have to be very vigilant about these types of messages...

        Sigh..

        Regards

        Himanshu , Bruhaspati

