43 Replies. Latest reply: Apr 12, 2013 2:29 PM by Claggy. Branched to a new discussion.

Problems with Cat 12.10 and up and HD7xxx (and not only) GPUs

Raistmer Apprentice

My app produces invalid results if its source CL file is compiled with Catalyst 12.10, and causes driver restarts if it is compiled with Cat 12.11 beta 8.

All of this was observed on an HD7770 GPU. It looks like HD5xxx and HD6xxx are not affected.

Moreover, if the app uses cached binaries compiled with Catalyst 12.8, it works OK under Catalyst 12.10 and 12.11 beta 8. So the problem is introduced by the new OpenCL runtime compiler.

Any chance of getting this issue fixed in the Cat 12.11 release?

  • Re: Problems with Cat 12.10 and up and HD7xxx GPUs
    binying Novice

    Could you provide more information about this issue such as a simple test code?

    • Re: Problems with Cat 12.10 and up and HD7xxx GPUs
      Raistmer Apprentice

      It was observed on the whole app (final results were invalid), and so far I have had no time to narrow the issue down to a particular kernel.

      The app is available for testing and I can provide a bench config, but so far each time I did this the AMD side fell permanently silent. I really don't want to spend time for nothing.

    • Re: Problems with Cat 12.10 and up and HD7xxx GPUs
      Raistmer Apprentice

      Well, unfortunately, this problem is not specific to HD7xxx.

      I was able to reproduce it on my own HD6950.

      Here is the test case you asked for: https://dl.dropbox.com/u/60381958/Bad_binaries_with_Cat12.11beta8_test_case.7z

      How to use:

      Extract the archive and run the application (executable). It will perform some computations over the dataset included in the archive.

      The app is provided with a text-based CL file. It will compile that CL file and produce a few *.bin* files with binary kernels.

      There are also 2 subdirectories: one with such binaries generated under Catalyst 12.11 beta 8 (for an HD6950 GPU), and another with binaries generated with some older Catalyst (I can't say exactly, but most probably 12.6). When I use the older binaries (running under Catalyst 12.11 beta 8), the app does its computations and finishes OK.

      But when I use no binaries (that is, compilation from scratch under Cat 12.11 beta 8) or binaries already compiled under Cat 12.11 beta 8, the app causes a driver restart.

      Please confirm this and advise on a possible fix. The app is supposed to be installed automatically on a huge number of hosts, so driver restarts are not something we can live with...

  • Re: Problems with Cat 12.10 and up and HD7xxx (and not only) GPUs
    Raistmer Apprentice

    Same issue with Catalyst 12.11 beta 11, GPU is HD6950, OS: Vista x86

  • Re: Problems with Cat 12.10 and up and HD7xxx (and not only) GPUs
    Raistmer Apprentice

    I installed Catalyst 12.1.

    The test case with freshly generated kernel binaries works OK.

    When I replace those binaries with ones generated under Cat 12.11, the driver restart issue returns.

    I think all this is fairly complete evidence that the problem is not in the runtime but in the new OpenCL->binary compiler that generates GPU binaries. No matter which runtime they run under, old or new, binaries generated with an old Catalyst always work OK, while binaries generated with Cat 12.11 (beta 8 or beta 11) cause a driver restart.

    One can find those binaries via the link above.
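
Since the behaviour of a cached binary depends on which Catalyst compiled it, one way to keep such binaries apart is to put the driver version string into the cache file name. The following is only a sketch under that assumption: `make_cache_name` and `sanitize_into` are hypothetical helpers, not taken from the app's sources, and the `<source>_<device>_<driver>.bin` pattern is illustrative rather than the naming scheme the app actually uses.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy src into dst, replacing every character that is
 * not alphanumeric, '.', '-' or '_' with '_' so the result is file-name safe. */
static void sanitize_into(char *dst, size_t dst_size, const char *src)
{
    size_t i = 0;
    if (dst_size == 0) return;
    for (; src[i] != '\0' && i + 1 < dst_size; ++i) {
        unsigned char c = (unsigned char)src[i];
        dst[i] = (isalnum(c) || c == '.' || c == '-' || c == '_') ? (char)c : '_';
    }
    dst[i] = '\0';
}

/* Hypothetical helper: build "<source>_<device>_<driver>.bin" in out.
 * Returns 0 on success, -1 if the name does not fit into out_size bytes. */
int make_cache_name(char *out, size_t out_size, const char *source_file,
                    const char *device_name, const char *driver_version)
{
    char dev[128];
    char drv[128];
    int n;

    sanitize_into(dev, sizeof dev, device_name);
    sanitize_into(drv, sizeof drv, driver_version);
    n = snprintf(out, out_size, "%s_%s_%s.bin", source_file, dev, drv);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

With a name like this, a binary produced by one compiler version is never silently picked up after a driver change; the app would simply find no matching file and recompile. The device name and driver version would come from the usual device queries (CL_DEVICE_NAME and CL_DRIVER_VERSION via clGetDeviceInfo).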

    • Re: Problems with Cat 12.10 and up and HD7xxx (and not only) GPUs
      freighter Newbie

      I had some time to test this app's issue on two 64-bit openSUSE Linux hosts (same sources as on Windows).

      Sadly, I was able to reproduce the problem found on Windows on one of the hosts, which has two Radeon HD 7750 cards, with Cat 12.11 beta 8 and beta 11. It causes a complete system freeze (the screen stays active, but the system no longer responds to any input such as mouse or keyboard) and makes a hard reboot necessary. This system has only PCIe 2.0 slots available, while the second host, which does NOT reproduce the problem, has a Radeon HD 7850 in a PCIe 3.0 slot. Maybe this is connected, maybe not; it is at least an observation.

      So, if there is a fix for Windows, please make sure you have one for Linux as well.

  • Problems with Cat 12.10 and up and HD7xxx (and not only) GPUs
    Raistmer Apprentice

    Any progress with the test case? Is the problem confirmed?

  • Re: Problems with Cat 12.10 and up and HD7xxx (and not only) GPUs
    Raistmer Apprentice

    So, what is the current status of this issue?

  • Re: Problems with Cat 12.10 and up and HD7xxx (and not only) GPUs
    Raistmer Apprentice

    The problem is not fixed in Cat 13.1; I have reports that the driver restart occurs on 13.1 too.

  • Re: Problems with Cat 12.10 and up and HD7xxx (and not only) GPUs
    Raistmer Apprentice

    It looks like the problem is not fixed in Catalyst 13.2 beta 3 either.

    Now the CL file can't be compiled at all:

    OpenCL Platform Name:            AMD Accelerated Parallel Processing
    Number of devices:               1
      Max compute units:             10
      Max work group size:           256
      Max clock frequency:           1120Mhz
      Max memory allocation:         536870912
      Cache type:                    Read/Write
      Cache line size:               64
      Cache size:                    16384
      Global memory size:            1073741824
      Constant buffer size:          65536
      Max number of constant args:   8
      Local memory type:             Scratchpad
      Local memory size:             32768
      Queue properties:
        Out-of-Order:                No
      Name:                          Capeverde
      Vendor:                        Advanced Micro Devices, Inc.
      Driver version:                1124.2 (VM)
      Version:                       OpenCL 1.2 AMD-APP (1124.2)
      Extensions:                    cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_amd_c1x_atomics

     

     

    INFO: can't open binary kernel file: .\\AstroPulse_Kernels_r1761.cl_Capeverde.bin_V6, continue with recompile...

    Error : Building Program (source, clBuildProgram):main kernels: not OK code -11

    Internal error: Compilation failed.

     

    It works OK with Cat 12.1 and with Cat 12.8 (for example).

    EDIT: And I don't see how this issue received "Assumed Answered" status when it remains in every new driver AMD releases. It is absolutely not answered, and in fact it is a critical issue.

    • Re: Problems with Cat 12.10 and up and HD7xxx (and not only) GPUs
      himanshu.gautam Master

      Hi raistmer,

      Sorry for the delay. I am looking into this issue now. I will let you know the status.

       

      I am not sure who marked this thread as answered. I hope you can again mark it as unanswered.

       

      Raistmer,

      It looks like the Dropbox link you gave is dead (maybe temporarily). Can you please share the test case again?

       

      Message was edited by: Himanshu Gautam

      Regards

      Himanshu , Bruhaspati

      --------------------------------

      The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. Links to third party sites are for convenience only, and no endorsement is implied

      • Re: Problems with Cat 12.10 and up and HD7xxx (and not only) GPUs
        Raistmer Apprentice

        1) I don't see how I can unmark that.

        2) Just now I clicked on the Dropbox link above and the archive downloaded to this PC.

        The link is: https://dl.dropbox.com/u/60381958/Bad_binaries_with_Cat12.11beta8_test_case.7z and it should not expire until I manually delete the archive from Dropbox. So try again; maybe luck will be with you next time.

        • Re: Problems with Cat 12.10 and up and HD7xxx (and not only) GPUs
          freighter Newbie

          Confirming that the dropbox link Raistmer has given still works.

           

          Also, I can confirm that Linux shows an identical failure to compile OpenCL kernels from source. This avoids the crashes I reported earlier in this thread.

          Using precompiled (binary) kernels still works, but how are we to create these in the future?

          • Re: Problems with Cat 12.10 and up and HD7xxx (and not only) GPUs
            himanshu.gautam Master

            Hi,

            Still not able to access this link as my company network does not allow it.

            Please re-share the test case by attaching it here itself as a zip file.

             

            EDIT: Use advanced text editor to attach the testcase.

             

            Message was edited by: Himanshu Gautam

            Regards

            Himanshu , Bruhaspati


              • Re: Problems with Cat 12.10 and up and HD7xxx (and not only) GPUs
                himanshu.gautam Master

                Thanks Raistmer.

                I guess the problem is already reported by binying, but I was not able to find a tracking number for it. Will let you know the status now.

                Regards

                Himanshu , Bruhaspati


              • Re: Problems with Cat 12.10 and up and HD7xxx (and not only) GPUs
                himanshu.gautam Master

                Hi Raistmer,

                I tried the two executables you had shared (MB7_win_x86_SSE_OpenCL_ATi_r1726_verbose.exe and setiathome_6.99_windows_intelx86__opencl_ati_sah.exe) on Drivers 13.1, 12.8 and 12.3. Both the applications always resulted in driver crash.

                My system details: HD 7970, Driver: 13.1,12.8,12.3, CPU: FX4100

                 

                Anyways I will report it to AMD Team Again. Sorry could not find the reference to the old bug.

                Regards

                Himanshu , Bruhaspati


                • Re: Problems with Cat 12.10 and up and HD7xxx (and not only) GPUs
                  Raistmer Apprentice

                  Are you sure you were able to downgrade the recent drivers properly?

                  The inability of the AMD Catalyst installer to properly downgrade the OpenCL runtime is a known bug and was reported by Claggy on these forums too.

                  It is very possible that all the variants you tried were running the same recent 13.1 OpenCL runtime, which indeed fails to compile at all.

                  The real OpenCL runtime from Cat 12.8 has no issues with the app. And to check the initial problem with invalid computations you should try the Catalyst 12.10 drivers, not the ones you tried.

                  • Re: Problems with Cat 12.10 and up and HD7xxx (and not only) GPUs
                    himanshu.gautam Master

                    I have not seen any problems in downgrading to old drivers with a clean system. I could see proper driver versions in CCC.

                    Anyways will check with 12.10 too. (As per you, i should get invalid results with 12.10 driver, how to verify that?).

                     

                    I will check once more with 12.8 with more rigorous cleanup. Thanks for your support.

                    Regards

                    Himanshu , Bruhaspati


                    • Re: Problems with Cat 12.10 and up and HD7xxx (and not only) GPUs
                      Raistmer Apprentice

                      Driver version and OpenCL runtime version are quite different things. Be careful to refer to the right one (the OpenCL runtime).

                      I have reports of success running the Catalyst 13.1 video (and perhaps sound and so on) driver with the OpenCL runtime taken from Cat 12.8. Only the OpenCL compiler works incorrectly. BTW, did you remove the *.bin* files between runs? If you run binaries compiled under the old driver on the new driver, you will get correct results too, because, again, it is the OpenCL compiler that is broken, not the OpenCL runtime per se. If one already has a correct binary, it will be executed OK.

                      For now, to check whether the app behaves differently, check the stderr.txt file for the number of found signals.

                      I will attach a validation tool later.

  • Re: Problems with Cat 12.10 and up and HD7xxx (and not only) GPUs
    Raistmer Apprentice

    To check the validity of the computation one can use the attached tool and the reference result (inside the archive).

    Usage:

    rescmpv5.exe ref-setiathome_6.98_windows_intelx86.exe-PG0395_v7.wu.res result.sah

    where result.sah is the file generated after a full run of the app.

    The tool's output is self-explanatory. In case of big differences between the results it will show a table with the quality of the found signals between the 2 files.

     

    Examples of usage:

    Cat 12.8 run:

    E:\123>rescmpv5.exe ref-setiathome_6.98_windows_intelx86.exe-PG0395_v7.wu.res result.sah

    Result      : Strongly similar,  Q= 99.41%

     

    Less than 100% similarity is almost inevitable between long CPU and GPU floating-point computations, but this similarity is good enough.

     

    Cat 12.8 but binaries taken from Cat 12.11 beta 8:

     

    1) A driver restart occurred (just as reported in the initial post).

    2)

    E:\123>setiathome_6.99_windows_intelx86__opencl_ati_sah.exe

     

     

    E:\123>rescmpv5.exe ref-setiathome_6.98_windows_intelx86.exe-PG0395_v7.wu.res result.sah

                    ------------- R1:R2 ------------     ------------- R2:R1 ------------
                    Exact  Super  Tight  Good    Bad     Exact  Super  Tight  Good    Bad
            Spike      0      0      0      0      0        0      0      0      0      0
         Autocorr      0      0      0      0      1        0      0      0      0      0
         Gaussian      0      0      0      0      1        0      0      0      0      0
            Pulse      0      0      0      0      0        0      0      0      0      0
          Triplet      0      0      0      0      0        0      0      0      0      0
       Best Spike      0      0      0      0      1        0      0      0      0      0
    Best Autocorr      0      0      0      0      1        0      0      0      0      0
    Best Gaussian      0      0      0      0      1        0      0      0      0      0
       Best Pulse      0      0      0      0      1        0      0      0      0      0
     Best Triplet      0      0      0      0      0        0      0      0      0      0
                    ----   ----   ----   ----   ----     ----   ----   ----   ----   ----
                       0      0      0      0      6        0      0      0      0      0

     

     

    Unmatched signal(s) in R1 at line(s) 672 689 716 732 749 775

    Result      : Different.

    As one can see, the number of found results differs (of course: the app was terminated after the driver restart, so the computations did not finish).

    One will see a similar table if the computation finishes OK but with wrong results.

    The validation tool will show differences as in this sample.

     

    P.S.:

     

    himanshu.gautam wrote: "As per you, i should get invalid results with 12.10 driver, how to verify that?"

    Yes, expect a wrong result (but no driver restart) with Cat 12.10. Driver restarts appeared in later driver releases.

    The verification tool and how to use it are described above in this post.


    • Re: Problems with Cat 12.10 and up and HD7xxx (and not only) GPUs
      himanshu.gautam Master

      Hi Raistmer,

      Probably you were right about driver downgrading issue. Here are my observations:

       

      1. I had installed 12.10 without a proper system clean and saw the driver crash there. Ran rescmpv5.exe and the results were incorrect.

      2. Then I cleaned the system using the AMD cleanup utility before installing any other driver.

      3. Installed the 12.8 driver: SETI.exe ran without a crash. Checked correctness with rescmpv5.exe and it gave 99.9% correctness.

      4. Installed 12.10 again, and surprisingly seti.exe again ran without crash. rescmpv5 also passed correctly.

      5. Then installed the 13.1 driver: SETI.exe crashed. rescmpv5.exe confirms an incorrect result.

       

      Attached are the result.sah and stderr file for all cases.

      So our observations are differing for 12.10 driver as of now. But anyways it is a bug. Please provide any feedback you have on the results.

      Regards

      Himanshu , Bruhaspati


      • Re: Problems with Cat 12.10 and up and HD7xxx (and not only) GPUs
        Claggy Newbie

        Looking at the result from himanshu.gautam's post:

        [quote]4. Installed 12.10 again, and surprisingly seti.exe again ran without crash. rescmpv5 also passed correctly.[/quote]

         

        It looks as if Raistmer has supplied a workunit that doesn't show a weakly similar result on Cat 12.10. It has:

        'WU true angle range is :  0.394768'

        The workunits that showed the weakly similar result were the PG0009_v7.wu and refquick_v7.wu workunits, which have 'WU true angle range is :  0.008955' and 'WU true angle range is :  0.775000' respectively.

        Here's a full bench of five different workunits (with 3 different apps) where those two workunits are weakly similar.

         

        Claggy

         

        Edit: added PG0009_v7.wu and refquick_v7.wu workunits along with ref files for said workunits.

      • Re: Problems with Cat 12.10 and up and HD7xxx (and not only) GPUs
        Raistmer Apprentice

        Himanshu, thanks for looking into this issue deeply.

        Since Claggy was the first to report the Cat 12.10 issue to me, I think he is right about why your observation differs from what I said about Cat 12.10.

        Please replace work_unit.sah from my archive with the same file from the PG0009_v7.workunit.7z.zip archive that Claggy attached.

        Also, another ref file, ref-setiathome_6.98_windows_intelx86.exe-PG0009_v7.wu.res (again, provided in that archive), is needed to check the fresh result.sah. The comparison utility remains the same.

        P.S. So, for now we can summarize the issues as follows:

        1) Incorrect computations with Cat 12.10 do not appear on all data sets. Moreover, the difference for the PG0009 task is in what we call "best signals", that is, signals below the threshold to be marked as reportable. That means computations in kernels compiled under 12.10 differ from the correct ones only slightly, but enough for a precision issue to appear.

        2) The Catalyst 13.1 compiler is broken for this kernels file. It's a separate issue, because the error appears even before computations begin.

        • Re: Problems with Cat 12.10 and up and HD7xxx (and not only) GPUs
          himanshu.gautam Master

          Hi Raistmer,

          I checked with the new data as you suggested. I had taken the result.sah file and the reference file from PG0009_v7.workunit.7z.zip attachment. Not sure what the other 2 attachments are intended for.


          rescmpv5 gives weakly similar for the 12.10 driver, so some corruption is happening.

          rescmpv5 gives strongly similar for the 12.8 driver, as expected.

          But rescmpv5 now gives strongly similar for the 13.1 driver with the new data. A surprise again.

          So as I understand it, there are two issues here:

          1. Data corruption when the driver is updated from 12.8 to 12.10. It is not reproduced with the 13.1 driver, so it is probably no longer an issue. Can you confirm?

          2. Driver crash when the driver is updated from 12.10 to 13.1. This still occurs with the old data.

           

          I will try to do some debugging on codeXL too, and let you know.

           

          Hi Raistmer,

          Would it be possible to provide a test case with the host code? I tried working with the kernel file, but there are so many kernels (which are enabled/disabled using #defines). Also, RESULT_SIZE seems to be a macro defined in the host code and used in the kernels. I could not compile the kernels in KernelAnalyzer because of this macro.

           

          Message was edited by: Himanshu Gautam

          Regards

          Himanshu , Bruhaspati


          • Re: Problems with Cat 12.10 and up and HD7xxx (and not only) GPUs
            Raistmer Apprentice

            Hi, Himanshu.

            Thanks for continuing to look into this issue.

            Regarding no crash under Cat 13.1: no idea for now; maybe Claggy or another alpha tester who follows this thread will have some idea. I can't maintain test configs myself right now because I need a stable environment, so I stick with Cat 12.1 on my main PC and an "unknown" (but also old) version of Catalyst on a C-60 netbook. Info about app behavior on the latest drivers comes from alpha testers.

            And regarding the host code: of course, no problem with this. It's a GPLed app with freely available sources.

            So you can look directly into the repository (head, or the revision that I used for the test case binary). Suggestions and improvements are welcome!

            Here is the repository:

            https://setisvn.ssl.berkeley.edu/svn/branches/sah_v7_opt

            and for this particular app you need the files in the root plus these dirs:

            https://setisvn.ssl.berkeley.edu/svn/branches/sah_v7_opt/AKv8

            https://setisvn.ssl.berkeley.edu/svn/branches/sah_v7_opt/bin

            https://setisvn.ssl.berkeley.edu/svn/branches/sah_v7_opt/lib

            https://setisvn.ssl.berkeley.edu/svn/branches/sah_v7_opt/src

             

            P.S. The defines you are looking for are in the GPU_lock.cpp file:

             



            strcpy(buildoptions,"-w -DRESULT_SIZE=32 -cl-unsafe-math-optimizations -fno-bin-llvmir -fno-bin-amdil");

            if(swi.analysis_cfg.autocorr_fftlen) strcat(buildoptions," -DSETI7");//R: dynamically define if autocorr is needed
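
The two lines above assemble the clBuildProgram option string with strcpy/strcat. As a minimal sketch (not the app's actual code), the same string can be built in one bounded snprintf call. The function name `make_build_options` and its parameters are hypothetical; `need_autocorr` stands in for `swi.analysis_cfg.autocorr_fftlen`, and the `unsafe_math` switch is an assumption added only so the kernels can be rebuilt without -cl-unsafe-math-optimizations for comparison.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the OpenCL build options into out.
 * result_size   -> value for -DRESULT_SIZE (32 in the post above)
 * unsafe_math   -> non-zero adds -cl-unsafe-math-optimizations
 * need_autocorr -> non-zero adds -DSETI7
 * Returns 0 on success, -1 if the options do not fit into out_size bytes. */
int make_build_options(char *out, size_t out_size, int result_size,
                       int unsafe_math, int need_autocorr)
{
    int n = snprintf(out, out_size,
                     "-w -DRESULT_SIZE=%d%s -fno-bin-llvmir -fno-bin-amdil%s",
                     result_size,
                     unsafe_math ? " -cl-unsafe-math-optimizations" : "",
                     need_autocorr ? " -DSETI7" : "");
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

Calling `make_build_options(buf, sizeof buf, 32, 1, 1)` reproduces the option string from GPU_lock.cpp with -DSETI7 appended. The same -DRESULT_SIZE=32 define is what an offline compile of the kernel file would need.
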
            • Re: Problems with Cat 12.10 and up and HD7xxx (and not only) GPUs
              himanshu.gautam Master

              Thanks Raistmer for the update.

               

              Are you sure -cl-unsafe-math-optimizations flag is not causing the data corruption issue?

              I will try to look into the code base in some days. Meanwhile If you can arrange for more information, from claggy and team, it would be helpful.

              Regards

              Himanshu , Bruhaspati


              • Re: Problems with Cat 12.10 and up and HD7xxx (and not only) GPUs
                Raistmer Apprentice

                It was not the case with older drivers. But maybe the new one has indeed enabled some more "unsafe" optimizations. Worth checking; I will, thanks.

                Regarding kernel file compilation issues: they were observed under Linux too. Not a crash, but an "internal error" instead:

                 

                Error : Building Program (source, clBuildProgram):main kernels: not OK code -11

                Internal error: Compilation failed.

                 

                (this is on Catalyst 13.2 beta 7)

                It's the same app, just its Linux port. We will try to narrow down the location of the issue inside the CL file.

                • Re: Problems with Cat 12.10 and up and HD7xxx (and not only) GPUs
                  himanshu.gautam Master

                  Error : Building Program (source, clBuildProgram):main kernels: not OK code -11

                  Internal error: Compilation failed.

                   

                  -11 means the kernel compilation failed. Check the build log returned by the clGetProgramBuildInfo API.
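
For reference, -11 is the value of CL_BUILD_PROGRAM_FAILURE in CL/cl.h, and the build log is what clGetProgramBuildInfo returns for CL_PROGRAM_BUILD_LOG. A small lookup like the sketch below can make messages such as "not OK code -11" easier to read. `cl_status_name` is a hypothetical helper, it covers only a handful of codes, and it deliberately avoids the OpenCL headers so that it compiles on its own.

```c
#include <stddef.h>

/* Hypothetical helper: map a few OpenCL status codes to their names.
 * The numeric values are those defined in CL/cl.h; codes not listed
 * here fall through to "unknown status". */
const char *cl_status_name(int code)
{
    switch (code) {
    case 0:   return "CL_SUCCESS";
    case -1:  return "CL_DEVICE_NOT_FOUND";
    case -2:  return "CL_DEVICE_NOT_AVAILABLE";
    case -3:  return "CL_COMPILER_NOT_AVAILABLE";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -11: return "CL_BUILD_PROGRAM_FAILURE";
    case -42: return "CL_INVALID_BINARY";
    case -43: return "CL_INVALID_BUILD_OPTIONS";
    case -44: return "CL_INVALID_PROGRAM";
    default:  return "unknown status";
    }
}
```

Logging `cl_status_name(err)` next to the raw number keeps the original code visible while telling a compile failure apart from, say, a rejected cached binary (CL_INVALID_BINARY).
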

                  Regards

                  Himanshu , Bruhaspati


                  • Re: Problems with Cat 12.10 and up and HD7xxx (and not only) GPUs
                    freighter Newbie

                    himanshu.gautam wrote:

                     

                    Error : Building Program (source, clBuildProgram):main kernels: not OK code -11

                    Internal error: Compilation failed.

                     

                    -11 is the kernel compilation failed. Check out the build log from clGetProgramBuildInfo API.

                    This is the complete build log:

                    "Internal error:Compilation failed."

                    Attached is the output from AMD APP KernelAnalyzer2 for our MultiBeam_Kernels.cl.

                    • Re: Problems with Cat 12.10 and up and HD7xxx (and not only) GPUs
                      himanshu.gautam Master

                      I am able to compile both the kernel files Multibeam_kernels_r1726.cl & Multibeam_kernel_r1643.cl with the above mentioned build options with 13.1 Driver. 13.2 is in beta, so I recommend to try 13.1 only. Kernel Analyzer with attached Info, built both kernels for all 18 OpenCL devices.

                      Regards

                      Himanshu , Bruhaspati


                    • Re: Problems with Cat 12.10 and up and HD7xxx (and not only) GPUs
                      Raistmer Apprentice

                      Did you try under Windows or under Linux?

                      This subthread is about Linux, and your screenshot closely resembles the Windows version's "About" dialog...

                      • Re: Problems with Cat 12.10 and up and HD7xxx (and not only) GPUs
                        himanshu.gautam Master

                        Hi Raistmer,

                        Yes, I tried on Windows, just like everything else. I will try on Linux today and let you know.

                        But I am afraid we have not been able to nail the problem down to something I can forward to the relevant people for fixing.

                        As of now, here are the inferences:

                        1. Both kernels compile fine in Kernel Analyzer on Windows (so kernel compilation is probably not the issue). Need to check on Linux though.

                        2. Two test cases were given. The first test case produces a driver crash. Can you confirm it is not a very time-consuming kernel? The reason for the driver crash may just be VPU Recover. The second test case, which differs only in some data files, passes properly (with a strongly similar result).

                         

                        Do you have any ideas?

                        Regards

                        Himanshu , Bruhaspati


                        • Re: Problems with Cat 12.10 and up and HD7xxx (and not only) GPUs
                          Raistmer Apprentice

                          1) No need to re-run on Linux. Linux dev confirmed that he can compile under Linux now (after driver reinstall and Linux update). So it was some glitch and he can't reproduce it now. Hence no need to waste more time on this.

                          2) The driver crash on execution most probably comes from a kernel that runs too long.

                          As I reported in another thread, 12 registers are allocated per kernel work-item under Cat 12.8 but only 5 under Cat 13.1. So I think register spills are inevitable, and they degrade performance past the red line of a driver restart.

                          The driver crash at the compilation stage is quite another issue, but again, I'm not sure it is reliably reproducible (you saw this issue once, but it was gone after one more driver reinstall).

                          Also, the app build with different kernels that use more local memory doesn't suffer from overly long kernel execution time, so we can just go with that app build for now. Not a show-stopper issue.

                          3) But the issue with Catalyst 12.10 is reproducible (unfortunately) and plagues both kinds of app (not with all data patterns, but in general we can't restrict the app from receiving different kinds of data patterns as input). Looks like it was fixed in Cat 13.1, though.

                          Because a hotfix for Cat 12.10 is highly unlikely now that 13.1 exists, I think there are currently no more issues to solve with your help. Thanks a lot for participating! If the latest Catalysts cause some headaches, I will create a new thread to keep the issue easier to follow. The kernel performance degradation under Cat 13.1 is worth looking into, though, so I attach the ISAs under different Catalysts (12.8 and 13.1). There is another thread about another (not mine) app that sees the same issue of lowered register usage under 13.1, so it may be worth a look by the Catalyst compiler devs.
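
                          For readers hitting the same driver restart on long-running kernels: a minimal sketch of one general mitigation, which is to split a large global range into several shorter launches. This is not what the app in this thread does (it switched to a different kernel build instead). The function name and parameters are hypothetical, and the sketch assumes the total item count is a multiple of the work-group size. Each (offset, size) pair would be passed as global_work_offset and global_work_size to clEnqueueNDRangeKernel, which accepts a non-NULL offset from OpenCL 1.1 onward.

```python
def split_global_range(total_items, max_per_launch, group_size=1):
    """Split [0, total_items) into (offset, size) chunks.

    Each size is at most max_per_launch and a multiple of group_size.
    Assumption: total_items itself is a multiple of group_size.
    """
    if total_items % group_size != 0:
        raise ValueError("total_items must be a multiple of group_size")
    # Round the per-launch cap down to a whole number of work-groups.
    step = (max_per_launch // group_size) * group_size
    if step <= 0:
        raise ValueError("max_per_launch must be at least group_size")
    chunks = []
    offset = 0
    while offset < total_items:
        size = min(step, total_items - offset)
        chunks.append((offset, size))
        offset += size
    return chunks


if __name__ == "__main__":
    # 1024 work-items, at most 300 per launch, work-groups of 64.
    for offset, size in split_global_range(1024, 300, 64):
        print(offset, size)
```

                          Whether shorter launches actually avoid the restart depends on the per-item cost of the kernel, so the cap per launch would need to be tuned on the affected GPU.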

                          • Re: Problems with Cat 12.10 and up and HD7xxx (and not only) GPUs
                            himanshu.gautam Master

                            Hi Raistmer,

                            Thanks for the update.

                            I have seen many reports of a performance drop with the 13.1 driver. I will forward these ISAs to the relevant people. Can you let me know whether these ISAs are for kernels in the Multibeam_kernels_r*.cl files you attached earlier? Maybe you can give the names of the specific kernels.

                             

                            Hi Raistmer,

                            The kernel files contain ~20 different kernels. I cannot find a kernel named PulseFind. I am assuming you meant PC_find_pulse_partial_kernel1_cl. I am forwarding it to the OpenCL compiler team.

                             

                            Message was edited by: Himanshu Gautam

                            Regards

                            Himanshu , Bruhaspati

                  • Re: Problems with Cat 12.10 and up and HD7xxx (and not only) GPUs
                    Claggy Newbie

                    Still getting this failed compilation on the Cat 13.3 Beta 3 drivers on Windows 7 x64:

                     

                    OpenCL-kernels filename : MultiBeam_Kernels_r1779.cl

                    INFO: can't open binary kernel file: .\\MultiBeam_Kernels_r1779.clHD5_Capeverde.bin_V7, continue with recompile...

                    Error : Building Program (source, clBuildProgram):main kernels: not OK code -11

                    Internal error: Compilation failed.

                     

                    Claggy

                     

                    Edit: this problem has been worked around, details in this thread:

                     

                    http://devgurus.amd.com/message/1288791#1288791
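
                    For reference, the "code -11" in the log above is the OpenCL status CL_BUILD_PROGRAM_FAILURE. A minimal sketch of a decoder for such log lines follows. The helper name is hypothetical and the table is only a subset of the codes defined in cl.h.

```python
# Subset of OpenCL status codes as defined in CL/cl.h.
CL_ERROR_NAMES = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -3: "CL_COMPILER_NOT_AVAILABLE",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -11: "CL_BUILD_PROGRAM_FAILURE",
    -30: "CL_INVALID_VALUE",
    -42: "CL_INVALID_BINARY",
    -43: "CL_INVALID_BUILD_OPTIONS",
    -44: "CL_INVALID_PROGRAM",
    -45: "CL_INVALID_PROGRAM_EXECUTABLE",
}


def cl_error_name(code):
    """Return the symbolic name for an OpenCL status code (hypothetical helper)."""
    return CL_ERROR_NAMES.get(code, "unknown OpenCL status %d" % code)


if __name__ == "__main__":
    print(cl_error_name(-11))
```

                    When clBuildProgram returns this status, the compiler's message can usually be read back with clGetProgramBuildInfo and CL_PROGRAM_BUILD_LOG, which is often more informative than the bare code.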

              • Re: Problems with Cat 12.10 and up and HD7xxx (and not only) GPUs
                Raistmer Apprentice

                himanshu.gautam wrote:

                 

                Thanks Raistmer for the update.

                 

                Are you sure -cl-unsafe-math-optimizations flag is not causing the data corruption issue?

                I will try to look into the code base in some days. Meanwhile If you can arrange for more information, from claggy and team, it would be helpful.

                Checked: the -cl-unsafe-math-optimizations flag has no influence on the invalid results.
