11 Replies Latest reply: Mar 26, 2013 4:27 AM by sk7041 RSS

AMD OpenCL - compiler segmentation fault

sk7041 Newbie
Currently Being Moderated

Hello All,

 

I have recently been testing my OpenCL code on an AMD HD7970 GPU, and some of my kernels are causing the compiler to crash with a segmentation fault at clBuildProgram() . I would like to mention that the kernels compile and run fine on any NVIDIA device, and on CPU with AMD SDK.

 

Information about my system:

 

Description:  Debian GNU/Linux 6.0.1 (squeeze)

Arch: x86_64

CPU: AMD Athlon(tm) 64 X2 Dual Core Processor 4200+

GPU: AMD HD7970

AMD OpenCL SDK: 2.8

Driver: AMD Catalyst proprietary driver 12.10

 

Here are the back traces given by GDB for the 2 kernels that produce the segmentation fault. I would like to mention that these are ~400 lines long kernels, with many nested 'for' loops, and requiring a fairly large amount of private memory.

 

back trace for kernel 1:

 

Program received signal SIGSEGV, Segmentation fault.

0x00007ffff49a1b88 in SCRegSpill::CreateReload(SCInst*, int, SCInst*, SCBlock*, bitset*, bitset*, int) () from /usr/lib/libamdocl64.so

(gdb) bt

#0  0x00007ffff49a1b88 in SCRegSpill::CreateReload(SCInst*, int, SCInst*, SCBlock*, bitset*, bitset*, int) () from /usr/lib/libamdocl64.so

#1  0x00007ffff49b2533 in SCRegSpill::Spill() () from /usr/lib/libamdocl64.so

#2  0x00007ffff49b5160 in SCRegAlloc::Allocate(bool) () from /usr/lib/libamdocl64.so

#3  0x00007ffff49b54af in SCRegAlloc::AllocateRegisters() () from /usr/lib/libamdocl64.so

#4  0x00007ffff45b0b5f in CompilerBase::GenerateCodeUsingNewIR(void*, bool) () from /usr/lib/libamdocl64.so

#5  0x00007ffff45b6764 in Compiler::Compile(ILProgram*) () from /usr/lib/libamdocl64.so

#6  0x00007ffff45b6ee0 in Compiler::CompileShader(unsigned char*, unsigned char*, unsigned int const*, CompilerExternal*) ()

   from /usr/lib/libamdocl64.so

#7  0x00007ffff45b3227 in CompilerExternal::CompileShader(_SC_SRCSHADER const*, _SC_HWSHADER*) () from /usr/lib/libamdocl64.so

#8  0x00007ffff49cffc2 in scWrapCompileBinarySI(void*, unsigned int, void**, unsigned int*, unsigned int, unsigned int, scWrapOptionEnum*)

    () from /usr/lib/libamdocl64.so

#9  0x00007ffff458df6b in amuCompCompile () from /usr/lib/libamdocl64.so

#10 0x00007ffff458ecee in ddiCompile () from /usr/lib/libamdocl64.so

#11 0x00007ffff44cb91e in gpu::NullKernel::create(stlp_std::basic_string<char, stlp_std::char_traits<char>, stlp_std::allocator<char> > const&, stlp_std::basic_string<char, stlp_std::char_traits<char>, stlp_std::allocator<char> > const&, void const*, unsigned long) ()

   from /usr/lib/libamdocl64.so

#12 0x00007ffff44d05d3 in gpu::Kernel::create(stlp_std::basic_string<char, stlp_std::char_traits<char>, stlp_std::allocator<char> > const&, stlp_std::basic_string<char, stlp_std::char_traits<char>, stlp_std::allocator<char> > const&, void const*, unsigned long) ()

   from /usr/lib/libamdocl64.so

#13 0x00007ffff44df058 in gpu::Program::createKernel(stlp_std::basic_string<char, stlp_std::char_traits<char>, stlp_std::allocator<char> > const&, gpu::Kernel::InitData const*, stlp_std::basic_string<char, stlp_std::char_traits<char>, stlp_std::allocator<char> > const&, stlp_std::basic_string<char, stlp_std::char_traits<char>, stlp_std::allocator<char> > const&, bool*, void const*, unsigned long) ()

   from /usr/lib/libamdocl64.so

#14 0x00007ffff44de2ca in gpu::NullProgram::linkImpl(amd::option::Options*) () from /usr/lib/libamdocl64.so

#15 0x00007ffff4479055 in device::Program::build(stlp_std::basic_string<char, stlp_std::char_traits<char>, stlp_std::allocator<char> > const&, char const*, amd::option::Options*) () from /usr/lib/libamdocl64.so

#16 0x00007ffff4489030 in amd::Program::build(stlp_std::vector<amd::Device*, stlp_std::allocator<amd::Device*> > const&, char const*, void (*)(_cl_program*, void*), void*, bool) () from /usr/lib/libamdocl64.so

#17 0x00007ffff4466ff3 in clBuildProgram () from /usr/lib/libamdocl64.so

 

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

 

back trace for kernel 2:

 

Program received signal SIGSEGV, Segmentation fault.

0x00007ffff4936199 in SC_SCCGCM::GetEarly(SCInst*) () from /usr/lib/libamdocl64.so

(gdb) bt

#0  0x00007ffff4936199 in SC_SCCGCM::GetEarly(SCInst*) () from /usr/lib/libamdocl64.so

#1  0x00007ffff49363a4 in SC_SCCGCM::ComputeEarlyPosition(SCInst*, FuncRegion*) () from /usr/lib/libamdocl64.so

#2  0x00007ffff49c5848 in SC_SCCGVN::GVNSCCInst(SCInst*, SC_SCCVN*) () from /usr/lib/libamdocl64.so

#3  0x00007ffff49c7704 in SCCVNBase<SCInst, SC_CurrentValue>::VNSCCInst(SCInst*) () from /usr/lib/libamdocl64.so

#4  0x00007ffff49c6ff5 in SC_SCCBLK::VNSCCItem(int) () from /usr/lib/libamdocl64.so

#5  0x00007ffff49c7a97 in void SCCVNBase<SCInst, SC_CurrentValue>::ProcessSCC<SC_SCCBLK>(SC_SCCBLK*, int) () from /usr/lib/libamdocl64.so

#6  0x00007ffff4938a5f in SCC_BASE<SCBlock>::SCC(SCBlock*) () from /usr/lib/libamdocl64.so

#7  0x00007ffff49c69ad in SC_SCCBLK::Traversal() () from /usr/lib/libamdocl64.so

#8  0x00007ffff45b07d3 in CompilerBase::GenerateCodeUsingNewIR(void*, bool) () from /usr/lib/libamdocl64.so

#9  0x00007ffff45b6764 in Compiler::Compile(ILProgram*) () from /usr/lib/libamdocl64.so

#10 0x00007ffff45b6ee0 in Compiler::CompileShader(unsigned char*, unsigned char*, unsigned int const*, CompilerExternal*) ()

   from /usr/lib/libamdocl64.so

#11 0x00007ffff45b3227 in CompilerExternal::CompileShader(_SC_SRCSHADER const*, _SC_HWSHADER*) () from /usr/lib/libamdocl64.so

#12 0x00007ffff49cffc2 in scWrapCompileBinarySI(void*, unsigned int, void**, unsigned int*, unsigned int, unsigned int, scWrapOptionEnum*)

    () from /usr/lib/libamdocl64.so

#13 0x00007ffff458df6b in amuCompCompile () from /usr/lib/libamdocl64.so

#14 0x00007ffff458ecee in ddiCompile () from /usr/lib/libamdocl64.so

#15 0x00007ffff44cb91e in gpu::NullKernel::create(stlp_std::basic_string<char, stlp_std::char_traits<char>, stlp_std::allocator<char> > const&, stlp_std::basic_string<char, stlp_std::char_traits<char>, stlp_std::allocator<char> > const&, void const*, unsigned long) ()

   from /usr/lib/libamdocl64.so

#16 0x00007ffff44d05d3 in gpu::Kernel::create(stlp_std::basic_string<char, stlp_std::char_traits<char>, stlp_std::allocator<char> > const&, stlp_std::basic_string<char, stlp_std::char_traits<char>, stlp_std::allocator<char> > const&, void const*, unsigned long) ()

   from /usr/lib/libamdocl64.so

#17 0x00007ffff44df058 in gpu::Program::createKernel(stlp_std::basic_string<char, stlp_std::char_traits<char>, stlp_std::allocator<char> > const&, gpu::Kernel::InitData const*, stlp_std::basic_string<char, stlp_std::char_traits<char>, stlp_std::allocator<char> > const&, stlp_std::basic_string<char, stlp_std::char_traits<char>, stlp_std::allocator<char> > const&, bool*, void const*, unsigned long) ()

   from /usr/lib/libamdocl64.so

#18 0x00007ffff44de2ca in gpu::NullProgram::linkImpl(amd::option::Options*) () from /usr/lib/libamdocl64.so

#19 0x00007ffff4479055 in device::Program::build(stlp_std::basic_string<char, stlp_std::char_traits<char>, stlp_std::allocator<char> > const&, char const*, amd::option::Options*) () from /usr/lib/libamdocl64.so

#20 0x00007ffff4489030 in amd::Program::build(stlp_std::vector<amd::Device*, stlp_std::allocator<amd::Device*> > const&, char const*, void (*)(_cl_program*, void*), void*, bool) () from /usr/lib/libamdocl64.so

#21 0x00007ffff4466ff3 in clBuildProgram () from /usr/lib/libamdocl64.so

 

The kernel code is proprietary code so I cannot post it on this forum, but I accept sending it to the AMD compiler dev team if need be. Please get in touch if you would like me to do so.

 

Regards,

 

Simon

  • Re: AMD OpenCL - compiler segmentation fault
    himanshu.gautam Master
    Currently Being Moderated

    13.1 is the latest driver. Can you try with the latest driver?

    Regards

    Himanshu , Bruhaspati

    --------------------------------

    The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. Links to third party sites are for convenience only, and no endorsement is implied

  • Re: AMD OpenCL - compiler segmentation fault
    himanshu.gautam Master
    Currently Being Moderated

    Hi,

    I am not sure if there is a private channel for sending bugs to AMD.

    There used to be ticket mechanism earlier, but is no longer present. http://developer.amd.com/support/

     

    Anyways, it is very helpful for us here to confirm & fix a issue, if it is easily reproducible. I would suggest you to do some homework and try coming up with a simple testcase that can be shared here.

     

    Also Try CodeXL and Stream Kernel analyer. You might find some help with your kernel.

    Regards

    Himanshu , Bruhaspati

    --------------------------------

    The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. Links to third party sites are for convenience only, and no endorsement is implied

    • Re: AMD OpenCL - compiler segmentation fault
      sk7041 Newbie
      Currently Being Moderated

      Hi Himanshu,

       

      Thank you for your suggestions. Using the latest stable driver (13.1) makes no difference, I still get the same seg faults. I also noted that kernels that compile can take up to 15 seconds to compile, and reading/writing buffers of ~100MB can take up to 20 seconds. This is a ridiculous overhead, considering the computation takes about 20 seconds to complete.

       

      Regards,

       

      Simon

      • Re: AMD OpenCL - compiler segmentation fault
        himanshu.gautam Master
        Currently Being Moderated

        20 seconds for 100MB?

         

        Even if this is via PCIe, It should not be greater than 20 milliseconds.

        So this is 1000x slower.

         

        Can you run the bufferbandwidth SDK sample and post the results?

        Regards

        Himanshu , Bruhaspati

        --------------------------------

        The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. Links to third party sites are for convenience only, and no endorsement is implied

        • Re: AMD OpenCL - compiler segmentation fault
          sk7041 Newbie
          Currently Being Moderated

          Hi Himanshu,

           

          Thank you for your help, I am sure it is not a hardware problem. Here is the output from bufferbandwidth:

           

          Platform found : Advanced Micro Devices, Inc.

           

          Device  0            Tahiti

          Build:               DEBUG

          GPU work items:      32768

          Buffer size:         33554432

          CPU workers:         1

          Timing loops:        20

          Repeats:             1

          Kernel loops:        20

          inputBuffer:         CL_MEM_READ_ONLY

          outputBuffer:        CL_MEM_WRITE_ONLY

           

          Host baseline (naive):

           

          Timer resolution     1690.84 ns

          Page fault           4445.19 ns

          CPU read             3.29 GB/s

          memcpy()             2.66 GB/s

          memset(,1,)          3.69 GB/s

          memset(,0,)          3.69 GB/s

           

           

          AVERAGES (over loops 2 - 19, use -l for complete log)

          --------

           

          1. Host mapped write to inputBuffer

           

                clEnqueueMapBuffer(WRITE):  0.012605 s [     2.66 GB/s ]

                                 memset():  0.019332 s       1.74 GB/s

                clEnqueueUnmapMemObject():  0.010468 s [     3.21 GB/s ]

           

          2. GPU kernel read of inputBuffer

           

                 clEnqueueNDRangeKernel():  0.004686 s     143.22 GB/s

                           verification ok

           

          3. GPU kernel write to outputBuffer

           

                 clEnqueueNDRangeKernel():  0.005745 s     116.82 GB/s

           

          4. Host mapped read of outputBuffer

           

                 clEnqueueMapBuffer(READ):  0.011147 s [     3.01 GB/s ]

                                 CPU read:  0.023514 s       1.43 GB/s

                           verification ok

                clEnqueueUnmapMemObject():  0.000027 s [  1229.44 GB/s ]

           

           

          Passed!

           

          And here is the output from my program (buffers are written before any computation has started on the device):

           

          Device [0]

              Device ID: 0xb714df0

              Device name: Tahiti

              Device vendor: Advanced Micro Devices, Inc.

              Device compute units: 32 - Clock freq: 925MHz

              Device global mem: 2048MB - Device type: 4

           

          Wrote Buffer of Size: 768 bytes (0MB) in 2.88797 seconds

          Wrote Buffer of Size: 13285376 bytes (12MB) in 0.062138 seconds

          Wrote Buffer of Size: 127744 bytes (0MB) in 0.054394 seconds

          Wrote Buffer of Size: 510976000 bytes (487MB) in 9.27033 seconds

          Wrote Buffer of Size: 9216 bytes (0MB) in 0.000461 seconds

          Wrote Buffer of Size: 172709888 bytes (164MB) in 0.358325 seconds

          Wrote Buffer of Size: 159424512 bytes (152MB) in 45.1698 seconds

          Wrote Buffer of Size: 9216 bytes (0MB) in 0.260514 seconds

          Wrote Buffer of Size: 1532928 bytes (1MB) in 1.24125 seconds

          Wrote Buffer of Size: 79712256 bytes (76MB) in 5.10956 seconds

           

          Notice the 152MB took 45 seconds to write. Sometimes the previous buffer (164MB) takes up to 20 seconds. Not sure why this time it did it in 0.3. In any case, all these numbers are above the PCIe speed you mentioned. It must be a driver issue...(this was tested with 13.2 beta)

           

          Many thanks,

           

          Simon

          • Re: AMD OpenCL - compiler segmentation fault
            himanshu.gautam Master
            Currently Being Moderated

            Hi Simon,

            As per Buffer Bandwidth Sample you are getting ~3GBps of read and write speed. Can you check the sample and try to figure out, what you might be doing wrong.

            Reaching optimal data transfer speeds is an art in itself . But in any case it should not be that slow. Do check your timers are reliable, and you are not timing something extra with data transfer.

            Regards

            Himanshu , Bruhaspati

            --------------------------------

            The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. Links to third party sites are for convenience only, and no endorsement is implied

            • Re: AMD OpenCL - compiler segmentation fault
              sk7041 Newbie
              Currently Being Moderated

              Hi Himanshu,

               

              Thank you for suggesting implementing optimal data transfer. I can assure you that my timers are reliable as shown by running the same executable on 6 other devices from different vendors, and getting data transfer time of at most 0.2 seconds. This is a driver issue..

               

              Regards,

               

              Simon

              • Re: AMD OpenCL - compiler segmentation fault
                himanshu.gautam Master
                Currently Being Moderated

                Hi Simon,

                Can you post some code here (in ZIP) that can showcase this long time taken for data transfer. I will try to reproduce and suggest changes that might help you improve the transfer speed.

                Regards

                Himanshu , Bruhaspati

                --------------------------------

                The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. Links to third party sites are for convenience only, and no endorsement is implied

                • Re: AMD OpenCL - compiler segmentation fault
                  sk7041 Newbie
                  Currently Being Moderated

                  Hi Himanshu,

                   

                  It seems like that problem was fixed by reducing the amount of private memory requested in a kernel. One specific kernel that uses a lot of private memory arrays is compiled before any memory transfers are made from the host to the device and it seems that reducing the size of these arrays has fixed the problem of slow host-to-device memory transfers, but I am not too sure why. Does the OpenCL context allocate device private memory for a kernel as it is compiled, or at runtime? If allocation happens at compile time then this could have been the cause of my problems.

                   

                  Anyway all works well now, so to sum up:

                  1) Update to newest drivers to avoid compiler segmentation fault.

                  2) Beware of how  much private memory your kernels are using!

                   

                  Many thanks for your help.

                   

                  Regards,

                   

                  Simon

    • Re: AMD OpenCL - compiler segmentation fault
      sk7041 Newbie
      Currently Being Moderated

      Hi Himanshu,

       

      I would like to mention that using the latest beta driver (13.2 beta) has fixed the segmentation faults. However, the kernels still take a long time to compile, and memory read/writes are still very lengthy...

       

      Regards,

       

      Simon

      • Re: AMD OpenCL - compiler segmentation fault
        himanshu.gautam Master
        Currently Being Moderated

        Hi Simon,

        Thanks for keeping us in loop. It is good to hear that 13.2 fixed the crashes.

         

        Checkout Table 4.2 in AMD OpenCL Programming Guide for getting optimal data transfer though. It might be helpful.

        Regards

        Himanshu , Bruhaspati

        --------------------------------

        The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. Links to third party sites are for convenience only, and no endorsement is implied

More Like This

Legend

  • Correct Answers - 4 points
  • Helpful Answers - 2 points