0 Replies Latest reply: Nov 30, 2012 1:01 AM by nmanjofo

Compute Shader Problem

nmanjofo Newbie

I'm implementing a simple N-body simulation using DX11 and a compute shader, running on a GTX 280. The theory behind it is based on this article: http://http.developer.nvidia.com/GPUGems3/gpugems3_ch31.html

 

I also noticed that such a simulation is already part of the MS DX SDK (nBodyGravityCS11), from which I took some inspiration.

 

The problem I encountered:

 

 

void body_body_interaction(inout float3 ai, float4 bi, float4 bj)
{
    float3 r = bj.xyz - bi.xyz;

    float distSqr = dot(r, r);
    distSqr += g_softeningFactorSq;

    float distInvCube = 1.0f / sqrt(distSqr * distSqr * distSqr);

    //ai += g_FG * bj.w * distInvCube * r; - NOT WORKING
    ai += g_FG * g_fParticleMass * distInvCube * r; //WORKS, g_fParticleMass can be either in cbuffer or global constant, both work
}

 

The variable bj (xyz: position, w: mass) is first loaded into shared memory, then GroupMemoryBarrierWithGroupSync() is called to synchronize the group.

 

[loop]
for(uint block = 0; block < num_blocks; ++block)
{
    //Fetch positions to shared cache
    sh_Positions[indexGroup] = oldPar[block * BLOCK_SIZE + indexGroup].pos;
    GroupMemoryBarrierWithGroupSync();

    [unroll]
    for(uint i = 0; i < BLOCK_SIZE; i += 8)
    {
        body_body_interaction(accel, myParticle.pos, sh_Positions[i]);
        body_body_interaction(accel, myParticle.pos, sh_Positions[i+1]);
        body_body_interaction(accel, myParticle.pos, sh_Positions[i+2]);
        body_body_interaction(accel, myParticle.pos, sh_Positions[i+3]);
        body_body_interaction(accel, myParticle.pos, sh_Positions[i+4]);
        body_body_interaction(accel, myParticle.pos, sh_Positions[i+5]);
        body_body_interaction(accel, myParticle.pos, sh_Positions[i+6]);
        body_body_interaction(accel, myParticle.pos, sh_Positions[i+7]);
    }

    GroupMemoryBarrierWithGroupSync();
}

 

If I use the mass stored in bj.w, the simulation produces NaNs, even after the very first step. The particle positions are correct, because the simulation works when I take the particle mass from the cbuffer or from a global constant. I initialize all particle masses to the same value as the g_fParticleMass constant in the shader.

 

The funny thing is that if I do the same thing in the MS example mentioned above, the result is the same: I get no output and the buffer contains NaNs. Why am I unable to use the 4th vector component from shared memory in this case? It is initialized properly on the CPU side and then copied to the GPU (verified).
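
For reference, this is roughly how the same interaction can be cross-checked on the host. It is only a minimal sketch, assuming the host code is C++. Float3, Float4, firstNonFiniteMass and referenceAcceleration are hypothetical names made up for illustration, and the fg and softeningSq parameters stand in for g_FG and g_softeningFactorSq. It mirrors body_body_interaction with the mass taken from bj.w, scans the particle array for a non-finite w, and computes a brute-force acceleration that can be compared against the buffer read back from the GPU.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host-side mirrors of the shader types (xyz = position, w = mass).
struct Float3 { float x, y, z; };
struct Float4 { float x, y, z, w; };

// CPU mirror of body_body_interaction, using the mass stored in bj.w.
// fg and softeningSq are assumed stand-ins for g_FG and g_softeningFactorSq.
inline void bodyBodyInteraction(Float3& ai, const Float4& bi, const Float4& bj,
                                float fg, float softeningSq)
{
    const float rx = bj.x - bi.x;
    const float ry = bj.y - bi.y;
    const float rz = bj.z - bi.z;

    const float distSqr = rx * rx + ry * ry + rz * rz + softeningSq;
    const float distInvCube = 1.0f / std::sqrt(distSqr * distSqr * distSqr);

    const float s = fg * bj.w * distInvCube;
    ai.x += s * rx;
    ai.y += s * ry;
    ai.z += s * rz;
}

// Returns the index of the first particle whose w is NaN or infinite,
// or bodies.size() if every mass is finite.
inline std::size_t firstNonFiniteMass(const std::vector<Float4>& bodies)
{
    for (std::size_t i = 0; i < bodies.size(); ++i)
        if (!std::isfinite(bodies[i].w))
            return i;
    return bodies.size();
}

// Brute-force acceleration of particle i against all particles.
// Like the shader, it does not skip j == i; with softeningSq > 0 the
// self-interaction contributes zero because r is zero.
inline Float3 referenceAcceleration(const std::vector<Float4>& bodies,
                                    std::size_t i, float fg, float softeningSq)
{
    assert(i < bodies.size());
    Float3 accel = {0.0f, 0.0f, 0.0f};
    for (const Float4& bj : bodies)
        bodyBodyInteraction(accel, bodies[i], bj, fg, softeningSq);
    return accel;
}
```

Running firstNonFiniteMass on the array right before the upload, and again on data read back through a staging buffer, would show whether w is still finite at each stage. A single NaN in any w is enough to turn every acceleration into NaN after one step, since each particle interacts with all the others.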

 

Full shader code here: http://pastebin.com/SJhs8ntt

 

Thank you very much!
