Game Design, Programming and running a one-man games business…

My big slowdown function

Well I feared as much when I wrote this: but it turns out that, yup, that function is the slowest in the entire engine, at least until battle is joined. I really need to fix it.

Essentially what I’m doing there is maintaining a depth buffer for objects in the game. I then do some fancy processing on that buffer (all taking place in video card memory). I then really badly need to know the values of the depth buffer at about 100 different points, and based on the outcome, I either don’t draw an object, draw it quite small, or draw it really big. In short, I’m scaling an object based on specific values in the depth buffer.
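The per-object decision described above boils down to a tiny classifier on a sampled depth value. A minimal sketch of that logic (names and thresholds are mine, purely illustrative — the engine's actual cut-offs aren't stated):

```cpp
// Hypothetical sketch: classify an object by the depth-buffer value
// sampled at its screen position. Thresholds are made up for illustration.
enum DrawMode { SKIP, DRAW_SMALL, DRAW_BIG };

DrawMode ClassifyByDepth(float depth) // depth in [0,1], as in a D3D9 depth buffer
{
    if (depth < 0.25f) return SKIP;       // occluded: don't draw at all
    if (depth < 0.75f) return DRAW_SMALL; // mid-range: draw scaled down
    return DRAW_BIG;                      // otherwise: draw it really big
}
```

The expensive part isn't this logic — it's getting those ~100 depth values out of video memory to run it on the CPU.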

Right now, my engine does not use vertex shaders at all. I just use pixel shaders, and leave the vertex shader set to NULL. I’m pretty sure the solution to my problem is to write a vertex shader that scales each of these objects accordingly when I go to draw them. The thing is, I’m using DirectX 9 and therefore really do have separate vertex and pixel shaders. This is going to involve me reading up on some of the most undocumented stuff ever: how vertex shaders and pixel shaders can be used in a 2D game under DirectX 9.
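For what it's worth, the vertex-shader half of that doesn't have to be scary. A hedged sketch of what a sprite-scaling vertex shader might look like in HLSL (constant names like `g_Scale` and `g_Centre` are my own invention, and this assumes the scale is set as a per-draw shader constant rather than read from the depth buffer):

```hlsl
// Hypothetical vs_2_0 sketch: scale a 2D sprite about its centre by a
// per-draw constant, replacing the NULL (fixed-function) vertex stage.
float4x4 g_WorldViewProj;
float2   g_Centre;  // sprite centre in pre-transform units
float    g_Scale;   // 0 = collapse (skip), <1 = small, >=1 = big

struct VS_IN  { float4 pos : POSITION; float2 uv : TEXCOORD0; };
struct VS_OUT { float4 pos : POSITION; float2 uv : TEXCOORD0; };

VS_OUT main(VS_IN v)
{
    VS_OUT o;
    float2 p = g_Centre + (v.pos.xy - g_Centre) * g_Scale;
    o.pos = mul(float4(p, v.pos.z, 1.0f), g_WorldViewProj);
    o.uv  = v.uv;
    return o;
}
```

Note this still assumes the CPU knows the scale factor — which is exactly the read-back problem; moving the depth lookup itself onto the GPU is where it gets hardware-dependent, as the comments below discuss.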


3 thoughts on My big slowdown function

  1. In the ideal case, a bare-bones vertex shader that handles the same work as the fixed-function pipeline should take you very, very little time to write. But you seem to want something more complex.

    If I understand correctly, what you want to do is read from that ‘depth buffer’ in your vertex shader code, in order to scale or discard some stuff, without copying that depth data back to the CPU.
    The thing is, this means you need to read from a texture inside a vertex shader. I don’t know what your target hardware is, but doing this requires Vertex Texture Fetch capability, which is a Shader Model 3.0 feature, and it has a sad history of not being supported on many DirectX 9 cards. NVIDIA GeForce 6600 and above support it, but ATI cards that are SM3.0 DX9 parts (but not yet DX10) don’t properly support it. They do support an alternative technique called Render To Vertex Buffer (R2VB), but it’s an ATI hack, so you’d need two code paths to support this properly.

    AFAIK, all DX10 cards or newer support it through an SM3.0 shader, even when used through the DX9 API, but those pre-DX10 ATI cards will have issues.
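On hardware that does support Vertex Texture Fetch, the idea the commenter describes might look something like this in HLSL. This is a hedged sketch only: `g_DepthTex`, `g_SampleUV` and the thresholds are assumptions of mine, and note that SM3.0 vertex shaders must use `tex2Dlod` (with an explicit mip level) rather than `tex2D`:

```hlsl
// Hypothetical vs_3_0 sketch of Vertex Texture Fetch: the depth data is
// bound as a texture and sampled inside the vertex shader, so the scale
// decision never leaves the GPU.
sampler2D g_DepthTex : register(s0);
float4x4  g_WorldViewProj;
float2    g_Centre;     // sprite centre in pre-transform units
float2    g_SampleUV;   // where in the depth texture to look for this object

struct VS_IN  { float4 pos : POSITION; float2 uv : TEXCOORD0; };
struct VS_OUT { float4 pos : POSITION; float2 uv : TEXCOORD0; };

VS_OUT main(VS_IN v)
{
    VS_OUT o;
    // tex2Dlod is required here; plain tex2D is not valid in a vertex shader.
    float depth = tex2Dlod(g_DepthTex, float4(g_SampleUV, 0, 0)).r;
    float scale = (depth < 0.25f) ? 0.0f   // collapse to nothing: skip
                : (depth < 0.75f) ? 0.5f   // draw small
                : 1.0f;                    // draw big
    float2 p = g_Centre + (v.pos.xy - g_Centre) * scale;
    o.pos = mul(float4(p, v.pos.z, 1.0f), g_WorldViewProj);
    o.uv  = v.uv;
    return o;
}
```

Scaling a sprite to zero size is a common way to "discard" it in a vertex shader, since the degenerate triangles rasterize to nothing.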

  2. I’ve kind of got round it by keeping my current read-back system but adding an extra stage where I use a tiny tiny texture as an intermediate. It seems like locking a small part of a big render target takes forever, but locking the whole of a small render target is comparatively very quick :)
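In raw D3D9 terms, that workaround might look roughly like this (a hedged sketch only, with error checking omitted and names invented; it assumes the small target was created lockable, since default-pool render targets can't be locked otherwise):

```cpp
// Sketch: blit the region of interest from the big render target into a
// tiny lockable render target, then lock the whole small surface.
IDirect3DSurface9* pTiny = NULL;
pDevice->CreateRenderTarget(16, 16, D3DFMT_A8R8G8B8,
                            D3DMULTISAMPLE_NONE, 0,
                            TRUE /* lockable */, &pTiny, NULL);

RECT src = { x, y, x + 16, y + 16 };          // the region we care about
pDevice->StretchRect(pBigTarget, &src, pTiny, NULL, D3DTEXF_NONE);

D3DLOCKED_RECT lr;
pTiny->LockRect(&lr, NULL, D3DLOCK_READONLY); // lock the whole tiny surface
// ... read the ~100 values out of lr.pBits ...
pTiny->UnlockRect();
```

The speed-up makes sense: locking any part of the big target can force the driver to synchronize and transfer the whole thing, whereas the GPU-side `StretchRect` copy keeps the big target untouched and only a tiny surface ever crosses the bus.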

Comments are currently closed.