So Gratuitous Space Battles 2 is running really well on my dev PC (50-60 FPS even in dual-monitor 5120-wide resolution mode). Dev PC specs:

Windows 7 64-bit, 8 GB RAM, i7-3770K CPU @ 3.50 GHz, GeForce GTX 670.

However, it can dip pretty low on my laptop with Intel HD 4000 graphics (also an i7, but a lower spec). I’ve seen it go down to 25 FPS at 1600×900, although that is with all the fancy options on, so maybe it will go higher with them deselected. Ideally I’d get that much better. So what is the problem?

I think it’s too many small draw calls, and sadly, that’s kinda the way my engine works.

The basic algorithm of my engine is this:

Update_Everything() (game simulation, partly multithreaded)
Check_what_is_onscreen()
SetRenderTarget()
DrawBigListOfObjectsToRenderTarget()
SetRenderTarget()
DrawBigListOfObjectsToRenderTarget()
...
CompositeAllTheTargetsIntoFinalImage()
GUI()
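To put a number on the cost of that outline, here is a minimal C++ sketch of its shape. DrawItem, Pass and CountNaiveDrawCalls are hypothetical stand-ins, not the engine's real types: each pass binds one render target and then issues one draw call per entry in its list.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one sprite that must be drawn into a render target.
struct DrawItem {
    int textureId;  // which sprite texture this object uses
    float depth;    // larger = further from the camera (assumed convention)
};

// One render-target pass: the target plus the ordered list drawn into it.
struct Pass {
    int renderTargetId;
    std::vector<DrawItem> items;
};

// Mirrors the outline: bind a target, then one draw call per list entry.
// Returns the total number of draw calls issued for the frame.
std::size_t CountNaiveDrawCalls(const std::vector<Pass>& passes) {
    std::size_t drawCalls = 0;
    for (const Pass& pass : passes) {
        // SetRenderTarget(pass.renderTargetId) would happen here.
        drawCalls += pass.items.size();  // DrawBigListOfObjectsToRenderTarget()
    }
    return drawCalls;
}
```

With four passes over the same 1,000 objects, this comes to 4,000 draw calls per frame.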

The problem is all of those lists of objects being drawn. The solution to this in a conventional 3D game (before you all suggest it) is to use a z-buffer, sort all those objects by render state or texture or both, and blap them in a few draw calls. That’s fab, but it doesn’t work with alpha blending. People who do 3D games think alpha blending means particles, but nope, it also means nice fuzzy edges on complex sprite objects. To use the order-independent z-buffer rendering method, you have to disable proper alpha blending, and then everything starts to look sharp, boxy and ugly as hell. 3D games sprinkle antialiasing everywhere to try and cover it. With complex sprites layered on top of each other, this just looks dreadful.

The solution is the good old-fashioned painter’s algorithm, meaning drawing in Z order from back to front. This works well and everything looks lovely.
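A minimal sketch of the painter's ordering, assuming each sprite carries a single depth value where larger means further away (the Sprite struct and SortBackToFront are hypothetical names):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Sprite {
    int textureId;
    float depth;  // larger = further from the camera (assumed convention)
};

// Painter's algorithm: sort back to front so nearer sprites are alpha-blended
// over the ones behind them. stable_sort keeps the submission order of sprites
// at equal depth, so ties don't swap from one frame to the next.
void SortBackToFront(std::vector<Sprite>& sprites) {
    std::stable_sort(sprites.begin(), sprites.end(),
                     [](const Sprite& a, const Sprite& b) { return a.depth > b.depth; });
}
```

Note that the sort key is depth only; texture plays no part, which is exactly why identical textures end up scattered through the list.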

[Screenshot]

The problem is that you end up with 4,000 draw calls in a frame, and then the HD 4000 explodes. Why 4,000? Well, to get some of my more l33t effects I need to draw a lot of objects four times, so that’s only really 1,000 objects. To do proper lighting on rotated objects I can’t group objects with different rotations, so each angle of an identical object means a separate draw call. Some of my render targets let me draw regardless of that angle, but then the problem becomes textures. If you use the painter’s algorithm and draw this…

ShipA
ShipB
ShipA
ShipB

Then there is no way to group the ships by texture without screwing it up, if those ships overlap. This is “a pain”. There are some simple things I can do… and I have a system that does them. For example, if I have this big list of sprites to draw and it turns out that occasionally I *do* get ShipA, ShipA, then I identify that and optimize away the second call by making a single VertexBuffer call for both sprites (or both particle systems, in those cases). I even have a GUI that shows me when this happens…

[Screenshot: the batching debug GUI]
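The "merge adjacent duplicates" pass described above can be sketched as a single walk over the sorted list. This is an illustration under assumptions, not the engine's code: SpriteKey, Batch and the integer rotationStep are hypothetical, and the passUsesRotation flag models render targets whose lighting needs the sprite's angle as a per-draw constant.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-sprite data: texture plus the angle, quantised to an
// integer step so the equality test is exact.
struct SpriteKey {
    int textureId;
    int rotationStep;
};

struct Batch {
    SpriteKey key;
    std::size_t first;  // index of the first sprite in the sorted list
    std::size_t count;  // consecutive sprites merged into one draw call
};

// Neighbours can share a draw call only if they use the same texture and, in
// passes whose lighting reads the angle from a shader constant, the same rotation.
bool CanShareDrawCall(const SpriteKey& a, const SpriteKey& b, bool passUsesRotation) {
    return a.textureId == b.textureId &&
           (!passUsesRotation || a.rotationStep == b.rotationStep);
}

// One walk over the back-to-front list, merging runs of compatible neighbours
// (ShipA, ShipA) into one batch. Order never changes, so blending is unaffected.
std::vector<Batch> BuildBatches(const std::vector<SpriteKey>& sorted, bool passUsesRotation) {
    std::vector<Batch> batches;
    for (std::size_t i = 0; i < sorted.size(); ++i) {
        if (!batches.empty() &&
            CanShareDrawCall(batches.back().key, sorted[i], passUsesRotation)) {
            ++batches.back().count;
        } else {
            batches.push_back({sorted[i], i, 1});
        }
    }
    return batches;
}
```

Because it only looks at neighbours, ShipA, ShipB, ShipA, ShipB yields four batches: the case the two options below try to attack.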

The trouble is, the majority of the time this is NOT happening. There are to my mind two potential solutions, both of them horribly messy:

1) Go through the list and find each ‘ShipA….ShipA’ pair where nothing in between them overlaps either of them, then rearrange them so that they are next to each other, allowing a lot more grouping. (This involves some hellish sorting and overlap detection.)

2) Pre-process everything, building up a database at the start of rendering of which textures seem to naturally follow on from each other, then render those textures to a temporary ‘scratch’ render target atlas, which I can then index into. This would be fun to code, and amusing to watch the render target itself in the debugger :D It adds a lot of ‘changing texture pointers and UVs after the event’ complexity, though.
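Option 1 does not necessarily need a full re-sort. A minimal sketch of one greedy approach, assuming every sprite has an axis-aligned screen-space bounding box (Rect, Item and GroupWithoutBreakingOverlap are hypothetical names): walk the back-to-front list and let each item slide back next to the most recent item with the same texture, stopping as soon as it would jump over something it overlaps. Only overlap with the item being moved matters; the earlier ShipA stays where it is. Every pair of overlapping items keeps its original order, so the blended image should not change.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Rect { float x0, y0, x1, y1; };  // screen-space bounds (assumed AABB)

struct Item {
    int textureId;
    Rect bounds;
};

bool Overlaps(const Rect& a, const Rect& b) {
    return a.x0 < b.x1 && b.x0 < a.x1 && a.y0 < b.y1 && b.y0 < a.y1;
}

// Greedy regrouping of an already back-to-front list. Each item moves back to
// just after the latest same-texture item, provided it overlaps nothing it
// would be moved in front of; otherwise it is appended in its original place.
std::vector<Item> GroupWithoutBreakingOverlap(const std::vector<Item>& sorted) {
    std::vector<Item> out;
    out.reserve(sorted.size());
    for (const Item& item : sorted) {
        std::size_t insertAt = out.size();  // default: keep original position
        for (std::size_t q = out.size(); q-- > 0;) {
            if (out[q].textureId == item.textureId) { insertAt = q + 1; break; }
            if (Overlaps(out[q].bounds, item.bounds)) break;  // can't jump over this
        }
        out.insert(out.begin() + static_cast<std::ptrdiff_t>(insertAt), item);
    }
    return out;
}
```

The inner scan is linear per item, so the worst case is O(n²); bucketing the screen into regions, as one of the comments below suggests, would cut down the number of overlap tests.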

Be aware that I’m using DirectX 9, mostly for compatibility reasons, which means that rendering to multiple render targets at once, or doing multithreaded rendering, really isn’t an option.
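For option 2, the scratch atlas needs a way to hand out rectangles and to rewrite UVs afterwards. A minimal sketch, assuming a simple shelf (row-by-row) packer, which is only one of several possible packing strategies; ShelfAtlas, AtlasSlot and RemapToAtlas are hypothetical names:

```cpp
#include <cassert>
#include <optional>
#include <vector>

struct AtlasSlot { int x, y, w, h; };

// Hypothetical shelf packer: sprites are placed left to right along a row;
// when a row is full, a new row starts below the tallest sprite of the last one.
class ShelfAtlas {
public:
    ShelfAtlas(int width, int height) : width_(width), height_(height) {}

    std::optional<AtlasSlot> Allocate(int w, int h) {
        if (w > width_) return std::nullopt;  // can never fit
        if (cursorX_ + w > width_) {           // row full: start a new shelf
            shelfY_ += shelfHeight_;
            cursorX_ = 0;
            shelfHeight_ = 0;
        }
        if (shelfY_ + h > height_) return std::nullopt;  // atlas full
        AtlasSlot slot{cursorX_, shelfY_, w, h};
        cursorX_ += w;
        if (h > shelfHeight_) shelfHeight_ = h;
        return slot;
    }

private:
    int width_, height_;
    int cursorX_ = 0, shelfY_ = 0, shelfHeight_ = 0;
};

struct UV { float u, v; };

// The "changing UVs after the event" step: maps a sprite's original 0..1
// texture coordinate into its slot inside the atlas.
UV RemapToAtlas(UV original, const AtlasSlot& s, int atlasW, int atlasH) {
    return {(s.x + original.u * s.w) / atlasW, (s.y + original.v * s.h) / atlasH};
}
```

A shelf packer wastes some space when sprite heights vary a lot, but it is cheap enough to rebuild every frame, which suits a scratch atlas.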

Edit: just spotted a bug with method 2. If I draw 10 instances of ShipA, they may be at different zooms, but I will only be caching a single image in my temp atlas, not the full mip-map chain, so the atlased sprites would lose proper mip-mapping and potentially look bad :(
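The bug can be made concrete: instances at different zooms want different mip levels, and a single cached image only provides one. One possible way round it, along the lines of the last comment below, is to build a few mip levels of the atlas as well. The sketch assumes slot positions and sizes are kept aligned to a power of two; ApproxMipLevel, SlotRect and SlotMipChain are hypothetical, and the LOD estimate is a simplification of what the hardware really does.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Approximate mip level wanted by a sprite drawn at `screenPixelsPerTexel`
// (1.0 = native size, 0.25 = shrunk to a quarter). floor(log2(minification))
// is the more detailed of the two levels trilinear filtering would blend.
int ApproxMipLevel(float screenPixelsPerTexel, int mipCount) {
    assert(screenPixelsPerTexel > 0.0f && mipCount > 0);
    const int level = static_cast<int>(std::floor(std::log2(1.0f / screenPixelsPerTexel)));
    return std::clamp(level, 0, mipCount - 1);
}

struct SlotRect { int x, y, w, h; };

// If a slot's position and size are multiples of 2^maxLevel, each smaller
// atlas level holds the same slot shifted right by the level, and a 2x2 box
// filter never averages texels from two different sprites. (Bilinear sampling
// at a slot edge can still bleed, so some padding is usually wanted too.)
std::vector<SlotRect> SlotMipChain(const SlotRect& level0, int maxLevel) {
    const int align = 1 << maxLevel;
    assert(level0.x % align == 0 && level0.y % align == 0);
    assert(level0.w % align == 0 && level0.h % align == 0);
    std::vector<SlotRect> chain;
    for (int l = 0; l <= maxLevel; ++l)
        chain.push_back({level0.x >> l, level0.y >> l, level0.w >> l, level0.h >> l});
    return chain;
}
```

For example, a ShipA drawn at native size wants level 0, while one zoomed out to roughly a quarter of its size wants level 2, which a level-0-only atlas cannot supply.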

12 Responses to “THE GSB2 engine optimizing post”

  1. e-dog says:

    It’s totally possible to use multiple render targets in DX9, if the hardware supports it (most does).

    Is there a gameplay reason for the ships not to be grouped by type in Z?

    Why can’t you group rotated objects? Are there not enough vertex channels to pass all the necessary lighting/rotation data via the VB?

  2. cliffski says:

    Even if I could do multiple RTs, I use different shaders for each RT, so there would be no reduction in draw calls :(. Regarding rotation, are you suggesting packing rotation data into the vertex structure?

  3. e-dog says:

    Yes, pass rotation data (or lighting, or whatever you need per object) via vertex data instead of shader constants. You’ll need to duplicate it for every vertex (or use hardware instancing), but it’s not a big deal for 4 vertices.

    There are only 16 float4 vertex input registers, and you need some of them to pass vertex position, UVs etc., but the rest are usable. Of course, you’ll need to pass the data from the vertex shader to the pixel shader if you need it there. You can also do some processing/unpacking in the vertex shader.

  4. cliffski says:

    Indeed, there is definitely room there (although it complicates things if I change my vertex format, for all kinds of legacy crap reasons…). I use the rotation data in my pixel shader to do some shading, so it would involve passing the data through the VS and reading it for every pixel. The problem is there are just so many ‘theoretical’ ways to speed stuff up, and each of them will be optimal under different conditions… I think an easier ‘win’ is to speed up the non-bumpmapped rendering (which is 75% of it), as I don’t have that problem in those cases and can probably spot a lot of easy grouping.

  5. e-dog says:

    I’d say passing data via VB is easier than making atlases at runtime, even if you need to change the vertex format.

  6. Andrew Copland says:

    How about dividing up the screen into overlapping regions and treating them as buckets for your overlapping tests?

  7. ac says:

    I think these screenshots either don’t do the game justice or there’s some sort of problem that I would look at before spending time on extreme optimization trickery.

    My impression of the screenshot is that it’s as if my eyes were focused on a HUD and there was a wall of water between the HUD and the ships. It would be desirable to give the impression that the ships are far away yet you are focused on them.

    I’d like to see how the game looks if rendered at 2-4x resolution and then resized to screen resolution. The UI, ships and background could all be rendered at different resolutions, and then adjusting the blend of those final layers and applying shaders should give the best result.

    The problem is that, let’s say, the viewer was 300-1000 M from the ships and the asteroids were 1000+ M from the view.

    Also, let’s say that since there are these HUD effects on the screen, where is your theoretical focal point if this is supposed to be a view through glass? Either way, I don’t think this sort of extremely blurry depth simulation looks good. I played for 2 weeks with the depth of field filtering in Skyrim, and while it looked good, eventually it was still a bit annoying in practice because I don’t like extremely blurry graphics anywhere on the screen – one has to focus *very close* in order to have the background as blurry as the asteroids in these shots. If you are focusing 300+ M away, the difference in blurriness between objects 300 M and 3000 M away is not that big, I’d argue.

    Now if we are not talking about a window + projection but some sort of “space camera”, and the player is imagined to watch the battle through a monitor, then we can imagine the camera is going to be at infinite focus with a wide field and low aperture -> so again – objects at 300 M or 3000 M distance are going to have very little difference in sharpness.

    I could paste some screenshots from that show the space shuttle, the ISS and Earth all in one shot, or shots from ST:TNG; they look quite similar – there are no fuzzy asteroids or other blurriness anywhere.

  8. ac says:

    Some reference photos, taken from spacecraft according to the NASA page:
    http://solarsystem.nasa.gov/images/Asteroids.jpg
    http://www.keepbanderabeautiful.org/shuttle-spacestation.jpg

    Notice the lack of anything blurry. What’s needed to get a similar effect? Well, I don’t know anything about building gfx engines, but there’s a bundle of screenshots showing old games rendered at 8K resolution and then downsampled, and they look much closer to this than rendering at screen resolution does.

  9. ac says:

    (I forgot to add that I’d need to get a small pixel pitch 4K+ display to say definitely whether this blurriness issue needs to be fixed – but if you posted a 4K+ shot I could then resize it down myself to 50% and see what the effect would be roughly)

  10. ac says:

    LG should be coming up with inexpensive 4K OLEDs this year, hopefully one at <25". That could be the solution to much of my issues with PC gfx today.

  11. ac says:

    I also think there should be user-definable profiles and post-processing fx. This way I could create a custom profile for OLED, LCD and CRT use, each for day and night time. With user-tweakable pixel shaders, the blurriness/sharpening could be adjusted to preference and to suit the display technology in use.

  12. Tim says:

    For option 2, if you’re going to create an atlas, why not create a few mip levels of it simultaneously?
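e-dog’s suggestion in the comments above, moving per-object rotation out of shader constants and into the vertex data, can be sketched as follows. The SpriteVertex layout and AppendSpriteQuad helper are hypothetical; in Direct3D 9 the extra pair of floats would also need a matching entry in the vertex declaration, which is not shown. Because the angle travels with each vertex, sprites with the same texture but different rotations can share one vertex buffer and one draw call.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical vertex layout: position and UV as usual, plus the owning
// sprite's rotation packed as (cos, sin) for the pixel shader's lighting.
struct SpriteVertex {
    float x, y, z;
    float u, v;
    float rotCos, rotSin;  // duplicated on all four corners of the quad
};

// Appends one quad (indexed as two triangles elsewhere) for a sprite centred
// at (cx, cy) with half-size (hw, hh), rotated by `angle` radians.
void AppendSpriteQuad(std::vector<SpriteVertex>& out, float cx, float cy,
                      float hw, float hh, float angle) {
    const float c = std::cos(angle), s = std::sin(angle);
    const float corners[4][4] = {  // local x, local y, u, v
        {-hw, -hh, 0.0f, 0.0f}, {hw, -hh, 1.0f, 0.0f},
        {hw, hh, 1.0f, 1.0f},   {-hw, hh, 0.0f, 1.0f}};
    for (const auto& k : corners) {
        // Rotate the corner around the sprite centre on the CPU...
        const float x = cx + k[0] * c - k[1] * s;
        const float y = cy + k[0] * s + k[1] * c;
        // ...and carry the angle along so lighting no longer needs a constant.
        out.push_back({x, y, 0.0f, k[2], k[3], c, s});
    }
}
```

The cost is two extra floats per vertex, eight per sprite, which is small next to the per-draw-call overhead it removes.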