Drawing a LOT of sprites

August 29, 2016 | Filed under: programming

I’m doing early work on my next game, a completely new IP. I’ll be announcing it in a few months. Anyway… it involves drawing a big world with a LOT of objects in it. Tens of thousands on screen, probably. Drawing 10,000 objects in 2D is not as simple as you think.

If you are a non-coder, or someone who only ever uses middleware, you might think ‘the new video cards can draw 10,000,000 polys per frame, what’s the problem?’ And indeed there is *no problem* if you want to set a single texture and then splat 5 million sprites on the screen that show it. That’s one (well…probably several) draw call, one render state, one texture. Even really old video cards like that.

The problem is when you have a lot of different textures and want to swap between them, because for engine-related reasons, you need to draw stuff in a specific order. In a 3D world, you can use a Z-buffer and draw out of order, but with 2D objects with soft anti-aliased edges, that looks BAD, because the semi-transparent edge pixels have to blend with whatever is already drawn behind them. The good old-fashioned painter’s algorithm is your friend. The problem is, if you draw back to front and the sprite textures needed go A B A B A B, you are kinda fucked…that means a lot of texture changes, and in DirectX 9 (which I use for compatibility reasons), texture changes mean separate draw calls, which stall the video card, and that is sloooowwwww.
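To put rough numbers on the A B A B problem, here is a minimal Python sketch. It is not engine code: it assumes each sprite is described only by a texture name, and that every change of texture in the draw order forces one new draw call. The function name count_draw_calls is made up for illustration.

```python
from itertools import groupby

def count_draw_calls(textures_in_draw_order):
    """Count batches when every texture change forces a new draw call."""
    # groupby yields one group per run of consecutive equal items.
    return sum(1 for _ in groupby(textures_in_draw_order))

# Back-to-front order that alternates between two textures.
interleaved = ["A", "B"] * 5000
# The same 10,000 sprites if draw order did not matter.
sorted_by_texture = sorted(interleaved)

print(count_draw_calls(interleaved))        # 10000
print(count_draw_calls(sorted_by_texture))  # 2
```

Same sprites, same textures, but the strict back-to-front order costs 10,000 calls instead of 2. Every workaround that follows is a way of getting closer to the second number without breaking the visible ordering.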

Relevant video from GSB2:

So what are the workarounds?

Texture atlases. This is the obvious one. Stick A & B in the same texture, and you are laughing, suddenly stuff is a LOT quicker. This only solves the texture issue, not drawing to different render targets, but you can defer those draws anyway and do them separately (GSB 2 does this). Texture atlases are an obvious ‘win’ even if they only halve the texture changes. The problems here are that you either need to know what textures will follow each other and pre-compile texture atlases (something I’m trying right now), or you need to dynamically create texture atlases based on recent drawing, and effectively use an off-screen render target as a texture ‘cache’. I tried this recently…and it was actually slower :(
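As a rough illustration of the pre-compiled atlas idea, here is a minimal Python sketch of a naive "shelf" packer that places sprites in rows and converts each pixel rectangle to texture coordinates. Everything in it (pack_shelf, uv_rect, the sprite sizes, the 128x128 atlas) is a hypothetical example rather than what GSB2 or the new engine does, and it ignores the padding a real atlas needs to avoid filtering bleed between neighbours.

```python
from itertools import groupby

def pack_shelf(sizes, atlas_width):
    """Place (name, w, h) rectangles left to right in rows ("shelves").

    Returns ({name: (x, y, w, h)} in pixels, atlas height actually used).
    """
    placements, x, y, shelf_h = {}, 0, 0, 0
    for name, w, h in sizes:
        if w > atlas_width:
            raise ValueError(f"{name} is wider than the atlas")
        if x + w > atlas_width:      # row is full, start a new shelf
            x, y, shelf_h = 0, y + shelf_h, 0
        placements[name] = (x, y, w, h)
        x += w
        shelf_h = max(shelf_h, h)
    return placements, y + shelf_h

def uv_rect(rect, atlas_w, atlas_h):
    """Convert a pixel rectangle to normalised (u0, v0, u1, v1)."""
    x, y, w, h = rect
    return (x / atlas_w, y / atlas_h, (x + w) / atlas_w, (y + h) / atlas_h)

sprites = [("A", 64, 64), ("B", 64, 64), ("C", 128, 32)]
placements, used_h = pack_shelf(sprites, atlas_width=128)
print(placements["B"], used_h)                # (64, 0, 64, 64) 96
print(uv_rect(placements["B"], 128, 128))     # (0.5, 0.0, 1.0, 0.5)

# Once A, B and C live in one atlas, the bound texture never changes.
atlas_of = {name: "atlas0" for name in placements}
draw_order = ["A", "B", "A", "B", "C"]
texture_binds = sum(1 for _ in groupby(atlas_of[t] for t in draw_order))
print(texture_binds)                          # 1
```

The packing itself is the easy part. The hard part described above is deciding which sprites belong together in the same atlas, which depends on what tends to be drawn back to back.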

Dirty-rects. Basically, draw the whole scene once and save it in an offscreen buffer to use as your background. A single quad then blaps the whole screen, and you only draw stuff that has changed / is animating. This, as I recall, was used by SimCity 4. The only problem is that scrolling really causes hell.
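The bookkeeping behind dirty rects can be sketched in a few lines of Python. This is only an assumed shape for it: Rect and objects_to_redraw are hypothetical names, objects are plain (name, bounds) pairs in back-to-front order, and the cached background itself is left out.

```python
from typing import NamedTuple

class Rect(NamedTuple):
    x: int
    y: int
    w: int
    h: int

    def intersects(self, other):
        """True if the two rectangles share any area (touching edges do not count)."""
        return (self.x < other.x + other.w and other.x < self.x + self.w and
                self.y < other.y + other.h and other.y < self.y + self.h)

def objects_to_redraw(objects, dirty_rects):
    """Names of objects whose bounds touch any dirty rectangle, in draw order."""
    return [name for name, bounds in objects
            if any(bounds.intersects(d) for d in dirty_rects)]

scene = [
    ("tile",  Rect(0, 0, 64, 64)),
    ("tree",  Rect(32, 32, 32, 32)),
    ("house", Rect(200, 200, 64, 64)),
]
# Only a small patch changed this frame, for example one animating sprite.
dirty = [Rect(40, 40, 8, 8)]
print(objects_to_redraw(scene, dirty))   # ['tile', 'tree']
```

Everything outside the dirty patch comes from the saved buffer for free. Scrolling hurts because it moves every object relative to that buffer, so in the worst case the whole screen is dirty at once.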

Intelligent grouping. The painter’s algorithm is only really needed where stuff overlaps. If I draw a tile, then draw a sprite on top of it, I need the tile first, but there is no reason why I can’t draw all the tiles first, then the contents. That means I can then sort the tiles by texture and draw them in a handful of calls (or one, if the tiles all fit into an atlas). You can do this at pretty much any level, effectively drawing ‘out-of-order’ but with caveats. Again, GSB2 does this, with various objects, especially debris and asteroids. In fact it goes one stage further by scanning ahead with each draw call to see if some non-conflicting later objects could be ‘collapsed’ into the current draw call.
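One way the scan-ahead idea could work is sketched below in Python. This is a guess at the general technique, not GSB2's actual code: Sprite, overlaps and batch_with_lookahead are hypothetical names, and "non-conflicting" is assumed to mean "uses the same texture and does not overlap anything that had to be skipped to reach it".

```python
from typing import NamedTuple

class Sprite(NamedTuple):
    name: str
    texture: str
    x: int
    y: int
    w: int
    h: int

def overlaps(a, b):
    """True if the two sprites' bounding boxes share any area."""
    return (a.x < b.x + b.w and b.x < a.x + a.w and
            a.y < b.y + b.h and b.y < a.y + a.h)

def batch_with_lookahead(sprites):
    """Group back-to-front sprites into draw calls, pulling later sprites forward.

    A later sprite joins the current batch only if it shares the batch texture
    and overlaps none of the sprites skipped so far, so every overlapping pair
    is still drawn in its original relative order.
    """
    remaining, batches = list(sprites), []
    while remaining:
        first, batch, skipped = remaining[0], [remaining[0]], []
        for s in remaining[1:]:
            if s.texture == first.texture and not any(overlaps(s, k) for k in skipped):
                batch.append(s)
            else:
                skipped.append(s)
        batches.append((first.texture, [s.name for s in batch]))
        remaining = skipped
    return batches

scene = [
    Sprite("tile1", "grass", 0, 0, 64, 64),
    Sprite("tree1", "tree", 16, 16, 32, 32),
    Sprite("tile2", "grass", 64, 0, 64, 64),
    Sprite("tree2", "tree", 80, 16, 32, 32),
    Sprite("tuft", "grass", 20, 20, 8, 8),   # sits on top of tree1
]
for texture, names in batch_with_lookahead(scene):
    print(texture, names)
```

Here five sprites collapse into three draw calls: both tiles, then both trees, then the tuft, which has to wait because it overlaps tree1. The scan is quadratic in the worst case, so a real version would probably cap how far ahead it looks.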

Multi-threading and other speed boosts. If you have too many draw calls and things are too slow, then you can expand on the time available to make draw calls. Essentially you have two threads, one which prepares everything to be drawn, and the draw-call thread, which makes all your DirectX calls. This way they both run in parallel (also note that the DirectX runtime will be another thread, and the video card driver another one, so you have 2 threads fewer than you think). With a hyperthreaded 4 core chip, you have 8 truly simultaneous threads, so you give away 2, have 1 core thread, 1 render thread and 4 extra ‘worker threads’ spare. Because of my own disorganisation, I tend to have DirectX called from my main thread, which means I do the inverse. GSB2 did this, with all of the transformation stuff for the asteroids, debris and other bits and pieces handed to a bunch of threads while I was busy with other stuff, then returning to the main thread to present the draw calls. Less efficient, but way better than single threading.
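The "hand the transforms to worker threads, then submit from the main thread" pattern can be sketched structurally in Python. The names (transform_chunk, prepare_frame, submit_draw_calls) and the sprite tuple layout are all assumptions, and submit_draw_calls is a stand-in that only counts quads. Note that CPython threads will not actually speed up CPU-bound maths like this because of the global interpreter lock, so treat it as a picture of the structure rather than a benchmark.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def transform_chunk(chunk):
    """Worker job: rotate and translate each sprite's corners. No graphics calls here."""
    out = []
    for x, y, angle, half in chunk:
        c, s = math.cos(angle), math.sin(angle)
        corners = [(-half, -half), (half, -half), (half, half), (-half, half)]
        out.append([(x + cx * c - cy * s, y + cx * s + cy * c) for cx, cy in corners])
    return out

def prepare_frame(sprites, workers=4):
    """Split the sprites into chunks, transform them on worker threads, keep the order."""
    size = max(1, math.ceil(len(sprites) / workers))
    chunks = [sprites[i:i + size] for i in range(0, len(sprites), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(transform_chunk, chunks)   # map preserves chunk order
        return [quad for chunk in results for quad in chunk]

def submit_draw_calls(quads):
    """Stand-in for the main-thread graphics API calls; it only counts quads."""
    return len(quads)

# (x, y, rotation in radians, half-size) for each sprite, in draw order.
sprites = [(float(i), 0.0, 0.0, 0.5) for i in range(10)]
quads = prepare_frame(sprites)          # heavy lifting on the worker threads
print(submit_draw_calls(quads))         # 10, issued from the main thread only
```

The point of the split is that only one thread ever touches the graphics API, while the per-sprite maths can be spread across however many worker threads are spare.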

Hybrids. All of the above techniques seem valid to me. Although I am currently fixated on pre-compiled texture atlases, I’ll definitely use multithreading and probably some of the others. With some parts of a game, a specific optimisation system may work well, and with others it could be useless. It really is specific to what you draw, which is why I prefer a hand-crafted engine to a generic one.

My basic problem is that (without explaining what the game is), I have a small number of ‘components’ that make up a tiny scene on a tile. There will be a lot of components per tile, as I want a fairly ‘busy’ look, but rendering them all individually may be ‘too much’. What I may end up doing is pre-rendering each conceivable tile as an offline step, to reduce the number of calls. I’d like that to be a last minute thing though, so I can keep editing what the scenes look like. I also want sections of each tile to animate, or be editable and customizable, which means there is less scope to pre-render them.

It will make a lot more sense once I announce the game :D

7 Responses to “Drawing a LOT of sprites”

  1. Thomas says:

    Would it work to render the entire thing at 2x or 3x resolution with hard edges, so you can use the z-buffer, then downsample the whole thing?

  2. Alex says:

    Also consider a texture array instead of atlas, and encode the array index in the vertex data. This avoids the need to bake an atlas offline and the filtering artefacts that can introduce.

    • cliffski says:

      Indeed. I admit I’ve not used texture arrays, partly because to embed extra data in my vertex format means…changing my (minimalist) vertex format, and thus my vertex buffer code and thus my base sprite class and… it just involves a lot of messing around :D
      One day maybe! (although TBH by then I’ll have moved to a DX version with fewer concerns re texture swapping).

  3. Les says:

    In the game Age of Fear, we commonly draw ca. 1000-1500 objects (textures and semi-transparent shapes). Almost all alpha-ed.

    The engine (in Java) does the following optimisations:
    – depending on supported hardware, most operations are processed on the GPU
    – downsample each sprite based on viewing distance and cache it (the cache takes roughly 600-700 MB)
    – only dirty rectangles are painted, and the engine merges dirty regions into one if the overlapping region is big enough (i.e. it’s cheaper to draw once than to overdraw)
    – delayed texture loading into graphics memory (upon first use)
    – objects have their boundaries checked and are repainted only if they are in a dirty region
    – most stuff is double buffered (helps to batch graphics operations)
    – the background texture is tiled (multiple small textures instead of a single one)
    – scrolling uses an offhand buffer so it’s not that bad. Animated scrolling is bad
    – AI uses only half of the available cores (so drawing has some cores left)

    • cliffski says:

      Interesting. Isn’t your downscaling the same as the mipmapping the video card should do for free?

      • Les says:

        Depends on the system – OS X and Linux have no problem downsampling by themselves.

        Windows works much faster when an alphaed sprite is downscaled and rotated – it might be a Java thing