Game Design, Programming and running a one-man games business…

Thoughts on multi-threaded 2D game development in directx9

Am I the only person doing this? Probably. I often am. Most people have moved on from DX9 (I know it so well that there is a big opportunity cost to updating) or use OpenGL, and very few people are doing 2D games where performance is an issue. I am taking early steps with Gratuitous Space Battles 2, and my aim is to have it run at 60 FPS on average hardware with two 1920×1080 monitors. I also intend to get it running OK on bigger setups too. That’s a lot of pixels, and due to all sorts of fanciness I’m adding to GSB 2.0, it means a lot of processing. A REAL lot.

So…multithreading! It’s about time I ventured forth. To date, my only multithreading efforts have been the async server communication in GSB 1.0 for challenge uploads etc., and the loading screen for GTB and Democracy 3. Actual mid-game multithreading has scared me until now.

I hate middleware so I’m not using any libraries, just raw calls to CreateThread, TerminateThread and so on… This might make it more complex, but it means I have complete control over stuff. My first experiments were not exactly encouraging. I attempted to speed up the position calculations of asteroids. Now to cut a long story short, I use D3DTLVERTEX-style stuff (not hardware transform and lighting), for good reasons I won’t bore you with. The upshot is, I have a lot of non-DirectX transform stuff to do for anything drawn on the screen.

An ideal case for multithreading!


So I wrote code to split up the asteroids into 8 chunks (test case of an 8-core chip), and gave each core a list of asteroids to process. Result? SLOWER. Actually quite a bit slower. Some fiddling with AQTime (my profiler) let me analyze cache misses for each thread, and I also profiled it as 1 thread. With 8 threads, the cache miss rate went through the roof. Basically, my transform code was relying on some global camera data, and I suspect that either:

a) Referencing the camera data was a bottleneck, with each thread blocking the others from getting it, or…

b) The memory locations of the asteroid transform data were laid out in such a way that all the different threads kept fighting for the same cache lines and generally getting in each other’s way.

I spent a lot of time reading and fiddling and decided that it wasn’t working (although I did manage a few decent speedups in other ways). I then decided that if lots of threads sharing the same job wasn’t going to help, maybe lots of threads doing different (unrelated) jobs would…?

And this is more of a success. I have a function called ProcessFrame() which does a lot of non-DirectX stuff, such as the aforementioned asteroid transforming, updating engine glows, updating explosion plumes, particle effects and distortion waves blah blah… Until recently, it just did them one after the other. I then realized that although a lot of them accessed the same data (camera position stuff mostly), none of them altered it, and the tasks were quite discrete. So I packaged them up and sent them to different threads, and then spun in the main thread waiting for them to finish. Result? 21% faster. Yay? Not bad, but not 800% faster, which would have been theoretically doable (not really but…)

Of course the missing link was that I am then left waiting for the slowest thread. Plus if I have more than 8 tasks, I run out of CPUs. So I re-coded it to have a queue of tasks, and when a thread finished a task, it checked the queue, and only reported it was done when the queue was empty. This was way more efficient, and easier to scale to available cores. Result? 41% faster!
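For anyone wanting to try the same thing, here is a minimal sketch of that "grab the next task until the queue is empty" idea. It is not the GSB code: it uses std::thread instead of raw CreateThread calls to keep it short and portable, the name RunTasks is made up for illustration, and it creates its threads on every call, whereas a real per-frame version would presumably keep the workers alive between frames.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: runs every task on `workerCount` threads and returns
// once all of them are done. Each worker claims the next unclaimed task index
// until none remain, so a thread that gets a quick job just picks up another.
void RunTasks(const std::vector<std::function<void()>>& tasks, unsigned workerCount)
{
    if (workerCount == 0) workerCount = 1;   // always have at least one worker

    std::atomic<std::size_t> next{0};        // index of the next unclaimed task

    auto worker = [&tasks, &next]() {
        for (;;) {
            const std::size_t i = next.fetch_add(1);
            if (i >= tasks.size()) return;   // queue is empty: this worker is done
            tasks[i]();
        }
    };

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < workerCount; ++t) threads.emplace_back(worker);
    for (auto& th : threads) th.join();      // caller waits here for everyone
}
```

The tasks must not write to anything another task touches, which matches the read-only camera data situation described above. std::thread::hardware_concurrency() is one way to pick workerCount, though it is allowed to return 0.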

Now obviously a 41% processing speedup is good (although this is pre-render, not render, so probably only a 20% FPS boost) but I can’t help thinking that if not 800%, a 200% speedup of that bit of code must be possible. Debugging cache misses is hard, as even AQTime will bluescreen occasionally on Windows 7 when profiling it. I’m pretty sure there’s some cache or false-sharing issue going on.
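If false sharing does turn out to be the culprit, one common mitigation is to give each thread's frequently written data its own cache line. The sketch below assumes a 64-byte cache line (typical for x86, but not something I have measured here) and C++17, so that std::vector respects the over-alignment. PerThreadResult and SumInParallel are made-up names for illustration.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Assumption: 64-byte cache lines.
constexpr std::size_t kCacheLine = 64;

// Each thread's output sits on its own cache line, so a write by one thread
// does not invalidate the line a neighbouring thread is working on.
struct alignas(kCacheLine) PerThreadResult
{
    long long value = 0;
};

long long SumInParallel(unsigned threadCount, long long itemsPerThread)
{
    std::vector<PerThreadResult> results(threadCount);
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < threadCount; ++t) {
        threads.emplace_back([&results, t, itemsPerThread] {
            for (long long i = 0; i < itemsPerThread; ++i)
                results[t].value += 1;       // hot write, private to this thread's line
        });
    }
    for (auto& th : threads) th.join();

    long long total = 0;
    for (const auto& r : results) total += r.value;
    return total;
}
```

Without the alignas, eight of these counters would fit in a single 64-byte line and every increment could bounce that line between cores.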

In the meantime, GSB is now faster as a result, even if I spend no more time attempting to multithread it (and I will… I’ve only just got going). Anyone else attempting this sort of thing?

 

Steam workshop hopefully happening anyway…

So… I am feeling more motivated about this. The plan is basically to have 3 options in the Democracy 3 mod screen. One is the mod panel which lets you enable/disable mods, as it does right now. This will be in ALL builds of the game. Steam builds will get 2 extra windows. One will browse Steam Workshop entries for the game (I haven’t even started that yet); the other will handle submitting new mods to Steam Workshop.

Right now, I haven’t even implemented a single Steam API call for this, but what I *have* done is the rather tedious step of doing the GUI for submitting a mod. That is 90% done, but the back-end stuff is only half done. By back-end, I mean the stuff I have which will actually pass the mod data on to Steam for uploading. Why do I need an intermediate layer?

Well basically Steam Workshop and cloud save work with individual files, and Democracy 3 mods are NOT individual files. You might have 36 policy icons, some csvs and some text files in your mod, and this doesn’t play nice with Steam. Arghhhh! So I am coding a completely hidden-from-the-player translation system which (once you have selected your mod folder) packs all of those files, their filenames and their directory structure into a single packed file, and then submits that to Steam Workshop. Then the reverse happens with installed Steam Workshop mods. I haven’t started that bit yet, but the GUI and the packing are done, more or less. I’ll finish it tonight. I’m at ComicCon Saturday, then hopefully Sunday I can finish the Steam back-end and actually have the game submitting real live Steam Workshop mods.
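To give a rough idea of what such a packing layer can look like, here is a minimal in-memory sketch. It is not the actual Democracy 3 format (that isn't described here); the layout, a 32-bit little-endian length before each path and each file's bytes, and the names ModFile, PackMod and UnpackMod are all assumptions for illustration. Reading the files from disk and handing the result to Steam are left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// One file in the mod: path relative to the mod folder, and its raw bytes.
using ModFile = std::pair<std::string, std::string>;

// Appends a 32-bit little-endian length followed by the bytes themselves.
static void AppendBlock(std::string& out, const std::string& bytes)
{
    const std::uint32_t n = static_cast<std::uint32_t>(bytes.size());
    for (int shift = 0; shift < 32; shift += 8)
        out.push_back(static_cast<char>((n >> shift) & 0xFF));
    out += bytes;
}

// Reads a block written by AppendBlock; returns false if the data is truncated.
static bool ReadBlock(const std::string& in, std::size_t& pos, std::string& bytes)
{
    if (in.size() - pos < 4) return false;
    std::uint32_t n = 0;
    for (int shift = 0; shift < 32; shift += 8)
        n |= static_cast<std::uint32_t>(static_cast<unsigned char>(in[pos++])) << shift;
    if (in.size() - pos < n) return false;
    bytes.assign(in, pos, n);
    pos += n;
    return true;
}

// Packs every file (relative path, then contents) into one blob.
std::string PackMod(const std::vector<ModFile>& files)
{
    std::string packed;
    for (const auto& f : files) {
        AppendBlock(packed, f.first);
        AppendBlock(packed, f.second);
    }
    return packed;
}

// Reverses PackMod; returns false if the blob is cut short.
bool UnpackMod(const std::string& packed, std::vector<ModFile>& files)
{
    files.clear();
    std::size_t pos = 0;
    while (pos < packed.size()) {
        ModFile f;
        if (!ReadBlock(packed, pos, f.first) || !ReadBlock(packed, pos, f.second))
            return false;
        files.push_back(std::move(f));
    }
    return true;
}
```

Because the relative path travels with each file, the directory structure can be recreated on the other side. A real version would also want to reject paths that try to escape the mod folder.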

So theoretically on Monday I then code a nice workshop browser, test it all on Tuesday and patch it into the game on Wednesday. AHAHAHAH. Yeah maybe :D

In practice, on Tuesday I am involved in ‘family things’ so realistically it will be Thursday/Friday, if that doesn’t clash too badly with ExPlay in Bath, where I’m giving a talk. Holy crap, when do I write the talk?

I may have to miss Downton Abbey at this rate!

Hmmm. Maybe not workshop support then…

Dang. I had hoped to get Steam Workshop support into Democracy 3. However, today is the first time I’ve really looked into it in any depth, and unfortunately it doesn’t seem ideal for the kind of modding Democracy 3 is based around. It is ideally suited for games with a built-in editor, with a publish button that then publishes content to Steam’s cloud save, and which can then be grabbed back from cloud save too.

This is problematic. Mostly because D3 is edited primarily in Excel or other spreadsheet / csv editors. And it involves making new graphics using graphics programs, and generally it involves putting together a collection of 20 or 30 files for a new country, and uploading them as a group, not a single file. To add to the woes, Steam workshop obviously would be separate to my existing efforts to support modding, and is obviously only for steam users.
Democracy 3 is also on sale direct, and through GoG and the MacGameStore. If anyone at Apple can be bothered to reply to my emails, I might put this top-selling strategy game on sale through their app store… but that’s another story…

Anyway… as a result of my investigations I’m tempted to put the time I had mentally set aside for workshop integration into just far far better mod-browsing and support within the game itself. It wouldn’t be too difficult to list the current ‘official’ mods in a database and have the game show a list of those, and their installed/available status. Theoretically I could unzip all of the mod files on my server and have the game manage the downloading of those files itself automatically, negating any need for installers, or the possibility of people screwing up installation…

Sometimes thoughts like this lead to a spiral of 18 hour work days and depression, sometimes they lead to 3 hours work, and a great feeling of achievement. You never know till you try it.

Meanwhile Democracy 3 sells like hot cakes. I don’t want to become one of ‘those people’ who keeps going on about sales figures, but it’s doing very nicely and I’m very happy about that :D

Normal mapping question (it’s my day OFF)

EDIT: I’ve fixed it. It’s my own dumbassness. I was setting the sprite_angle variable in the shader AFTER the shader had started, so it was presumably never being set, and was populated with garbage (or the value from the ship beforehand). It works a charm now :D

Right so, despite being good at AI coding and optimisation, I suck at this clever stuff you people call ‘3D math’. I was in the pub that day of school. So I enlist you, the all-knowing internet, to explain to me like a child what I am doing wrong. Here are 3 images:

[image: normalmapstuff]

The left is just the ship, the middle is a normal map thingy (thanks to Charles!) and the right is what it looks like on screen given a single light source. The end result is exactly what I wanted, and there is much rejoicing… BUT. It’s screwed up: the position of the light source is wrong, and seemingly random, and sometimes jumps and cycles all over the place. I reckon I have everything sorted except the shader, which is an fx file as follows:

sampler2D g_samSrcColor : register(s0);
sampler2D g_samNormalMapColor : register(s1);
float sprite_angle : register(C0);

float4 NormalMap( float2 texCoord : TEXCOORD0 ) : COLOR0
{
    // get the value of the normal at this texture coord (rgb only, alpha is not needed)
    float3 normalcolor = tex2D(g_samNormalMapColor, texCoord).rgb;

    // convert it from the 0..1 colour range to the +/- 1.0 range
    normalcolor = normalcolor * 2.0f - 1.0f;

    // light direction in the plane of the sprite, from the sprite's angle (radians)
    float3 LightDirection;
    LightDirection.x = sin(sprite_angle);
    LightDirection.y = cos(sprite_angle);
    LightDirection.z = 0;

    // brightness is the cosine of the angle between the normal and the light;
    // saturate() clamps it to 0..1 so pixels facing away go black, not negative
    float dot_prod = saturate(dot(LightDirection, normalcolor));

    // apply as a tint to the final pixel, leaving alpha alone
    float4 final = tex2D(g_samSrcColor, texCoord);
    final.rgb *= dot_prod;

    return final;
}

As an idiot, I’m not really sure what I am doing here. What I *think* I’m doing is this: I am drawing a sprite which has a separate normal map (g_samNormalMapColor). I also pass in the current angle of the sprite. I sample the color of the normal map, and convert it into the required range. I then convert the sprite’s angle into a light direction vector (probably wrongly) and I then do some magic which kids call ‘dot’ which I’m guessing gives me the brightness of the pixel given the angle and the default light direction. I then multiply the original texture color by this brightness to tint my final rendered sprite. Yay.

It’s something clever to do with angles and vectors and stuff, isn’t it? Explain it to me like I’m an idiot :D
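For what it's worth, the ‘dot’ magic is easy to poke at on the CPU. For two unit-length vectors the dot product is the cosine of the angle between them: 1 when the surface faces the light head-on, 0 when it is side-on, negative when it faces away. Below is a small C++ sketch of the same math as the shader; the helper names (DecodeNormal, LightFromAngle, Brightness) are invented for illustration, and it assumes the normal map stores a unit-length normal.

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };

float Dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Converts a normal-map texel (each channel 0..1) into a direction (each axis -1..+1).
Vec3 DecodeNormal(float r, float g, float b)
{
    return { r * 2.0f - 1.0f, g * 2.0f - 1.0f, b * 2.0f - 1.0f };
}

// Same as the shader: a unit-length light direction lying in the sprite's plane.
Vec3 LightFromAngle(float radians)
{
    return { std::sin(radians), std::cos(radians), 0.0f };
}

// Cosine of the angle between normal and light, clamped so that
// surfaces facing away from the light come out as 0 rather than negative.
float Brightness(const Vec3& normal, const Vec3& toLight)
{
    return std::max(0.0f, Dot(normal, toLight));
}
```

One thing this makes visible: because the light's z is 0, a texel whose normal points straight out of the screen (the flat blue areas of a typical normal map) gets a brightness of 0, so with this exact setup only the sloped parts of the ship pick up any light.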

 

And yeah…I’m playing about with Gratuitous Space Battles. It’s a Sunday morning, don’t read too much into it :D

Some optimization tips for game programmers

I’m enjoying myself with some optimizing today (yeah I’m weird like that). So I thought I’d jot down some of my tips for making your game faster. These are general, not language-specific tips.

Never run code you don’t have to run

Seems obvious but few people actually do this. For example, in Democracy 3, the simulation calculates the popularity of each policy by asking every voter if they benefit from it. That question is complex, and there are a few hundred policies and 2,000 voters. This takes time. Solution: I only ask them about a policy if I need the answer right now. Some policies can go a dozen turns without the player ever checking their popularity, so why keep calculating it?
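A minimal sketch of that "only work it out when somebody asks" pattern, assuming C++17 for std::optional. The Policy class here is a made-up stand-in, not the Democracy 3 one, and averaging a list of numbers stands in for the genuinely expensive question put to each voter.

```cpp
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a policy whose popularity is expensive to work out.
class Policy
{
public:
    explicit Policy(std::vector<int> voterOpinions) : opinions_(std::move(voterOpinions)) {}

    // Called once per turn: throw the old answer away, but do NOT recompute yet.
    void OnNewTurn() { cached_.reset(); }

    // Only walks the voters when somebody actually asks, and at most once per turn.
    double Popularity()
    {
        if (!cached_) {
            ++timesComputed;
            double sum = 0.0;
            for (int o : opinions_) sum += o;
            cached_ = opinions_.empty() ? 0.0 : sum / opinions_.size();
        }
        return *cached_;
    }

    int timesComputed = 0;   // exposed only so the saving is visible

private:
    std::vector<int> opinions_;
    std::optional<double> cached_;
};
```

A dozen calls to OnNewTurn() with nobody looking cost nothing; the first Popularity() call after that pays for exactly one calculation.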

Batch Stuff

If you have a dozen icons that are always drawn one after another on the same screen, stick them in a texture atlas. If you are 100% sure they will never overlap, then draw them in a single draw call. The fewer texture swaps and draw calls, the faster your code. This is trivial to do in 3D, an absolute nightmare to do properly in 2D, but it’s worth it.

Cache Stuff

If you have a variable that is complex to evaluate, evaluate it once, then cache it until it changes (we tend to call that setting it ‘dirty’). If there is some data that is going to be accessed a LOT, then make a local copy of it. And if you have a lot of stuff to write to disk, to the same file, buffer it. Writing to or reading from a file is slow, especially if you are going to do it a lot. Reading in a single file is much quicker than opening 200 of them one after another.

Don’t use sqrt()

Do you ever use sqrt()? Never realised how scarily slow it was? Most of the time you can keep the squared result and use some clever tricks to not actually need the sqrt() result. If you were going to get the sqrt() and compare it against a value, just multiply the comparison value by itself and check it that way instead. It’s amazingly faster.
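The classic case is a range check. A minimal sketch, with Vec2 and InRange as illustrative names, and assuming the range is never negative (squaring a negative range would give the wrong answer):

```cpp
struct Vec2 { float x, y; };

// Squared distance between two points: no sqrt() needed to compute it.
float DistanceSquared(const Vec2& a, const Vec2& b)
{
    const float dx = a.x - b.x;
    const float dy = a.y - b.y;
    return dx * dx + dy * dy;
}

// Same answer as sqrt(DistanceSquared(a, b)) <= range for range >= 0,
// but it squares the comparison value instead of rooting the distance.
bool InRange(const Vec2& a, const Vec2& b, float range)
{
    return DistanceSquared(a, b) <= range * range;
}
```

The same trick works for "which of these is nearest?": comparing squared distances gives the same ordering as comparing the real ones.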

Use the right container class

Sometimes you will use a list where a vector will do. The vector is MUCH faster. And you know what is faster still? I mean REALLY fast? An array. If you really need speed, and the array size won’t change that often, allocating an array of items is much faster and is worth the overhead.

Re-use objects

If you have a collection of objects that keep getting created and destroyed, you want to wrap that up. Stick a factory object around them to handle their construction and destruction. That way, you can just set them inactive on destruction, and save yourself the hassle of the creation and destruction when it comes time to re-use them. Setting a single flag to say an object is ‘dead’ is way faster than calling a destructor, and resetting is way faster than a constructor.
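A minimal sketch of that idea for particles. Particle and ParticlePool are invented names, and the linear search for a dead slot is the simplest thing that works rather than the fastest; a free list would avoid the scan.

```cpp
#include <cstddef>
#include <vector>

struct Particle
{
    float x = 0.0f, y = 0.0f, life = 0.0f;
    bool alive = false;

    // Resetting a few fields stands in for running a constructor.
    void Reset(float px, float py, float lifetime)
    {
        x = px; y = py; life = lifetime; alive = true;
    }
};

class ParticlePool
{
public:
    // All the memory is allocated once, up front.
    explicit ParticlePool(std::size_t capacity) : items_(capacity) {}

    // Hands back a dead particle, reset for reuse, or nullptr if all are in use.
    Particle* Spawn(float x, float y, float lifetime)
    {
        for (auto& p : items_) {
            if (!p.alive) { p.Reset(x, y, lifetime); return &p; }
        }
        return nullptr;
    }

    // "Destroying" is just clearing a flag; the memory stays where it is.
    void Kill(Particle* p) { if (p) p->alive = false; }

    std::size_t AliveCount() const
    {
        std::size_t n = 0;
        for (const auto& p : items_) if (p.alive) ++n;
        return n;
    }

private:
    std::vector<Particle> items_;
};
```

Since the vector never resizes after construction, pointers handed out by Spawn() stay valid for the life of the pool.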

Obviously there are lots more tips, and you should get a decent commercial profiler. That PC you develop on is a super-computing beast. If your game takes more than a few seconds to load, you are being sloppy.