programming – Cliffski's Blog

Optimizing load times

Posted on June 29, 2025

I recently watched a 2 hour documentary on the ZX Spectrum, which means little to people from the USA, but it was a really early computer here in the UK. I am so old I actually had the computer BEFORE that, which was the ZX81, just a year earlier. The ZX81 was laughable by modern standards, and I expect the keyboard I am using has more processing power. It had an amazing 1kb of RAM (yes kb, not MB), no storage, no color, no sound, and no monitor. You needed to tune your TV into it and use that as a black and white monitor. It’s this (terrible) PC I used to learn BASIC programming on.

Anyway, one of the features of ZX81/Spectrum days was loading a game from an audio cassette, instead of the alternative, which is copying the source code (line by line) from a gaming magazine and entering the ENTIRE SOURCE CODE of the game if you wanted to play it. Don’t forget, no storage, so if your parents then wanted to watch TV and made you turn it off, you had to type the source code again tomorrow. I can now type very fast… but the documentary also reminded me of another horror of back then, which was the painfully slow process of loading a game.

These days games load…a bit quicker, but frankly not THAT much quicker, especially given the incredible speed of modern hard drives, and massively so when talking about SSDs. Everything is so fast now, from SSD to VRAM bandwidth, to the CPU. Surely games should be able to load almost instantly…and yet they do not. So today I thought I’d stare at some profiling views of loading a large battle in Ridiculous Space Battles to see if I am doing anything dumb…

This is a screengrab from the AMD UProf profiler. My desktop PC has an AMD chip. I’ve started the game, gone to the ‘select mission’ screen, picked one, loaded the deployment screen, then clicked fight, let the game load, and then quit. These are the functions that seem to be taking up most of the time. Rather depressing to see my text engine at the top there… but its a red herring. This is code used to DISPLAY text, nothing to do with loading the actual game. So a better way to look at it is a flame graph:

I love flame graphs. They are so good at presenting visual information about where all the time is going, and also seeing the call-stack depth at various points. This shows everything I did inside WinMain() which is the whole app, but I can focus in on the bit I care about right now which is actual mission loading…

And now its at least relevant. It looks like there are basically 3 big things that happen during the ‘loading battle’ part of the game, and they are “Loading the ships” “Loading the background” “Preloading assets”. The GUI_LoadingBar code is given a big list of textures I know I’ll need in this battle, and it then loads them all in, periodically stopping to update a loading progress bar. Is there anything I can do here?

Well ultimately, although it takes a bit of a call stack to get there, it does look like almost all of the delay here is inside some DirectX 9 functions that load in data. I am very aware of the fact that DirectX had some super slow functions back in DirectX 9, in its ‘d3dx’ API, which I mostly replaced, but ultimately I am using some of that code still, specifically D3DXCreateTextureFromFileInMemoryEx…

Now I have already tried my best to make stuff fast, because I’ve made sure to first find the texture file (normally a DDS format, which is optimised for directx to use) on disk, and load the whole file into a single buffer in RAM before I even tell directx to do anything. Not only that, but I do have my own ‘pak’ file format, which crunches all of the data together and loads it in one go, which presumably is faster due to less windows O/S file system and antivirus accessing slowdowns. However I’m currently not using that system… so I’ll swap to it (its a 1.8GB pak file with all the graphics in) and see what difference it makes…
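For reference, the "read the whole file into one buffer first" step might look something like this. It is only a minimal sketch, not the game's actual loader: ReadWholeFile is a made-up name, and it uses standard C++ file streams rather than whatever the engine or the pak system really does.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper (not the game's real code): read a whole file into
// one contiguous buffer using a single read call.
// Returns an empty vector if the file cannot be opened or read.
std::vector<char> ReadWholeFile(const std::string& path)
{
    // Open at the end so tellg() reports the file size.
    std::ifstream file(path, std::ios::binary | std::ios::ate);
    if (!file)
        return {};

    const std::streamsize size = file.tellg();
    if (size <= 0)
        return {};

    std::vector<char> buffer(static_cast<std::size_t>(size));
    file.seekg(0, std::ios::beg);
    if (!file.read(buffer.data(), size))
        return {};

    return buffer;
}
```

The buffer's data() pointer and size() are then what would be handed to the in-memory texture creation call, so the texture code itself never touches the file system.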

Wowzers. It makes almost no difference. I won’t even bore you with the graph.

And at this point I start to question how accurate these timings are, so I stick some actual timers in the code. In a test run, the complete run of GUI_Game::Activate() takes 3,831ms and the background initialise is just 0.0099. This is nonsense! I switched from instruction based to time-based sampling in uprof. That doesn’t now give me a flame graph, but it does also flag up that the D3DX png reading code is taking a while. The only png of significance is the background graphic, which my timers suggest is insignificant, but I think this is because it was loaded in the previous screen. I deliberately free textures between screens, but it’s likely still in RAM… I’ll add timers to the code that loads that file.
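If you want to do the same kind of "stick a timer around it" measurement, a scoped timer is one easy way. This is a sketch under assumptions: ScopedTimer is a hypothetical class, not the timer the game uses, and it reports wall-clock milliseconds via std::chrono::steady_clock.

```cpp
#include <chrono>
#include <cstdio>
#include <string>
#include <utility>

// Hypothetical RAII timer: measures wall-clock time from construction to
// destruction and prints it in milliseconds when it goes out of scope.
class ScopedTimer
{
public:
    explicit ScopedTimer(std::string label)
        : Label(std::move(label)), Start(Clock::now()) {}

    ~ScopedTimer()
    {
        std::printf("%s took %.4f\n", Label.c_str(), ElapsedMs());
    }

    // Milliseconds elapsed since the timer was created.
    double ElapsedMs() const
    {
        const std::chrono::duration<double, std::milli> elapsed = Clock::now() - Start;
        return elapsed.count();
    }

private:
    using Clock = std::chrono::steady_clock;   // monotonic, safe for intervals
    std::string Label;
    Clock::time_point Start;
};
```

Wrapping a load call in a block with a ScopedTimer at the top is enough to get one log line per file, which can then be sorted to find the slowest loaders.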

Whoah that was cool. I can now put that into excel and pick the slowest loaders…

Loaded [data/gfx/\backgrounds\composite3.png] in 73.0598
Loaded [data/gfx/\scanlines.bmp] in 20.0463
Loaded [data/gfx/\planets\planet6s.dds] in 11.8662
Loaded [data/gfx/\ships\expanse\expanse_stormblade_frigate_damaged.dds] in 10.7132
Loaded [data/gfx/\ships\ascendency\g6battleship.dds] in 9.3622
Loaded [data/gfx/\ships\ascendency\g5frigate.dds] in 6.9765

OMGZ. So yup, that png file is super slow, and my bmp is super slow too. The obvious attempted fix is to convert that png to dds and see if it then loads faster. Its likely larger on disk, but requires virtually no CPU to process compared to png so here goes… That swaps a 2MB png for a 16MB (!!!!) dds file, but is it faster?

It’s 208ms compared with 73ms earlier. But frankly this is not an accurate test as some of this stuff may be cached. Also when I compare pngs of the same size, I’m noticing vast differences between how long they take to load:

Loaded [data/gfx/\backgrounds\composite11.png] in 113.9637
Loaded [data/gfx/\backgrounds\composite3.dds] in 208.7471
Loaded [data/gfx/\backgrounds\composite5.png] in 239.3122

So best to do a second run to check…

Loaded [data/gfx/\backgrounds\composite11.png] in 112.8554
Loaded [data/gfx/\backgrounds\composite3.dds] in 84.9467
Loaded [data/gfx/\backgrounds\composite5.png] in 108.4374

WAY too much variation here to be sure of whats going on. To try and be sure my RAM is not flooded with data I’d otherwise be loading, I’ll load Battlefield 2042 to use up some RAM then try again… Interestingly it only takes up 6GB. Trying again anyway…

Loaded [data/gfx/\backgrounds\composite11.png] in 114.0210
Loaded [data/gfx/\backgrounds\composite3.dds] in 85.6767
Loaded [data/gfx/\backgrounds\composite5.png] in 105.8643

Well that IS actually getting a bit more consistent. I’ll do a hard reboot…

Loaded [data/gfx/\backgrounds\composite11.png] in 104.3017
Loaded [data/gfx/\backgrounds\composite3.dds] in 207.8332
Loaded [data/gfx/\backgrounds\composite5.png] in 141.2645

Ok so NO, a hard reboot is the best test, and swapping to DDS files for the huge background graphics is a FAIL. These are 2048 x 2048 images. At least I know that. The total GUI_Game::Activate is 7,847ms. That png is only about 1-2% of this, and it makes me wonder if converting all the dds files to png would in fact be the best solution to speed up load times? The only other option would be to speed up DDS processing somehow. Having done some reading, it IS possible to use multithreading here, but it looks like my actual file-access part of the code is not vaguely the bottleneck, although I’ll split out my code from the directx code to check (and swap back to a png…)

Creating texture [data/gfx/\backgrounds\composite11.png]
PreLoad Code took 1.0205
D3DXCreateTextureFromFileInMemoryEx took 111.4467
PostLoad Code took 0.0001
Creating texture [data/gfx/\backgrounds\composite3.png]
PreLoad Code took 28.4150
D3DXCreateTextureFromFileInMemoryEx took 71.1481
PostLoad Code took 0.0001
Creating texture [data/gfx/\backgrounds\composite5.png]
PreLoad Code took 0.9654
D3DXCreateTextureFromFileInMemoryEx took 105.2158
PostLoad Code took 0.0001

Yeah…so it’s all the DirectX code that is the slowdown here. Grok suggests writing my own D3DXCreateTextureFromFileInMemoryEx function, which sounds possible but annoying.

Ok…mad though it sounds, I’ve done that. Let’s try again!

Creating texture [data/gfx/\backgrounds\composite11.png]
PreLoad Code took 0.8327
D3DXCreateTextureFromFileInMemoryEx took 103.4365
PostLoad Code took 0.0001
Creating texture [data/gfx/\backgrounds\composite3.png]
PreLoad Code took 0.6053
D3DXCreateTextureFromFileInMemoryEx took 73.9393
PostLoad Code took 0.0002
Creating texture [data/gfx/\backgrounds\composite5.png]
PreLoad Code took 0.9069
D3DXCreateTextureFromFileInMemoryEx took 105.0180
PostLoad Code took 0.0001

Am I just wasting my life? At least I now have the source code to the DDS loader because it is MY code bwahahaha. So I can try and get line level profiling of this stuff now… I’ll try the Visual Studio CPU profiler:

Thanks Microsoft. But there may be more…

The Visual Studio flame graph is saying that actually the raw reading from disk of the file IS a major component of all this, and so is a memcpy I do somewhere… Actually it’s inside the fast DDS loader, so the flame graph is confusing. The DDS loader loops doing memcpy calls for each line of data. This is very bad. With a big file, there will be 2,048 calls to memcpy just to read it in. Surely we can improve on that? And yet it’s clear that’s what D3DXCreateTextureFromFileInMemoryEx is doing, as seen earlier. Hmmmm. And now people have come to visit and I have to stop work at this vital cliffhanger…
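One possible improvement to the per-line memcpy, sketched here under assumptions: loaders usually copy row by row because the destination (a locked texture) can have a pitch that is larger than the width of a row in bytes. When both pitches happen to equal the row size, the image is one contiguous block and a single memcpy covers it. CopyRows is a hypothetical helper, and whether the driver ever hands back a tightly packed pitch is not guaranteed.

```cpp
#include <cstddef>
#include <cstring>

// Copies `rows` rows of `rowBytes` bytes each from src to dst.
// srcPitch and dstPitch are the distances in bytes between row starts.
// Returns the number of memcpy calls made (for illustration only).
std::size_t CopyRows(unsigned char* dst, std::size_t dstPitch,
                     const unsigned char* src, std::size_t srcPitch,
                     std::size_t rowBytes, std::size_t rows)
{
    if (rows == 0 || rowBytes == 0)
        return 0;

    // Fast path: both buffers are tightly packed, so the whole image is
    // contiguous and one memcpy covers every row.
    if (dstPitch == rowBytes && srcPitch == rowBytes)
    {
        std::memcpy(dst, src, rowBytes * rows);
        return 1;
    }

    // Slow path: there is padding between rows, so copy one row at a time.
    for (std::size_t y = 0; y < rows; ++y)
        std::memcpy(dst + y * dstPitch, src + y * srcPitch, rowBytes);
    return rows;
}
```

Even in the slow path the cost is probably dominated by the bytes moved rather than the number of calls, so this is worth measuring before assuming it helps.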

Optimising Ridiculous Space Battles

Posted on May 18, 2025

Due to what seemed to be a compiler bug (omgz) yesterday I thought that large complex battles in Ridiculous Space Battles were hitting performance limits. That appears to not be the case, but it got me back into profiling the bigger battles (20×20 grid size, with up to 25 ships in each square, probably 600 ships in total) to see where the bottlenecks are.

The game is already multithreaded for some tasks, but the first profiling run for the first 50% of a battle gives me this flame graph from uprof for the main thread:

In practice what I really care about is stuff inside GUI_Game::Draw(). I have to say I am pleasantly surprised with the breakdown as it seems nicely distributed, without any real obvious processor hogs at first glance. Drawing the ship sprites, processing the simulation, post processing (mostly lightmaps), particles, hulks, bullets and then a bunch of minor stuff. Nothing seems too bad. On the other hand there are some things in there that seem quite big given what I know they are doing. For example the lightmap stuff shouldn’t be that big a deal, and perhaps should be threadable more? Lightmaps are being drawn not only for every laser ‘bullet’ but also every engine glow, and every one of the many sprites that make up a beam laser. I also draw ship ‘mattes’ to ensure that light does not glow through objects like asteroids. Even so, this doesn’t sound like a lot of CPU?

So this breakdown is showing as expected that a lot of it is within the drawmattes, but even so that seems too much to me. There might be a single ship sprite, but 6 engine glows and the flares from 20 missiles or 6 beam lasers associated with it. How come mattes are so slow? At first I assumed that I was not checking to see if they were onscreen, but I definitely am. Here is the code for that function:

GUI_GetShaderManager()->StartShader("matte.fx");
for (SIM_Ship* ps : PlayerShips)
{
    ps->DrawMatte();
}
for (SIM_Ship* ps : EnemyShips)
{
    ps->DrawMatte();
}

GUI_GetHulks()->DrawMattes();

GUI_GetRenderManager()->Render();
GUI_GetShaderManager()->EndShader("matte.fx");

So nothing exactly too complex. It could be that just traversing the STL list for 600 ships takes time, but frankly 600 is not a big list. Could it actually all be the hulks? Actually uprof suggests a lot of it is… but checking that code, it’s basically doing the same thing. The game made a note early whether to render a hulk matte, so there is no redundant rendering taking place. Hmmm. Maybe best to look elsewhere for problems. I tried running a battle involving a ludicrous number of bullets. I gave the Expanse ‘Apocalypse Cruiser’ a huge bunch of Gravimetric impulse cannons which have lots of burst fire, and filled a fleet with them.

And the flame graph is actually not too bad again:

Ok, bullet process is now the top task, but it has not gone insane, which is good. And the bulletmanager Draw() is also obviously bigger, but again not insane. I dug a little deeper, and found this nonsense inside the bullet process function:

float radangle = D3DXToRadian(Angle);
float cosa = COS(radangle);
float sina = SIN(radangle);

Looks innocent, but I actually wrote faster versions of sin and cos that use a lookup table for 3,600 versions of each. So basically my precision there is within a tenth of a degree. I probably left it like this because I worried that the quantizing there would make the bullets look like they miss their target when my sim shows that they hit. I checked with grok:

The bullet could be off target by up to approximately 2.62 pixels when using a sine/cosine lookup table with 1/10th degree precision over a distance of 3000 pixels.

Thats interesting, because frankly none of my bullets are firing 3000 pixel ranges, and being visibly off by that amount is actually ‘no big deal’. Its absolutely not a big deal if the bullet has been pre-determined to miss anyway… I guess I *could* be super-cautious and have a flag where if the bullet will miss, I use the lookup table for its movement, and if not I use real math?
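To make the lookup table and that 2.62 pixel figure concrete, here is a minimal sketch. The names (TrigTable, MaxDriftPixels) are hypothetical and this is not the game's COS/SIN code. It assumes the angle is rounded to the nearest tenth of a degree, so the worst-case angular error is 0.05 degrees, which is where a figure of about 2.62 pixels at 3000 pixels comes from.

```cpp
#include <array>
#include <cmath>

// Sketch of a 3,600-entry sine/cosine table (one entry per tenth of a degree).
constexpr int kTableSize = 3600;
constexpr double kPi = 3.14159265358979323846;

struct TrigTable
{
    std::array<float, kTableSize> Sin{};
    std::array<float, kTableSize> Cos{};

    TrigTable()
    {
        for (int i = 0; i < kTableSize; ++i)
        {
            const double radians = (i / 10.0) * kPi / 180.0;
            Sin[i] = static_cast<float>(std::sin(radians));
            Cos[i] = static_cast<float>(std::cos(radians));
        }
    }

    // Round to the nearest tenth of a degree and wrap into [0, 3600).
    static int Index(float degrees)
    {
        long i = std::lround(degrees * 10.0f) % kTableSize;
        if (i < 0)
            i += kTableSize;
        return static_cast<int>(i);
    }

    float FastSin(float degrees) const { return Sin[Index(degrees)]; }
    float FastCos(float degrees) const { return Cos[Index(degrees)]; }
};

// Worst-case sideways drift after travelling `distance` pixels when the
// angle is off by the maximum rounding error of 0.05 degrees.
double MaxDriftPixels(double distance)
{
    const double maxErrorRadians = 0.05 * kPi / 180.0;
    return distance * std::sin(maxErrorRadians);
}
```

Because the drift scales linearly with distance, a bullet travelling 1000 pixels would be off by well under a pixel in the worst case.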

And then this is the point in the thinking process where I realize that all my code is bollocks. Its only when the bullets WILL hit the target that they need to change their angle once shot anyway. In other words, some bullets are (for sim purposes) effectively homing-bullets, and some are not. In other words, not only can I use a lookup table for the angles of non-hitting bullets, I do not even need to recalculate the angle for them at all once they have been fired. Jesus I need a coffee…
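The idea, roughly sketched: work out the direction once when the bullet is fired, and only re-aim the bullets that the sim has decided will hit. Everything here (the Bullet struct, its fields, the Homing flag) is a hypothetical stand-in for illustration, not the game's real bullet class.

```cpp
#include <cmath>

// Hypothetical bullet: names and fields are illustrative only.
struct Bullet
{
    float X = 0.0f, Y = 0.0f;
    float Speed = 0.0f;
    float DirX = 1.0f, DirY = 0.0f;   // cached cos/sin of the firing angle
    bool  Homing = false;             // true only for bullets that will hit

    void Fire(float angleRadians, float speed, bool willHit)
    {
        Speed  = speed;
        Homing = willHit;
        DirX   = std::cos(angleRadians);   // computed once, at launch
        DirY   = std::sin(angleRadians);
    }

    // targetX/targetY are only consulted for homing bullets.
    void Process(float dt, float targetX, float targetY)
    {
        if (Homing)
        {
            // Re-aim at the target every frame.
            const float angle = std::atan2(targetY - Y, targetX - X);
            DirX = std::cos(angle);
            DirY = std::sin(angle);
        }
        // Missing bullets skip all trig and just reuse the cached direction.
        X += DirX * Speed * dt;
        Y += DirY * Speed * dt;
    }
};
```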

Coffee consumed… And now checking that this change (caching one-off sin and cos for missing bullets) works and does not break the game… And it works fine. I am now aware of a much bigger issue: When a bullet expires, it removes its sprite from the list of lightmaps to be drawn in the lightmap code. This is a big deal, because that list may well have 6-10,000 entries. Removing an item from a list that big all the time is hitting performance in this perverse case with thousands of bullets. I need a better solution…

Checking UProf I can see that GUI_Bullet::SetActive() has a cache miss rate of 66%. That’s pretty dire, the worst of the top 20 most processor intensive functions. Yikes. And yet…

2 minutes chatting to grok (xAI’s chatbot) gave me the frankly genius solution: “Why have a pointer to a sprite stored in a list, and then go back later and try and find it? When you add it, store the ITERATOR, and then later when you want to remove it, you already have the iterator ready. No more list searching.”

Holy Fucking Crap.

I’ve been coding for 44 years, and it’s never occurred to me that with a tightly managed list of objects that will have constant addition and removal, and the list may be huge, that it’s worth storing the iterator in the stored object. That’s genius. Maybe you do this all the time, but it’s new to me, and it’s phenomenal for this use case. Not only is GUI_Bullet::SetActive() no longer in the top 20 functions by CPU time, I cannot even find it. It’s literally too fast to measure, even with my extreme stress test.
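A minimal sketch of the stored-iterator trick, assuming the lightmap list is a std::list of sprite pointers. LightmapList, LitBullet and the field names are made up for illustration. It works because std::list iterators stay valid when other elements are inserted or erased, so a saved iterator can be erased later in constant time with no search.

```cpp
#include <cstddef>
#include <list>

struct Sprite { int Id = 0; };   // stand-in for the real sprite type

// Hypothetical lightmap list that hands back a handle on insertion.
class LightmapList
{
public:
    using Handle = std::list<Sprite*>::iterator;

    // O(1): append and return the iterator to the new node.
    Handle Add(Sprite* sprite)
    {
        return Sprites.insert(Sprites.end(), sprite);
    }

    // O(1): no search needed, the caller already knows where the node is.
    void Remove(Handle handle) { Sprites.erase(handle); }

    std::size_t Size() const { return Sprites.size(); }

private:
    std::list<Sprite*> Sprites;
};

// Hypothetical bullet that remembers where its glow sprite lives in the list.
struct LitBullet
{
    Sprite Glow;
    LightmapList::Handle GlowHandle;
    bool Active = false;

    void SetActive(bool active, LightmapList& lightmaps)
    {
        if (active == Active)
            return;
        Active = active;
        if (active)
            GlowHandle = lightmaps.Add(&Glow);   // remember the iterator
        else
            lightmaps.Remove(GlowHandle);        // erase without searching
    }
};
```

The same trick would not be safe with a std::vector, where erasing or growing can invalidate iterators to other elements.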

Buy shares in NVDA and TSMC!

Update hell. How did we get here?

Posted on January 21, 2025

I have a web server that runs my website, this blog, the back-end stuff for some of my games (which report bugs or browse mods etc). I also, sadly, have a ‘smart’ TV, a samsung galaxy phone, a kindle and a laptop and desktop PC. As a result, I am in a state of perpetual update hell.

Long ago when I worked in IT, we were very obsessive about updating everyone’s PCs. The danger of having a virus run rampant on a corporate network was always a concern, and when I worked on city trading stuff, obviously security was the first priority. We were super paranoid about everything being updated and patched. We were the IT gods of security, who would update everyone’s PC to the latest patch for everything before they even knew they were needed. Updates were essential and good. Otherwise all hell would break loose.

These days my attitude has become: Fuck you, updates. I’ll update only if I have no choice.

How on earth have we fallen so far? I think, speaking for myself, its because software has just generally become so staggeringly badly made. To pick on just one example, I’ll describe the hellscape of bug ridden crap that is Battlefield 2042. This is a game I try to play every day, and sometimes it even works. On a good day, I can launch the game, and it will show me a splash screen trying to sell me shit which I can ‘dismiss’, except the coders are too inept to record that I skipped it, so I see it every day. Every time I launch the game this wastes my time and makes me hate the developers.

So then I can select a game and start playing. HAHAHA. Don’t be so naive. This is not 2001, I don’t get to choose which map to play, or what server to play on. That functionality was ripped out of the DICE engine so it could matchmake for me, to make my user experience 100x worse. Routinely that means joining a game with 30 seconds to the end, and maybe 75% of the time, it kicks either me or my squadmate back to the menu anyway, so I have to start again. Obviously I have zero idea what map we will be playing, because the developer has thought ‘fuck the player, who cares. They will play what we tell them’.

Actually I’m kidding, it’s way, way worse than this. Because when the game ends, I then have to watch a load of tiktok inspired US bro-culture dance-move/posturing bullshit before being dumped right back at the main menu. Back in the days of Battlefield 1942 or Call of Duty, I could just stay on the same server and keep playing, but not any more. Why? Because fuck the player, that’s why.

Speaking of which, in call of duty 2, you could play modded maps, in their thousands, and they would auto-download as you connected to the server. The mod tools were free of course. There was no attempt to sell you stupid fucking hats. A server browser, mod support, flawless multiplayer code, and no anti-consumer F2P pay-to-win game-ruining immersion breaking bullshit. That was the peak era for gaming, unlike now. As a gamer, I am just cannon fodder for F2P whales, or someone to be exploited, marketed at and used. Why? because fuck the player. Nobody at the top of EA even plays games anyway, they might as well be selling fucking washing machines.

So…thats just trying to use an existing bit of software. Imagine the joy of being FORCED to download a 6GB ‘patch’ to play that same game, with zero notice, and finding out that this 6GB is a new weapon ‘skin’ and some ‘balance changes’. Who the hell is in charge of the patch process? How the hell can any software be this badly made? I will never buy any game made by DICE or published by EA ever again. Fuck them.

But forgetting games, and returning to PCs and Smart TVs and phones…it gets even worse. I must have updated the Disney app for my TV 20 times in the last year. Why? What new functionality has been added? Anything? I cannot see any difference whatsoever, but we are expected to constantly run updates. With skype its even worse. Skype’s software is so appalling now that it cannot even patch itself. Every time I reboot my PC skype reinstalls itself. And those skype updates have done nothing but make it worse, and slower. Every single ‘upgrade’ is a downgrade.

I will continue to use Windows 10 on my main PC until a bunch of software police smash through my front door with guns. Windows 11 is another 50GB of bloated crap that manages to make all of the UI WORSE but now you get very slightly rounded window corners. Amazing. Truly the work of 1,000 software engineers for a year. Meanwhile it cannot even synch video to sound on video playback on any website, and the ‘upgrade’ to windows 11 permanently halves the volume of all speakers, with no way to ever fix it. The windows 11 upgrade is just a way to wreck your PC. In 2025, Windows cannot play video, or control volume. Things Windows 3.11 could do just fine. Its embarrassing.

Do I worry about malware? Yes, and I have a paid anti-malware suite running on both PCs. But over the last ten years the worst malware I have experienced is Windows update. The second worst is probably my samsung phone. Updates take about 20 minutes, never add ANYTHING I want, but randomly change the icons of every installed app, just to aggravate me. It’s a phone. The calculator app does not need an update. Ever.

Right now, I have a laptop that works (apart from video and sound, which is unusable) and a desktop that works. I do not want ANY updates from microsoft on anything, ever again. I simply do not trust the people working there to be able to write code. Ditto samsung. Just leave me the fuck alone. If it was possible to globally opt out of all updates on my TV I would do so. These are not new games getting cool feature improvements and new content. They are apps that work. My expectations of software in 2025 are now so low, that simply having programs that vaguely work is the gold standard, and I will not risk any new code written by people who clearly have zero clue what they are doing.

An experienced coder’s approach to fixing business mistakes

Posted on December 21, 2024

I was the victim of an email today that was basically an invoice that was wrong. A company that has sent me a wrong invoice in the past has sent me it again another 3 times now, with increasing levels of angry demands and warnings about credit ratings. This raised my heart rate and anger level to extremely high distressing levels, but this is not a rant about how some companies are just awful, that would be too simple. This is more of an investigation into why stuff like that happens, and how software engineering can help you learn how to not fuck up like that.

I have been coding for 44 years now, since age 11. In all that time I’ve only coded in BASIC, C, C++ and some PHP, but 90% of it has been C++, so that’s the one I’m really good at. I don’t even use all of C++, so lets just say when it comes to the bits I DO use, I’m very experienced. This is why I’m using the term software engineering, and not coding or hacking, which implies a sort of amateurish copy-pasting-from-the-internet style of development. I’ve both worked on large game projects, and coded relatively large projects entirely from scratch, including patching, re-factoring and a lot of debugging. I coded my own neural networks from scratch before they were really a popular thing. Anyway…

Take the simple example of my experience today. An invoice is wrong, and sent to a customer anyway. This has happened three times now, despite the customer replying in detail, with sources and annotations and other people copied in, AND including replies from other people at the offending company who openly admit it’s wrong and will be fixed. Clearly this is a smorgasbord of incompetence, and irritating. But why do companies do this? What actually goes wrong? I think I know the answer: Once a problem like this is finally flagged up (which often involves threats from someone like me to visit the CEO’s home address at 3AM to shout at them), the company ‘fixes’ the mistake, maybe apologizes, and everyone gets on with their life…

This is the mistake.

Companies want to ‘fix’ the mistake in the quickest, easiest and cheapest way possible. If an invoice was accidentally doubled, then they ‘go into the system’ (they just mean database) and they halve it. Problem solved, customer happy. Now I don’t give a damn about the sorry state of this company’s awful internal processes, but I am aware of the fact that companies like this try to fix things in this way, rather than the true software engineering way. So how SHOULD it be done?

Before I was a full time computer programmer I worked in IT. I had a bunch of jobs, gradually going up in seniority. I saw everything from ‘patch it and who cares’ mentality (consumer PC sales) to ‘The absolute core problem must be fixed, verified, documented and reported on with regards to how it could have happened’ (Stock trading floor software). I can imagine that most managers think the first approach is faster, cheaper, better, and the second is only needed for financial systems or healthcare or weapons systems, but this is just flat out wrong.

When a mistake happens, fixing it forever, at a fundamental level, will save way, way more time in the lifetime of the company than patching over a bug.

Spending a lot of time in IT support taught me that there are many more stages to actually finding the cause of a problem than you might think. Here is a breakdown of how I ended up thinking about this stuff:

Stage 1: Work out what has actually happened. This is where almost everyone gets it wrong. For example, in my case I got an email for a £1k bill. This was not correct. That’s the error. That’s ALL WE KNOW RIGHT NOW. You might want to leap immediately to ‘the customer was charged too much’ or ‘the customer’s invoice was doubled’ but we do not know. Right now we only know the contents of the email are wrong. Is that a problem with the email generation code? Does the email match the amount in the credit control database? Until you check, you are just guessing. You might spend hours trying to fix a database calculation error you cannot find until finally realizing the email writing code is borked…

Stage 2: Draw a box around the problem. This was so helpful in IT. Even when you know what you know, you need to be able to define the scope of the problem. Again, right now all we know is the customer ‘cliff’ got a wrong invoice. Is that the problem? Or is it that every invoice we send out is wrong? Or was it only invoices on a certain date? Or for a certain product? Until we check a bunch of other emails we do not even know the scope of the problem. If it’s systemic, then fixing it for cliff is pointless.

Stage 3: Where did things go wrong. You need to find the last moment when everything worked. Was the correct amount applied to the product in the database? was the correct quantity selected when entering the customer’s order? was the amount correct right up until the invoice was triggered? Unless you know WHERE something went wrong, you cannot work out WHAT or WHY.

Stage 4: Develop a theory that fits all available data. You find a line of code that seems to explain the incorrect data. This needs to not only explain why it screwed things up for cliff, but also explain why the invoice for dave was actually ok.

Stage 5: Test a fix: You can now fix the problem. Hurrah. Change the code or the process and run it again. Is everything ok? If so you may be preparing a victory lap but this is laughably premature.

Stage 6: Reset back to the failed state: Now undo your fix and run it again. Seems redundant doesn’t it? Experience has taught me this is vital. If you REALLY have the fix, then undoing your fix should restore the problem. Maybe 1 in 100 times it will not. The ‘random’ problem just fixed itself for some indeterminate period. Your fix was a red herring. A REAL fix is like a switch. Turn it on and it’s fixed, turn it off and it’s broken. Verify this.

Stage 7: Post Mortem: This is where you work out HOW this could ever have happened. This is actually by far the most important stage of the whole process. If you just fix some bad code, then you have just fixed one instance in one program. The coder who made that mistake will make it again, and again and again. The REAL fix is to make it IMPOSSIBLE to ever have that error again. This takes time, and analysis. The solution may be better training, or it may mean changing an API to make it impossible to process bad data. It might mean firing someone incompetent. It might mean adding a QA layer. Whatever the mistake was, if you do not go through this stage, you failed at the task of fixing the problem.
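As one small illustration of "changing an API to make it impossible to process bad data", here is a sketch of an invoice type where the total can only ever be derived from validated line items. The Invoice and LineItem types are entirely hypothetical and are not based on any real billing system; amounts are assumed to be stored in pence.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical invoice types, purely illustrative.
struct LineItem
{
    std::int64_t UnitPricePence;
    int          Quantity;
};

class Invoice
{
public:
    // Bad data is rejected at the door instead of being stored and billed later.
    void AddItem(std::int64_t unitPricePence, int quantity)
    {
        if (unitPricePence < 0 || quantity <= 0)
            throw std::invalid_argument("invalid invoice line");
        Items.push_back({unitPricePence, quantity});
    }

    // The total is always derived from the lines. There is no setter, so
    // nobody can store a total that disagrees with the line items.
    std::int64_t TotalPence() const
    {
        std::int64_t total = 0;
        for (const LineItem& item : Items)
            total += item.UnitPricePence * item.Quantity;
        return total;
    }

private:
    std::vector<LineItem> Items;
};
```

With a design like this, the quick ‘go into the system and halve it’ patch is not even possible. The only way to change the total is to correct the lines that produced it.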

Coders and organizations that work like this have fewer bugs, fewer mistakes. They need smaller QA teams, and less time devoted to fixing mistakes and implementing patches. Their complaints department is tiny, and yet still provides great communication, because there are hardly any complaints. In short, companies that run like this are awesome, popular, profitable, and great places to work. Why is everywhere not like this? Because it requires some things:

A culture of giving people fixing problems free rein to follow the problem everywhere. If only the department that sends the emails is allowed to get involved in complaints about email, database errors or coding errors will never get fixed because they CAN never be fixed. The phrase ‘not my department’ needs to be banned.
A culture of accepting that someone is STILL working on a proper fix, even after the complaint has been handled and the customer is happy. When there is a bug, there are TWO problems to fix: the customer’s bad experience AND the company process failure that allowed this. Managers who pull people away from bug-fix post-mortems to fix the next thing right now are a curse.

Of course your mileage may vary. Be aware I am a hypomanic workaholic who works for himself so my ‘advice’ may not be applicable to everybody, but I hope this was interesting and thought provoking anyway. Too many coders are just copy-pasting from stack overflow or asking chatgpt to quickly patch their bad code. Working out how to fix things properly is a worthwhile pursuit.

Re-thinking painters algorithm rendering optimisation

Posted on June 9, 2024

Before I even start this post, I should point out that currently the project I am coding is for fun. I actually really ENJOY complex programming design challenges, unlike many people, so the comment ‘this is done for you in X’ or ‘there is middleware that solves this without you doing anything in Y’ is irrelevant to me, and actually depressing. Whatever happened to the joy of working on hard problems?

Take an imaginary game that has a lot of entities in it. Say 500. Each entity is a sprite. That sprite has an effect drawn on top of it (say its a glowy light) and a different effect below it (say its another glowy light). The objects have to be rendered in a sensible way, so that if one object passes over another (for example) the glowy lights ‘stack’ properly, and you don’t see aberrations where the lights from something are seen through something else. Here is a visualisation:

In this case red is the ‘underglow’, grey is the sprite, yellow is the overglow. The bottom right duo shows the issue where things have to be rendered correctly. In a 3D game with complex meshes and no antialiasing, you just use a zbuffer to draw all this stuff, and forget about it, but if you are using sprites with nice fuzzy edges (not horrible jaggy pixels), you really need to use the classic ‘painters algorithm’ of drawing from back to front for each object. There are clear performance problems:

A naïve way to code this would be my current system which handles each object in 3 draw calls. So we set up the blend mode for the underglow, and draw that, then set a normal blend mode, draw the sprite, then a glowy blend mode to draw the top glow, then repeat for every other entity. The trouble is, with 500 entities, thats 1,500 draw calls simply to draw the most simple part of the game…

Unfortunately this only scratches the surface because there may also be other layers of things that need drawing on top of each entity, and between drawing each sprite. However there is a major optimisation to be had…. <drumroll>. Actually my game really works like this:

Where there is a grid, and <generally speaking> the entities stay in one grid box at a time. WITHIN that grid box, all sorts of stuff may happen, but items in the top left grid box will not suddenly appear in the bottom right grid box. In other words I can make some clever assumptions, if only I can find a way to generalize them into an algorithm.

For example I could draw the underglow for one object in each grid box all together in a single draw call like this:

So this is suddenly one draw call instead of 8. I then do the same for the sprites, then the overglow, and then do the other object in each grid square as a second pass (basically a second Z layer). Assuming 8 grid squares, 2 entities per square, and 3 passes, that’s 48 draw calls for the naïve method and 6 for the new method. Pretty awesome! However, that’s oversimplifying things. Firstly, in some cases there are 16 entities in a grid square, in others just 1. Secondly, in some cases items ‘spill’ over into the next grid square, so I need to ensure I do not get an anomaly where 2 objects would overlap just slightly and thus z-fight when drawn…
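As a rough sketch of that arithmetic (the function names here are invented for illustration and are not from the game's code), the naïve count scales with the total number of entities, while the batched count scales with the number of entities in the busiest grid square. It assumes nothing spills into a neighbouring square:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Three passes per entity or per Z layer: underglow, sprite, overglow.
constexpr int kPasses = 3;

// entitiesPerSquare[i] = how many entities sit in grid square i (hypothetical input).
int NaiveDrawCalls(const std::vector<int>& entitiesPerSquare)
{
    int total = 0;
    for (int n : entitiesPerSquare)
        total += n * kPasses;          // every entity pays for all three passes
    return total;
}

int BatchedDrawCalls(const std::vector<int>& entitiesPerSquare)
{
    // One Z layer holds "the nth object of every square", so the most crowded
    // square decides how many layers are needed.
    int layers = 0;
    for (int n : entitiesPerSquare)
        layers = std::max(layers, n);
    return layers * kPasses;
}
```

With eight squares of two entities each this gives 48 and 6, matching the figures above. A single square holding 16 entities pushes the batched figure up to 48 on its own, which is why the uneven case matters.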

Like everything in game coding, the simple code design always accelerates towards spaghetti, but I would like to do my best to avoid that if possible…

In order to decide what is drawn at what time, I basically need 2 pieces of metadata about each object. Item 1 is what grid square it is in, and item 2 is the Z value of that object, where I can cope with say 16 different Z ‘bands’ for the whole world, meaning a group of 16 entities in a square will definitely be drawn with 16 different draw calls, and thus be painter-algorithm-ed ok.

I also want to do this without having to do insane amounts of sorting and processing every frame, for obvious efficiency reasons…

So my first thoughts on this are to handle everything via Z sorting, and make the list of entities the de-facto Z sort. In other words, when I build up the initial list of entities at the start of the game, I pre-sort them into Z order. So instead of a single list of randomly jumbled Z orders, I get this situation, where the black number is the Z position, and the blue letter references the grid square:

So my now pre-sorted object list goes like this:

A1 B1 C1 D1 E1 F1 G1 H1 A2 C2 F2 etc..
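A minimal sketch of that one-off pre-sort, assuming each entity carries just a grid square letter and a Z band number (the `Entity` struct and both function names are hypothetical, not the game's real types):

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct Entity
{
    char gridSquare; // 'A'..'H' in the example above
    int  zBand;      // 1 = furthest back
};

// Sort once at load time: Z band first, grid square second.
void SortForBatching(std::vector<Entity>& entities)
{
    std::stable_sort(entities.begin(), entities.end(),
        [](const Entity& a, const Entity& b)
        {
            if (a.zBand != b.zBand)
                return a.zBand < b.zBand;
            return a.gridSquare < b.gridSquare;
        });
}

// Formats the list the same way as the example, e.g. "A1 B1 C1 A2 C2".
std::string Describe(const std::vector<Entity>& entities)
{
    std::string out;
    for (const Entity& e : entities)
    {
        if (!out.empty())
            out += ' ';
        out += e.gridSquare;
        out += std::to_string(e.zBand);
    }
    return out;
}
```

Because the Z order is fixed for the whole battle, this sort only has to run when the entity list is built, not every frame.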

I can then do batches of draw calls, so my actual code looks like this:

DrawUnderGlowFor1()
DrawSpritesFor1()
DrawOverglowFor1()
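Generalised to every Z band, those three calls become a loop over runs of equal Z in the pre-sorted list. The sketch below only builds a description of the batches rather than issuing real draw calls, and the `Pass`, `Batch` and `BuildBatches` names are assumptions made for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

enum class Pass { UnderGlow, Sprite, OverGlow };

struct Entity { char gridSquare; int zBand; };

// One batch = one draw call covering entities [first, first + count) for one pass.
struct Batch { int zBand; Pass pass; std::size_t first; std::size_t count; };

// Assumes 'entities' is already sorted by zBand (back to front).
std::vector<Batch> BuildBatches(const std::vector<Entity>& entities)
{
    std::vector<Batch> batches;
    std::size_t i = 0;
    while (i < entities.size())
    {
        // Find the end of the run of entities that share this Z band.
        std::size_t j = i;
        while (j < entities.size() && entities[j].zBand == entities[i].zBand)
            ++j;

        // Three batched passes per band, in painter's order.
        for (Pass p : { Pass::UnderGlow, Pass::Sprite, Pass::OverGlow })
            batches.push_back({ entities[i].zBand, p, i, j - i });

        i = j;
    }
    return batches;
}
```

For the list A1 B1 C1 A2 C2 this yields six batches: three covering the first three entities, then three covering the last two.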

The nature of the game means that generally speaking, Z order can be fixed from the start and never change, which simplifies things hugely. The bit where it becomes a real pain is the overlap of objects to adjacent grid squares. I would also like a super-generic system that can handle not just these fixed entities, but anything else in the game. What I really need is a two layered graphics engine:

Layer 1 just sends out a ton of draw requests, each with some metadata: the geometry to draw, plus the relevant texture, plus the blend state for the object. This is done in a naïve way with no thought to anything else.

Layer 2 processes the output of layer 1, does some analysis and works out how to collapse draw calls. It can calculate bounding boxes for everything that is drawn, see what overlaps with what, and what can be batched, and then decide on a final ‘optimised’ series of calls to DirectX. This is done before any textures get set or draw calls are made. It then hands all this off to, ideally, a new thread, which just streams those draw calls to DirectX while the other threads get on with composing the next frame.
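One possible shape for that collapsing step, as a sketch only: layer 1 records commands in plain painter's order, and layer 2 walks them, moving each command back into an earlier batch with the same texture and blend state as long as it does not have to jump over anything it overlaps. The structs, the integer ids and the greedy strategy are all assumptions for illustration, not a description of the finished engine:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Rect { float left, top, right, bottom; };

bool Overlaps(const Rect& a, const Rect& b)
{
    return a.left < b.right && b.left < a.right &&
           a.top < b.bottom && b.top < a.bottom;
}

// Layer 1 output: one record per requested draw (ids are hypothetical).
struct DrawCommand { int textureId; int blendMode; Rect bounds; };

// Layer 2 output: everything in a batch shares state and goes out as one call.
struct Batch { int textureId; int blendMode; std::vector<Rect> quads; };

std::vector<Batch> CollapseDrawCalls(const std::vector<DrawCommand>& commands)
{
    std::vector<Batch> batches;
    for (const DrawCommand& cmd : commands)
    {
        Batch* target = nullptr;
        // Walk backwards from the most recent batch.
        for (std::size_t i = batches.size(); i-- > 0; )
        {
            Batch& b = batches[i];
            if (b.textureId == cmd.textureId && b.blendMode == cmd.blendMode)
            {
                target = &b;   // same state: safe to append here
                break;
            }
            // Different state: we may only skip past this batch if nothing in
            // it overlaps us, otherwise the painter's order would change.
            bool blocked = false;
            for (const Rect& r : b.quads)
                if (Overlaps(r, cmd.bounds)) { blocked = true; break; }
            if (blocked)
                break;
        }
        if (!target)
        {
            batches.push_back({ cmd.textureId, cmd.blendMode, {} });
            target = &batches.back();
        }
        target->quads.push_back(cmd.bounds);
    }
    return batches;
}
```

Two entities that sit apart collapse from six commands into three batches, while two that overlap stay as six, so the stacking order is kept. The backwards scan is quadratic in the worst case, so a real version would probably use the grid squares to limit how far it looks.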

I have attempted such a system once before, but I was a less experienced coder then, working full-time, and on a (self-imposed) deadline to ship the game, rather than free to obsess over noodling with rendering engine design. So hopefully this time I will crack it, and then have something super cool that can handle whatever I throw at it :D. Also the beauty of this design is that it would be easy to bypass layer 2 and just render everything as it comes in, so I could even hook up a hotkey to watch the frame rate difference and check I’m not wasting all my time :D.
