Due to what seemed to be a compiler bug (omgz) yesterday I thought that large complex battles in Ridiculous Space Battles were hitting performance limits. That appears to not be the case, but it got me back into profiling the bigger battles (20×20 grid size, with up to 25 ships in each square, probably 600 ships in total) to see where the bottlenecks are.
The game is already multithreaded for some tasks, but the first profiling run, covering the first 50% of a battle, gives me this flame graph from uprof for the main thread:
In practice what I really care about is stuff inside GUI_Game::Draw(). I have to say I am pleasantly surprised with the breakdown as it seems nicely distributed, without any really obvious processor hogs at first glance. Drawing the ship sprites, processing the simulation, post processing (mostly lightmaps), particles, hulks, bullets and then a bunch of minor stuff. Nothing seems too bad. On the other hand there are some things in there that seem quite big given what I know they are doing. For example the lightmap stuff shouldn’t be that big a deal, and perhaps should be threadable more? Lightmaps are being drawn not only for every laser ‘bullet’ but also every engine glow, and every one of the many sprites that make up a beam laser. I also draw ship ‘mattes’ to ensure that light does not glow through objects like asteroids. Even so, this doesn’t sound like a lot of CPU?
So this breakdown is showing as expected that a lot of it is within the drawmattes, but even so that seems too much to me. There might be a single ship sprite, but 6 engine glows and the flares from 20 missiles or 6 beam lasers associated with it. How come mattes are so slow? At first I assumed that I was not checking to see if they were onscreen, but I definitely am. Here is the code for that function:
GUI_GetShaderManager()->StartShader("matte.fx");
for (SIM_Ship* ps : PlayerShips)
{
    ps->DrawMatte();
}
for (SIM_Ship* ps : EnemyShips)
{
    ps->DrawMatte();
}
GUI_GetHulks()->DrawMattes();
GUI_GetRenderManager()->Render();
GUI_GetShaderManager()->EndShader("matte.fx");
So nothing exactly too complex. It could be that just traversing the STL list for 600 ships takes time, but frankly 600 is not a big list. Could it actually all be the hulks? Actually uprof suggests a lot of it is… but checking that code, it’s basically doing the same thing. The game made a note early on of whether to render a hulk matte, so there is no redundant rendering taking place. Hmmm. Maybe best to look elsewhere for problems. I tried running a battle involving a ludicrous number of bullets. I gave the Expanse ‘Apocalypse Cruiser’ a huge bunch of Gravimetric impulse cannons, which have lots of burst fire, and filled a fleet with them.
And the flame graph is actually not too bad again:
Ok, bullet process is now the top task, but it has not gone insane, which is good. And the bulletmanager Draw() is also obviously bigger, but again not insane. I dug a little deeper, and found this nonsense inside the bullet process function:
float radangle = D3DXToRadian(Angle);
float cosa = COS(radangle);
float sina = SIN(radangle);
Looks innocent, but I actually wrote faster versions of sin and cos that use a lookup table with 3,600 entries for each. So basically my precision there is within a tenth of a degree. I probably left it like this because I worried that the quantizing there would make the bullets look like they miss their target when my sim shows that they hit. I checked with grok:
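For anyone who has not seen the trick, here is a minimal sketch of a tenth-of-a-degree lookup table. It is not the game's actual SIN/COS code: the names TrigTable, TableIndex, FastSinDeg and FastCosDeg are made up for illustration, and I am assuming the angle comes in as degrees and is rounded to the nearest table entry.

```cpp
#include <array>
#include <cmath>

// Hypothetical stand-in for a lookup-table sin/cos: 3,600 entries,
// one per tenth of a degree. Not the game's real implementation.
constexpr int kTableSize = 3600;
constexpr double kPi = 3.14159265358979323846;

struct TrigTable {
    std::array<float, kTableSize> sinv{};
    std::array<float, kTableSize> cosv{};

    TrigTable() {
        for (int i = 0; i < kTableSize; ++i) {
            const double rad = (i / 10.0) * kPi / 180.0;
            sinv[i] = static_cast<float>(std::sin(rad));
            cosv[i] = static_cast<float>(std::cos(rad));
        }
    }
};

// Built once on first use, then shared.
inline const TrigTable& Table() {
    static const TrigTable table;
    return table;
}

// Round degrees to the nearest tenth and wrap into [0, 3600).
inline int TableIndex(float degrees) {
    const int idx = static_cast<int>(std::lround(degrees * 10.0f)) % kTableSize;
    return idx < 0 ? idx + kTableSize : idx;
}

inline float FastSinDeg(float degrees) { return Table().sinv[TableIndex(degrees)]; }
inline float FastCosDeg(float degrees) { return Table().cosv[TableIndex(degrees)]; }
```

The cost per call is one multiply, one round and one array read, at the price of snapping every angle to the nearest 0.1 degrees.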
The bullet could be off target by up to approximately 2.62 pixels when using a sine/cosine lookup table with 1/10th degree precision over a distance of 3000 pixels.
That’s interesting, because frankly none of my bullets are firing over 3000 pixel ranges, and being visibly off by that amount is actually ‘no big deal’. It’s absolutely not a big deal if the bullet has been pre-determined to miss anyway… I guess I *could* be super-cautious and have a flag where if the bullet will miss, I use the lookup table for its movement, and if not I use real math?
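The 2.62 pixel figure is easy to sanity check. A rough sketch, assuming the table rounds to the nearest entry so the heading is never more than half a step (0.05 degrees) out; MaxLateralErrorPixels is a made-up helper name, not anything from the game:

```cpp
#include <cmath>

// Worst-case sideways miss, in pixels, for a shot travelling `range` pixels
// when its heading is rounded to the nearest `stepDegrees`.
// Assumes round-to-nearest, so the angle is at most half a step off.
inline double MaxLateralErrorPixels(double range, double stepDegrees) {
    const double kPi = 3.14159265358979323846;
    const double maxAngleErrorRad = (stepDegrees / 2.0) * kPi / 180.0;
    return range * std::sin(maxAngleErrorRad);
}
```

MaxLateralErrorPixels(3000.0, 0.1) comes out at roughly 2.62, matching the quote, and the error scales linearly with range, so a 1000 pixel shot is under a pixel off. If the table truncated instead of rounding, the worst case would be about double that.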
And then this is the point in the thinking process where I realize that all my code is bollocks. It’s only when the bullets WILL hit the target that they need to change their angle once shot anyway. In other words, some bullets are (for sim purposes) effectively homing-bullets, and some are not. So not only can I use a lookup table for the angles of non-hitting bullets, I do not even need to recalculate the angle for them at all once they have been fired. Jesus I need a coffee…
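A minimal sketch of that idea, using a hypothetical Bullet struct rather than the game's real bullet class (the field names, Fire() and Process() signatures are all assumptions): the direction is computed once when the bullet is fired, and only bullets flagged as hits pay for re-aiming each frame.

```cpp
#include <cmath>

// Hypothetical bullet, not the game's actual class.
struct Bullet {
    float x = 0.0f, y = 0.0f;
    float angleDeg = 0.0f;
    float speed = 0.0f;               // pixels per second
    bool willHit = false;             // decided by the sim at fire time
    float dirX = 0.0f, dirY = 0.0f;   // cached cos/sin of the heading

    void UpdateDirection() {
        const float rad = angleDeg * 3.14159265f / 180.0f;
        dirX = std::cos(rad);
        dirY = std::sin(rad);
    }

    void Fire(float startX, float startY, float headingDeg,
              float pixelsPerSec, bool hit) {
        x = startX;
        y = startY;
        angleDeg = headingDeg;
        speed = pixelsPerSec;
        willHit = hit;
        UpdateDirection();            // one-off trig for every bullet
    }

    // targetX/targetY are only consulted for bullets that will hit.
    void Process(float dt, float targetX, float targetY) {
        if (willHit) {
            // "Homing" case: re-aim at the target, then refresh the cache.
            angleDeg = std::atan2(targetY - y, targetX - x) * 180.0f / 3.14159265f;
            UpdateDirection();
        }
        // Missing bullets fall straight through: no trig at all per frame.
        x += dirX * speed * dt;
        y += dirY * speed * dt;
    }
};
```

With this shape the per-frame cost of a missing bullet is two multiply-adds, so the precision question about the lookup table stops mattering for them.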
Coffee consumed… And now checking that this change (caching one-off sin and cos for missing bullets) works and does not break the game… And it works fine. I am now aware of a much bigger issue: When a bullet expires, it removes its sprite from the list of lightmaps to be drawn in the lightmap code. This is a big deal, because that list may well have 6,000-10,000 entries. Finding and removing an item in a list that big all the time is hitting performance in this perverse case with thousands of bullets. I need a better solution…
Checking uprof I can see that GUI_Bullet::SetActive() has a cache miss rate of 66%. That’s pretty dire, the worst of the top 20 most processor intensive functions. Yikes. And yet…
2 minutes chatting to grok (xAI’s chatbot) gave me the frankly genius solution: “Why have a pointer to a sprite stored in a list, and then go back later and try and find it? When you add it, store the ITERATOR, and then later when you want to remove it, you already have the iterator ready. No more list searching.”
Holy Fucking Crap.
I’ve been coding for 44 years, and it’s never occurred to me that with a tightly managed list of objects that sees constant addition and removal, and that may be huge, it’s worth storing the iterator in the stored object. That’s genius. Maybe you do this all the time, but it’s new to me, and it’s phenomenal for this use case. Not only is GUI_Bullet::SetActive() no longer in the top 20 functions by CPU time, I cannot even find it. It’s literally too fast to measure, even with my extreme stress test.
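For anyone else who has never done this, here is a rough sketch of the stored-iterator pattern using std::list. Sprite, LightmapList and this cut-down Bullet are hypothetical stand-ins, not the game's real GUI_Bullet or lightmap code. It relies on the fact that std::list iterators stay valid while other elements are added or removed, which would not be true for a std::vector.

```cpp
#include <cstddef>
#include <list>

struct Sprite { int id = 0; };  // hypothetical stand-in for a lightmap sprite

class LightmapList {
public:
    using Handle = std::list<Sprite*>::iterator;

    // O(1): append and hand back the iterator for the caller to keep.
    Handle Add(Sprite* s) { return sprites_.insert(sprites_.end(), s); }

    // O(1): erase directly through the stored iterator, no searching.
    void Remove(Handle h) { sprites_.erase(h); }

    std::size_t Size() const { return sprites_.size(); }
    const std::list<Sprite*>& Sprites() const { return sprites_; }

private:
    std::list<Sprite*> sprites_;
};

struct Bullet {
    Sprite sprite;
    LightmapList::Handle slot{};  // only meaningful while active is true
    bool active = false;

    void SetActive(bool on, LightmapList& lights) {
        if (on == active) return;            // nothing to do
        if (on) slot = lights.Add(&sprite);  // remember where we landed
        else    lights.Remove(slot);         // erase without a lookup
        active = on;
    }
};
```

The one thing to watch is that the handle must never outlive its entry: if the list is cleared or rebuilt while bullets are still active, their stored iterators become invalid.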
Buy shares in NVDA and TSMC!