Game Design, Programming and running a one-man games business…

Making a hobby game!

I’ve been getting very motivated about a little hobby game I’ve been working on in-between dealing with solar farm stuff, and playing the guitar. I have a lot of really cool space-game assets from my old game (some might say classic!) ‘Gratuitous Space Battles‘ and it just feels wrong to have all the art to make a space-invaders style game and not just do it! I decided to call the game ‘Gratuitous Space Shooty Game’.

I am so disorganised that its simpler for me to code a new game engine from scratch than find the hard drive with GSB code on it (or at least all of it), but luckily I have time plus experience, and I can type stupidly fast, so I’ve basically written a new game engine for this little hobby project. Its nothing amazing, the game currently only uses 2 shaders, no clever effects, no amazing visuals, just a simple ‘shoot at static sprites and enjoy some primitive particle effects’ style game:

Obviously its 2023, so just making simple ‘space invaders’ wont cut it even for a hobby game, so there is a lot of influence from stuff like galaxian, and pheonix, and all the other space shooters out there. Right now, the alien movement is very generic and simple, and nothing to shout about. No fancy splines, just left right and down!

The thing thats motivating me about this game is the small scope, and ease of adding new stuff. When I work on a giant commercial game of mine like Production Line or Democracy, every single line of code or change to a single data item needs to be checked and balanced for 11 different languages and every conceivable screen resolution and hardware, then uploaded as a patch to itch, gog, epic, humble and steam. The amount of admin, and busywork required to make marginal changes to a large project can be pretty overwhelming.

With this game, its 1920×1080 res or nothing (stretched to actual resolution, and bordered if necessary), only in English. And right now its not even on any store. This means I can have a cool idea, start typing code, and be testing it within minutes, which makes the development process pretty fun.

I don’t want to put up a public build for it quite yet, because so much of it is just totally broken, or half assed. The current font sucks, and doesn’t even display percentage symbols :D. The gameplay is unbalanced, and there is no high score system that actually stores anything anywhere yet. I reckon I need to code a primitive online high score system, and include music and sfx volume controls before I make it public. Oh and a pause button might be nice too!

I have to say though… its already very very fun. There is something very adrenaline-rushy about playing it on the harder levels, where everything gets a bit hectic. In these days of F2P, monetization, competitive e-sports, multi gigabyte patches, and achievements and so on… there is something very pleasurable about a simple game where you move left and right and hit the fire button!

When I stick it on itch or the humble widget I’ll post about it here :D.

What I learned from fixing a dumb bug in my graphics code

I’ve recently been on a bit of a mission to improve the speed at which my game Democracy 4 runs on the intel Iris Xe graphics chip. For some background: Democracy 4 uses my own engine, and its a 2D game that uses a lot of text and vector graphics. The Iris Xe graphics chip is common on a lot of low end laptops, and especially laptops not intended for gaming. Nonetheless, its a popular chip, and almost all of the complaints I get regarding performance (and TBH there are not many) come from people who are unlucky enough to have this chip. In many cases, recommending a driver update fixes it, but not all.

Recently a fancy high-end laptop I own basically bricked itself during a bungled windows 11 update. I was furious, but also determined to get something totally different, so got a cheap laptop made partly from recycled materials. By random luck, it has this exact graphics chipset, which made the task of optimising code for that chip way easier.

If you are a coder working on real-time graphics stuff like games, and you have never used a graphics profiler, you need to fix that right away. They are amazing things. You might be familiar with general case profilers like vtune, but you really cannot beat a profiler made by the hardware vendor for your graphics card or chip. In this case, its the intel graphics monitor, which launches separate apps to capture frame traces, and then analyze them.

I’m not going to go through all the technical details of using the intel tools suite, as thats specific to their hardware, and the exact method of launching these programs, and analyzing a frame of a game varies between intel, AMD and nvidia. They all provide programs that do basically the same thing, so I’ll talk about the bug I found in general terms, not tied to vendor or API, which I think is much more useful. The web is too full of hyper-specific code examples and too lacking in terms of general advice.

All frame capture programs let you look at a single frame of your game, and lists every single draw-call made in that frame, showing visually whats drawn, what parameters were passed, and how long it took. You are probably aware that the complexity of the shader (if any), the number of primitives and the number of pixels rendered all combine in some way to determine how much GPU time is being sent on a specific draw call. A single tiny triangle flat shaded is quick, a multi-render-target combined shader that fills the screen with 10,000 triangles is slow. We all know this.

The reason I’m writing this article is precisely because this was NOT the case, and discovering the cause therefore took a lot of time. More than 2 weeks in fact. I was following my familiar route of capturing a frame, noting that there were a bunch of draw calls I could collapse together, and doing this as I watched the frame rate climb. This was going fine until I basically hit a wall. I could not reduce the draw calls any more, and performance still sucked. Why?

Obviously my first conclusion was that the Iris Xe graphics chip REALLY sucks, and such is life. But I was doing 35-40 draw calls a frame. Thats nothing. The amount of overdraw was also low. Was it REALLY this bad? can it be that a modern laptop would struggle with just 40 draw calls a frame? Luckily there was a way to see if this was true. I could simply run other games and see what they did.

One of the games I tested was Shadowhand. I chose this because it uses a different engine (gamemaker). I didnt even code this game, but the beauty of graphics profilers is this: You do NOT NEED A DEBUG BUILD OR SOURCE CODE. You can use them on any game you like! So I did, and noticed Shadowhand sometimes had 600 draw calls at 60 frames a second. I was struggling with 35 draw calls at 40fps. What the hell?

One of the advanced mode options in the intel profiler is to split open every draw call so you see now only the draw calls, but every call to an opengl API that happens between them. This was very very helpful. I’m not an opengl coder, I prefer directx, and the opengl code is legacy stuff coded by someone else. I immediately expect bad code, and do a lot of reading up on opengl syntax and so on. Eventually, just staring at this of API calls makes me realize there is a ton of redundancy. Certain render states get set to a value, then reverted, then set again, then reverted, then a draw call is made. There seems to be a lot of unnecessary calls to setting various blend modes. Could this be it?

Initially I thought that some inefficiency was arising from a function that set a source blend state, and then a destination blend state as two different calls, when there was a perfectly good OpenGL API call that did both at once. I rewrote the code to do this, and was smug about having halved the number of blend mode state calls. This made things a bit faster, but not enough. Crucially, the number of totally redundant set and reset calls was still scattered all over the place.

To understand why this matters, you need to understand that most graphics APIs are buffered command lists. When you make a draw call, it just gets put into a list of stuff to be done, and if you make multiple draws without changing states, sometimes the card gets to make some super-clever optimisations and batch things better for you. This is ‘lazy’ rendering, and very common, and a very good idea. However, when you change certain render states, graphics APIs cannot do this. They effectively have to ‘flush’ the current list of draw calls, and everything has to sit and wait until they are finished before proceeding. This is ‘stalling’ the graphics pipeline, and you don’t want to do it unless you have to. You REALLY don’t want to constantly flip back and forth between render states.

Obviously I was doing exactly that. But how?

The answer is why I wrote this article, because its a general case piece of wisdom every coder should have. Its not even graphics related. Here is what happened:

I wrote some code ages ago that takes some data about a chunk of text, and processes all the data into indexed vertexes in a vertexbuffer full of vector-rendered crisp text. It makes a note of all this stuff but does not render anything. You can make multiple calls to this AddText() function, without caring if this is the first, last or middle bit of text in this window. The only caveat is to remember to call DrawText() before the window is done, so that text doesnt ‘spill through’ onto any later windows rendered above this one.

DrawText() goes through the existing list, and renders all that text in one huge efficient draw call. Clean, Fast, Optimised, Excellent code.

Thats how all my games work, even the directx ones, as its API-agnostic. However, there is a big, big problem in the actual implementation. The problem is this: The code DrawText() stores the current API render states, then sets them to be the ones needed for text rendering, then goes through the pending list of text, and does the draw call, then resets all those render states back how they were. Do you see the bug? I didn’t. Not for years!

The problem didn’t exist until I spotted the odd bug in my code where I had rendered text, but forgotten to call DrawText() at the end of a window, so you saw text spill over into a pop-up dialog box now and then. This was an easy fix though, as I could just go through every window where I render some text and add a DrawText() call to the end of that window draw function. I even wrote it as a DRAWTEXT macro to make it a bit easier. I spammed this macro all over my code, and all of my bugs disappeared. Life was good.

Have you spotted it now?

The redundant render state changes eventually clued-me-in. Stupidly, the code for DrawText() didn’t make the simple, obvious check of whether or not there was even anything in the queue of text at all. If I had spammed this call at the end of a dialog box that already had drawn all its text, or even had none at all, then the function still went through all the motions to draw some. It stored the current render states, set new ones, then did nothing…because the text queue was empty, then reset everything. And this happened LOTS of time each frame, creating a stupid number of stalls in the rendering pipeline in order to achieve NOTHING. It was fixed with a single line of code. (A simple .empty() check on a vector and some curly brackets… to return without doing anything).

Three things conspired to make finding this bug hard. First: I previously owned no hardware I could reproduce it on. Second: It was something that didn’t even show up when looking at each draw call, it manifested as making every draw call slower. Third: it was not a bad API call, or use of the wrong function, or a syntax error, but a conceptual code design fuck-up by me, My design of the text renderer was flawed, in a way that had zero side-effects apart from redundant API calls.

What can be learned?

Macros, and functions can be evil, because they hide a lot of sins. When we write an entire game as a massive long list of assembly instructions (do not do this) it becomes painfully obvious that we just typed a bazillion lines of code. When we hide code in a function, and then hide even the function call in a macro, we totally forget whats in there. I managed to hide a lot of sins inside this:

DRAWTEXT

Whereas what it really should have been though of was this

STORERENDERSTATESANDTHENSETTHEMTHENGOTHROUGHALISTTHENRESETEVERYTHINGBACK

This is an incredibly common problem that happens in large code bases, and is made way worse when you have a lot of developers. Coder A writes a fast, streamlined function that does X. Coder B finds that the function needs to do Y and Z as well, and expands upon it. Coder A knows its a fast function so he spams calls to it whenever he thinks he needs it, because its basically ‘free’ from a performance POV. Producer C then asks why the game is slow, and nobody knows.

As programmers, we are aware that some code is slow (saving a game state to disk) and some is fast (adding 2 variables together). What we forget is how fast or slow all those little functions we work on during development have become. I’ve only really worked on 3 massive games (Republic: The Revolution, an unshipped X Box game, and The Movies), but my memory of large codebases is that they all suffer from this problem. You are busy working on your bit of the code. Someone else coded some stuff you now need to interface with. They tell you that function Y does this, and they get back to their job, you get back to yours. They have no idea that you are calling function Y in a loop 30,000 times a frame. They KNOW its slow, why would anybody do that? But you don’t. Why would you? its someone else’s code.

Using code you are not familiar with is like using machinery you are not familiar with. Most safety engineers would say its dangerous to just point somebody at the new amazing LaserLathe3000 and tell them to get on with it, but this is the default way in which programmers communicate,

Have you EVER seen an API spec that lists the average time each function call will take? I haven’t. Not even any supporting documentation that says ‘This is slow btw’. We have got so used to infinite RAM and compute that nobody cares. We really SHOULD care about this stuff. At the moment we use code like people use energy. Your lightbulb uses 5 watts, your oven probably 3,000 watts. Do you think like that? Do you imagine turning on 600 light bulbs when you switch the oven on? (You should!).

Anyway, we need better documentation of what functions actually do, what their side effects are, what CPU time they use up, and when and how to use them. An API spec that just lists variable types and a single line of description is just not good enough. I got tripped up by code I wrote myself. Imagine how much of the API calls we make are doing horrendously inefficient redundant work that we just don’t know about. We really need to get better at this stuff.

Footnote: Amusingly, this change got me to 50 FPS. It really bugged me that it was still not 60 FPS> Hilariously I realised that just plugging my laptop in to a mains charger bumped it to 60. Damn intel and their stealth GPU-speed-throttling when on battery power. At least tell me when you do that!

Speeding up my game from 59fps to 228 fps.

I recently saw a comment online that the ‘polls’ screen in Democracy 4 was horribly slow for a particular player. This worried me, because I pride myself in writing fast code, and optimizing to a low min spec. The last thing I want to hear is that my game seems to be performing badly for someone. I thus went to work on improving it. This involved about 15 mins looking at the code, about an hour musing, while trying to sleep, and about 20 mins coding the next day, plus an hour or more of testing, and profiling. Here is what I did, and how I did it.

The game in question is the latest in my series of political strategy games: Democracy 4. its a turn-based 2D icon-heavy game that often looks a lot like a super-interactive infographic. Here is the screen in question, which shows the history of the opinion polling for all of the different voter groups:

There is actually quite a lot of stuff being drawn on that screen, so I’ll explain what is going on, in ‘painters algorithm‘ terms.

Firstly, there is a background screen, containing a ton of data and icons, which is displayed in greyscale (black & white) and slightly contrast-washed out to de-emphasize it. This is non-interactive, and is there just to show that this is, effectively, just a pop-up window on top of the main UI, and you can return there by hitting the dialog close button at the top right. This is drawn really easily, because its actually a saved ‘copy’ of the backbuffer from earlier, so the whole background image is a single sprite, a quad of 2 triangles, drawn in a single draw call, filling the screen.

Secondly I draw the background for the current window, which is very simple as its basically a flat white box, with a black frame around it. This is actually 2 draw calls ( TRIANGLESTRIP for the white, LINESTRIP for the frame).

Then I draw a grid that represents the chart area, as a bunch of single-pixels lines. Thankfully, done efficiently as a single big draw call.

Then for each of the poll lines I do the following:

  • A single draw call for a bunch of thin rectangles representing the faded out lines from pre-game turns
  • A single draw call for a bunch of circular sprites representing the dots for pre-game turns
  • A single draw call for the lines from player-turns (full color)
  • A single draw call for the dots for player turns (full color)

Then a bunch of other stuff, such as the filled-progress-bar style blue bars at the right (all a single draw call) and then text on them, and then the title, the buttons at the top, and the close button.

In total, this amounts to a total number of draw calls for this screen of 102. This is almost best-case though, because its possible for that grey bar at the bottom to fill up with a ton of icons too, which would make things worse.

Now…you may well be sat there puzzled, thinking ‘but cliff, who cares? 102 draw calls is nothing? My FragMeister 9900XT+++ video pulverizer can apparently do 150,000 draw calls a millisecond’. You would be correct. But let me introduce you to the woes of a ten year old laptop, and not a bad 10 year old laptop, a pretty hefty, pretty expensive HP elitebook 8470p, which was inexplicably a gift to me from intel (thanks guys!), and came with their HD 4000 graphics chip.

I appreciate free laptops, and the intel GPA graphics debugging tools are incredible, but the horsepower of the actual GPU: not good at all. It could only manage 58fps on that screen. Assume some extra icons at the bottom, and assume a poorly-looked after laptop with some crapware running the background, and we could easily drop to 40fps. Assume a cheaper-model laptop, or one a few years older, and we suddenly have bad, noticeable ‘lag’. Yikes.

When you delve into the thrills of the intel Graphics Frame Analyzer, the problem is apparent. This chip hates lots of draw calls. It hates fill-rate too, but the draw calls seem to really annoy it, so its clear that I needed to get that number down.

The obvious stupidity is that I am drawing 21 sets of lines individually instead of as a group. There was actually method to the madness though. The lines were composed of lines AND dots, and they were being drawn differently. Each thick ‘line’ between 2 dots is actually just a flat-shaded rectangle. To do this, I create a ‘sprite’ (just a quad of 2 triangles), with the current texture set to NULL, and draw it at the right angle. Draw a whole bunch of them and you get a jagged thick ‘line’. The dots however, use an actual texture, a sprite thats a filled circle. To draw them, I need to set the current texture to be the circle texture, then draw a quad where the dot should be.

Luckily I’m not a total idiot, so I do these in groups, not individually as line/dot/line/dot. So I draw 32 lines, then I draw 32 dots, and thats 2 draw calls, one with a null texture, one with a circle texture. Actually its WORSE, because I make the mistake of treating the pre-game (simulated) turns as a separate line, because its ‘faded out’. This is actually just stupid. When you send a long list of triangles to a GPU, the color values are in the vertexes, you don’t need to swap draw calls to change colors! but hey ho.

Clearly when I wrote this code, I thought ‘well one uses a null texture, and another uses a circle texture, so this needs to be 2 separate calls’. and because I cant even make a single dotted line 1 call, no way can I make the whole thing 1 call right?

This was stupid. I could have changed my ‘circle’ texture to be 2 images, a circle, and a flat white quad, and change the UV values so that the dots used the circle, and the rectangles used the flat white quad, but there was actually an even easier method. I just had both systems use the circle texture, but set up UVs on the rectangles so that they were using only the very very center of the circle sprite, which was effectively all white anyway…

By setting those UVs, I can avoid having to use a NULL texture. An all-white texture is exactly the same thing. This way, all the lines and dots are exactly the same thing, its just a big long list of sprites, which means just a big long triangle list, which means a single draw call. And because its just 1 draw call, that means it can merge with the next line, and the next. So instead of doing 21x2x2 = 84 draw calls, I can just do one. Just one. Awesome.

Amusingly, in my actual code, I was sloppy and didnt even use the matching UVs perfectly, but it doesn’t matter:

/////////////////////////////////////////////////////////////////////////

void GUI_AnimatedGraphLine::RenderThickLines(V2* ppoints, int countpoints,
 float width, RGBACOLOR color)
{
      BaseSprite sp;
      sp.SetUV(0.49f, 0.48f, 0.51f, 0.51f);
      sp.width = width;
      sp.SetVertexColor(color);

Just making this one change made a ridiculous difference to the frame rate for this screen, sending it from 89fps to 228 fps, a completely huge win. It will be in the next update to the game.

Its absolutely not the final word on speeding up that part of the UI though. I have some very un-optimised nonsense in there to be honest. Those last little lines at the right of the chart, that connect the plotted lines to their matching text bars…. those are done as a draw call separate from the main graph. Why? Simplicity of code layout I guess. There is also a LOT of wasted CPU stuff going on. The position of all those lines and dots is calculated every frame, despite the fact that this window is not draggable or resizable, and they never change. I am 99% sure the game is always GPU bound on screens like this, so I haven’t bothered looking into it, but its still a waste.

According to the GPA Frame analyzer, the BIG waste of time in this screen is definitely the fact that I have massive overdraw going on with this whole dialog box. The window is a perfect, simple rectangle, so its easy to work out how much fill-rate was wasted drawing that background image behind it. There is zero opacity, so it would actually be fine to cookie-cutter out the top, bottom, left and right of this screen and then only draw the sections of the background image I needed. That could boost the FPS even more.

In practice though, we always have to strike a balance between code readability and simplicity, and the target of fast code. I could probably code all sorts of complex kludges that speed up the code but cause legibility hell. Arguably, I have made my code more confusing already, because there is no simple bit of code where ‘the dotted line for this group gets drawn’. There is code that copies some data into a vertex buffer…and then you have to trust I remember to actually draw it later in the screen…

Luckily I’m the only person looking at this code, so I don’t need to debate/explain myself to anybody else on this one. I’ll take happy players of old laptops over some minor inconvenience regarding following the code by me later.

I’m always surprised by how seemingly simple code can be slow on some GPUs. Whats your experience of hardware like this? Do you have an HD4000 intel GPU? Whats it like for games in your experience?

Optimization for fun!

I am well aware that my game Democracy 4 is not exactly slow with huge framerate issues. However, optimization is fun! or at least it should be, but in practice, getting profiling to work on remote PCs is not exactly easy. I have basically used every profiling software imaginable and still have not got one that I think really does the job well…

I have basically wasted about an hour today trying to work out why I couldn’t get the intel vtune amplifier stuff to work with event based profiling and get rid of this pesky error that was clearly nonsense about ‘not able to recognize processor… until I finally realized that I actually have an AMD chip in my (relatively) new PC so…yeah… That drove me to try out the AMD uProf profiler, which is something I had not used before.

It took me a moment to realize that this software, good though it is, does not suggest to you ‘hey if you run me in administrator mode I will show you 50x more config options’ but luckily I worked that out. My first act was a brief run of Democracy 4, starting a new game then immediately going to the next turn. In the list of functions taking up all the time (and ignoring windows system functions) I get this list:

Which is about what I would expect. The game is implemented as custom-coded neural network structure, hence the terminology. Mostly everything is a neuron, and most of the processing is where each neural effect (the links between neurons) processes its equation, and then neurons do some math on their inputs and outputs.

The inner machinery of the neural network ultimately comes down to that top item there:

SIM_EquationProcessor::Interpret Value.

This is code that basically takes those equations in the game’s csv files like this:

StateHealthService,0-(0.4*x),2

And actually calculates a value from that. There are 2,000 voters with about 10 connections each, pre-processed on a new game 32 times, so thats 640,000 equations right there, plus all of the actual simulation stuff layered before that. In other words, that equation processer probably runs a million times on a new game, and the equation might have 5 values in it, so max case is 5 million values get interpreted when you click on ‘new game’.

Can I speed it up?

First step is to see how stable these values are any way, so I’ll do an identical profile run and check that the +/- errors on different profiling runs are small…

I think thats pretty close. I definitely have numbers here that are in the same ballpark. So now lets try some optimisations to speed this puppy up. Looking at the top function with a double-click gives me a whole bunch more data:

The function is much longer than this, but thats mostly catering to relatively rare edge cases. Looking at the bits that actually have numbers on it show pretty clearly that its pretty much all about the pesky strcmp() call. A separate piece of code has already parsed the full equation of 0-(0.4*x), so I have a bunch of char buffers for each variable, declared like this:

char Vals[MAX_VARIABLES][32];

The thing is, do I need the overhead of calling strcmp() when I am only really checking for whether the first letter is x? Sadly I cannot JUST check that, because that would prevent we having a named variable starting with an x. Lets imagine this equation:

0-(0.4*xylophone)

Obviously not very likely, but theoretically possible. If the length of the buffer was 1, and the first letter is x, then thats a hit, but the question is, will inlining 2 manual checks be faster than a strcmp function call? Lets replace that code with

if(Vals[valindex][0] == 'x' && Vals[valindex][1] == '\0')

And check out the results:

Hmmm. Actually WORSE as far as I can tell. So it looks like whether we strcmp or not, just checking the value of two bytes at that point is slow. probably because its not immediately available memory? Its notable that the code at line 232 is super fast by comparison, as its just checking a bool value we cached earlier. Maybe I should try that? When I parse the function, just keep a bool for each Value, saying if its ‘x’ or not?

Whoahh. This looks like a pretty major speedup. 326 cycles versus 1,143. What the hell? why didn’t I do this earlier? Lets look at the line by line…

This is awesome. I then tried to make this code inline, but it seemed to not make things any better. I haven’t fully explored uProf yet, but it does do cool flame graphs:

Profiling UIs are great fun :D

Code bloat has become astronomical

There is a service I use that occasionally means I have to upload some files somewhere (who it is does not matter, as frankly they are all the same). This is basically a simple case of pointing at a folder on my hard drive and copying the contents onto a remote server, where they probably do some database related stuff to assign that bunch of files a name, and verify who downloads it.

Its a big company, so they have big processes, and probably get hacked lot, so there is some security that is required, and also some verification that the files are not tampered with between me uploading and them receiving them. I get that.

…but basically we are talking about enumerating some files, reading them, uploading them, and then closing the connection with a log file saying if it worked, and if not what went wrong. This is not rocket science, and in fact I’ve written code like this from absolute scratch myself, using the wininet API and php on a server talking to a MySQL database. My stuff was probably not quite that robust compared to enterprise level stuff, but it did support hundreds of thousands of uploaded files (GSB challenge data), and verification and download and logging of them. It was one coder working maybe for 2 or 3 weeks?

The special upload tool I had to use today was a total of 230MB of client files, and involved 2,700 different files to manage this process.

You might think thats an embarrassing typo, so I’ll be clear. TWO THOUSAND SEVEN HUNDRED FILEs and 237MB of executables and supporting crap, to copy some files from a client to a server. This is beyond bloatware, this is beyond over-engineering, this is absolutely totally and utterly, provably, obviously, demonstrably ridiculous and insane.

The thing is… I suspect this uploader is no different to any other such software these days from any other large company. Oh and BTW it gives error messages and right now, it doesn’t work. sigh.

I’ve seen coders do this. I know how this happens. It happens because not only are the coders not doing low-level,. efficient code to achieve their goal, they have never even SEEN low level, efficient, well written code. How can we expect them to do anything better when they do not even understand that it is possible?

You can write a program that uploads files securely, rapidly, and safely to a server in less than a twentieth of that amount of code. It can be a SINGLE file, just a single little exe. It does not need hundred and hundreds of DLLS. Its not only possible, its easy, and its more reliable, and more efficient, and easier to debug, and…let me labor this point a bit… it will actually work.

Code bloat sounds like something that grumpy old programmers in their fifties (like me) make a big deal out of, because we are grumpy and old and also grumpy. I get that. But us being old and grumpy means complaining when code runs 50% slower than it should, or is 50% too big. This is way, way, way beyond that. We are at the point where I honestly do believe that 99.9% of the code in files on your PC is absolutely useless and is never even fucking executed. Its just there, in a suite of 65 DLLS, all because some coder wanted to do something trivial, like save out a bitmap and had *no idea how easy that is*, so they just imported an entire bucketful of bloatware crap to achieve it.

Like I say, I really should not be annoyed at young programmers doing this. Its what they learned. They have no idea what high performance or constraint-based development is. When you tell them the original game Elite had a sprawling galaxy, space combat in 3D, a career progression system, trading and thousands of planets to explore, and it was 64k, I guess they HEAR you, but they don’t REALLY understand the gap between that, and what we have now.

Why do I care?

I care for a ton of reasons, not least being the fact that if you need two thousand times as much code as usual to achieve a thing, it should work. But more importantly, I am aware of the fact that 99.9% of my processor time on this huge stonking PC is utterly useless. Its carrying out billions of operations per second just to sit still. My PC should be in super-ultra low power mode right now, with all the fans off, in utter silence because all thats happening is some spellchecking as I type in wordpress.

Ha. WordPress.

Computers are so fast these days that you should be able to consider them absolute magic. Everything that you could possibly imagine should happen between the 60ths of a second of the refresh rate. And yet, when I click the volume icon on my microsoft surface laptop (pretty new), there is a VISIBLE DELAY as the machine gradually builds up a new user interface element, and eventually works out what icons to draw and has them pop-in and they go live. It takes ACTUAL TIME. I suspect a half second, which in CPU time, is like a billion fucking years.

If I’m right and (conservatively), we have 99% wastage on our PCS, we are wasting 99% of the computer energy consumption too. This is beyond criminal. And to do what? I have no idea, but a quick look at task manager on my PC shows a metric fuckton of bloated crap doing god knows what. All I’m doing is typing this blog post. Windows has 102 background processes running. My nvidia graphics card currently has 6 of them, and some of those have sub tasks. To do what? I’m not running a game right now, I’m using about the same feature set from a video card driver as I would have done TWENTY years ago, but 6 processes are required.

Microsoft edge web view has 6 processes too, as does Microsoft edge too. I don’t even use Microsoft edge. I think I opened an SVG file in it yesterday, and here we are, another 12 useless pieces of code wasting memory, and probably polling the cpu as well.

This is utter, utter madness. Its why nothing seems to work, why everything is slow, why you need a new phone every year, and a new TV to load those bloated streaming apps, that also must be running code this bad.

I honestly think its only going to get worse, because the big dumb, useless tech companies like facebook, twitter, reddit, etc are the worst possible examples of this trend. Soon every one of the inexplicable thousands of ‘programmers’ employed at these places will just be using machine-learning to copy-paste bloated, buggy, sprawling crap from github into their code as they type. A simple attempt to add two numbers together will eventually involve 32 DLLS, 16 windows services and a billion lines of code.

Twitter has two thousand developers. Tweetdeck randomly just fails to load a user column. Its done it for four years now. I bet none of the coders have any idea why it happens, and the code behind it is just a pile of bloated, copy-pasted bullshit.

Reddit, when suggesting a topic title from a link, cannot cope with an ampersand or a semi colon or a pound symbol. Its 2022. They probably have 2,000 developers too. None of them can make a text parser work, clearly. Why are all these people getting paid?

There was a golden age of programming, back when you had actual limitations on memory and CPU. Now we just live in an ultra-wasteful pit of inefficiency. Its just sad.