Game Design, Programming and running a one-man games business…

Programming video: VTune for thread concurrency

Not of interest to many, but I made a little demonstration video (using Production Line) to show how I can check out the concurrency of threads in the game using Intels VTune software. Enjoy:

Slow file access

If you read my blog often you will know I can be very irritated by poor performance in my code, or for that matter, anybodies. Firefox is possibly the most memory-wasting application in the known universe. Quite why it needs >3% of my entire CPU right now for me to just type these characters is beyond me. Despite this, such performance is not ‘noticeable’ in the same way that a sluggish GUI can be. When you click a button in a game, the resulting action needs to happen IMMEDIATELY for you to feel like you are using the interface, not fighting it. Thus when launching a dialog box in a game, the aim is always to have its initial loading time to be as minimal as possible. often thats easy, sometimes…not.

When you click the ‘load’ button on the main menu of Production Line (my latest game), it loads a dialog with a list (scrollable) of windows for each save game on the disk. There are thumbnails for each one showing the screen grab from when they were saved, plus some data about each save game. Example:

 

This probably sounds like it should be pretty fast to create, but actually its annoyingly, painfully slow. Before you ask, yes I do the initialisation ‘lazily’ in that I am not loading in textures for the save games until I draw them, so the ones that are currently not visible due to the scroll position have not slowed me down. Actually the slowdown is much simpler than that.

There are currently 25 savegames in my list, in a folder with 50 files (a thumbnail for each one is in the folder too). The files range from 600k to 176MB for the actual save games (XML format) and the thumbnails are tiny 50k jpgs. Why so slow?

At the very least I need to query data about 25 files here. The dialog box puts them in order of creation, and to ensure its really accurate, I dont use windows file attributes but actually crack open the XML to take a look at the header data inside. At this point, I extract the date, and time, and do a version check to reject super-old unusable saves. I strongly suspect that the delay I sometimes experience (only when I’ve been doing other stuff, and the files are not in the cache of the hard drive, or in windows RAM already) is actually not even the reading of the files, or the enumerating of them (50 is not many) but the accessing of them.

When you access a file in windows quite a lot of behind the scenes crap happens. Drives may have to be spun up (or not, depending on tech), maybe even network shares may have to be connected to (not in this case), maybe wireless network drivers need kicking out of sleep. Windows needs to check that you have permission to access that file, to compare the desired access against permitted access. It needs to navigate a chain of block links if the file is fragmented on disk, and as it does all of this, the users anti virus program will kick into gear, scanning the file (maybe even the entire thing, like my big 176MB xml?) for malware.

All of this takes TIME.

The worst thing is, this stuff all happens for each individual file, which is why game engines tend to use pak files. (I have support for them in my engine, just not using it yet). The problem is, users save games are one area where you likely really cannot use them. These are files created by the user, and its often helpful (especially during beta) for them to be simple files the users can access, delete if necessary, copy if necessary, email to the dev if necessary. So pak-filing them is not an option. There are many hacks I can think of, including maintaining a summary of the games in a single file I can update lazily at another time, but nothing that doesn’t generate more complexity and potential for bugs.

One solution, if I was really bored and desperate for speed, would be to embed the jpg into the xml, so that the umber of files instantly halved. Certainly a future option. I could also swap to compressed save games that were likely 1/10th (or less) the size, which would make debugging them a tad harder, but would mean much less raw data for windows and file-scanners to deal with.

I’m definitely not happy with this tiny, tiny (under half second) delay when you click that button :D

An impossible bug. ARGGGH

Check out this code from production line for displaying pie charts of expenses.  I declare arrays of float totals for each of 3 pie charts.

 float totals[NUMPIES];
 totals[PIE1] = 0;
 totals[PIE24] = 0;
 totals[PIEALL] = 0;

 

Then I declare a 2-dimensional array of floats which I fill with some data. As I build up those amounts I also update the totals:

float amounts[NUM_FINANCE_CATEGORIES][NUMPIES];

for (int r = 0; r < NUM_FINANCE_CATEGORIES; r++)
 {
 if (r != FC_CAR_SALES)
 {
 amounts[r][PIE1] = SIM_GetFinanceRecords()->GetAmount(1, (FINANCE_CATEGORY)r);
 amounts[r][PIE24] = SIM_GetFinanceRecords()->GetAmount(24, (FINANCE_CATEGORY)r);
 amounts[r][PIEALL] = SIM_GetFinanceRecords()->GetAmount(-1, (FINANCE_CATEGORY)r);

totals[PIE1] += amounts[r][PIE1];
 totals[PIE24] += amounts[r][PIE24];
 totals[PIEALL] += amounts[r][PIEALL];
 }
 }

 

Then some simple resetting of data, which is irrelevant for this bug, and a check that I am not about to divide by zero:

Pies[PIE1]->Clear();
 Pies[PIE24]->Clear();
 Pies[PIEALL]->Clear();

if (totals[PIE1] <= 0 || totals[PIE24] <= 0 || totals[PIEALL] <= 0)
 {
 return;
 }

 

Then the final code:

 //now create
 for (int r = 0; r < NUM_FINANCE_CATEGORIES; r++)
 {
 if (r != FC_CAR_SALES)
 {
 for (int p = 0; p < NUMPIES; p++)
 {
 float perc = amounts[r][p] / totals[p];
 assert (perc >= 0 && perc <= 1.0f)

Hold ON STOP!. How can that assert ever trigger? EVER? (it does for some people). Its driving me mad :D. It can ONLY trigger if one of the amounts in the array is less than 0% or more than 100% of the total. I KNOW that the total is greater than zero, so the amount must be greater than zero. The total is only comprised of the sum of the amounts. There is no way these numbers can be out of synch, PLUS, they are all floating point vars so there is no rounding going on… or is it maybe a tiny tiny quantizing thing? (I’ve update the game with extra debug data for this error so I’ll find out soon). Surely thats the only explanation, and perc is something like 1.000000001%?

 

 

Coding your own ‘middleware’ is easier than you think

I have a system in the early access version of production Line that when an error happens that I assert() on, it logs that error out to a debug file, along with a timestamp, and the file name and line number of code where the assert failed. This is easy to do (google it). I recently changed this code so it also posts the error message, line number and filename data to my server (anonymously, I have no idea which player triggered it). That code basically just uses the WININET API to silently post some URL variables to a specific php page. That page then validates the data to make it safe, and runs an SQL query to stick the data into a database on my server.

IU can run SQL queries on the server to see what bugs have happened in the last 24 hours, which ones happened the most, what the version number of the game was, and how often they are triggering etc.

This has proven to be awesome.

The default solution these days to this kind of stuff is to buy into some middleware to do this, but thats not how I roll. I also have some balance-tracking stats analysis stuff that does the same thing. I can tell if the game is too hard, or too easy, or if players dont get to build electric cars until hour 300 in the game…and all kinds of cool stuff. This is the sort of thing that middleware companies with 20 staff, a sales team and a slick marketing campaign try to flog to you at GDC in the expo. Its really not that hard, it really is not rocket science. I get the impression that the majority of coders think that writing this sort of stuff is an order of magnitude harder than it actually is.

Oh also, production Line checks once per day on start-up to see if there is a newer version, then pops up a notice and a changelist if there is (for non steam versions). Thats home-grown code too, and its easy, EASY to do,m once you actually sit down and work out whats involved.

I know the standard arguments in favor middleware ‘it comes with tech support, its faster than coding it yourself, it allows you to leverage the experience and knowledge of others’, but I reject all of them. The only reason I know how to use WinInet, php and SQL and all that sort of stuff is because I taught myself. I taught myself ONCE in the last 36 years of programming, and I know how to do it now., I have my old code, and my own experience and skills. I dont care if XYZ Middleware INC is going to go bust, stop supporting my platform, double their prices, or stop answering emails, I have all this experience in house.

When you look at most middleware, its vastly bigger, more feature packed and complicated than you need it to be. Middleware has to be, because it has to be all things to all people. My stats tracking and error-reporting doesn’t have to work on IOS or on OSX or on mobile, or XBox or Playstation. it doesn’t have to be flexible enough to talk to four different database types, or support Amazons AWS. I don’t NEED any of that stuff, so guess what… the code to do what I *actually* need to do, is incredibly, incredibly simple. Don’t think about middleware contracts and APIs and excuses not to write the code, ask yourself ‘what exactly does the code need to do here anyway?’

I think when coders do that you will find, its actually way easier than you think. I did my error reporting code, both the client side and the server side, and tested and debugged in in about an hour. This stuff is not complex. Stop buying ‘fully-featured’ bloatware you don’t need.

You should aim to be the Elon Musk of software

I’m an Elon Musk fanboy, drive a tesla and own tesla stock, I’m a true believer. One of the things I like about the man is the way he does everything in reverse, when it comes to efficiency and optimization. The attitude of most people is

‘This thing runs at 10m/s. How can we make it run at 12 m/s?

Whereas Elon takes the opposite view:

‘Are there any laws of physics that prevent this running at 100,000m/s? if not, how do we get it to do this?’

This is why he makes crazy big predictions and sets insane targets, most of which don’t get met on time, but when they do, its pretty phenomenal. If the next falcon heavy launch recovers the center core too, thats even more game changing, and right now, the estimate is that the falcon heavy launch cost is $90 million verses $400 million of its nearest competitor (which only launches half the weight anyway). That not just beating the competition, but thats bludgeoning them, tying them up, putting them in a boat, pushing the boat out into the middle of the lake, and laughing from the shore during a barbecue as the boat sinks.

When it comes to my favorite topic (car factory efficiency, due to me making the game Production Line), he comes out with even crazier targets.

“I think we are … in terms of the extra velocity of vehicles on the line, it’s probably about, including both X and S, it’s maybe five centimeters per second. This is very slow,” he said. Musk then added he was “confident” Tesla can get a twentyfold increase of that speed.”

Now we can debate all day long whether the guy is nuts, and over promising and whether or not we could ever, ever get a production line that fast, but you have to admire the ambition. You dont get to create privately-made reusable rockets without ambition. I just wish we had the same sort of drive in software as he has for hardware. The efficiency of modern software is so bad its frankly beyond embarrassing, its shameful, totally and utterly shameful. let me dredge up a few examples for you.

I’m running windows 10, and just launched the calculator app. Its a calculator, this is not rocket science. A glance at task manager shows me that its using up 17.8MB of RAM. I am not kidding, try it for yourself. I’m pretty sure that there was a calculator app for the sinclair ZX-81 with its 1k (yes 1k) of RAM. Sure, the windows 10 app has…err nicer fonts? and the window is very slightly translucent… but 17MB? We need 17MB to do basic maths now? As I type this, firefox has got 1,924MB of RAM assigned to it, and is regularly hitting 2% of my CPU. I’m just typing a blog post, just typing… and thats 2% of a CPU that can do 49,670 MIPS or roughly 50 BILLION instructions per second. Oh…we have slightly nicer fonts too. wahey?

I’d wager the percentage of people coding games who have any real idea how the underlying engine works is tiny, maybe 5%, and of those maybe 1% understand what happens at a lower level. Unity doesn’t talk to your graphics card, it does it through OpenGL or DirectX, and how many of us really understand the entire code path of those DLLS? (I don’t) and of those, how many understand how the video card driver translates those directx calls into actual processor instructions for the card hardware? By the time you filter your code through unity, directx and drivers, the efficiency of what actually happens to the hardware is laughable, LAUGHABLE.

We should aspire to do better, MUCH better. Perhaps the biggest obstacle is that most of us do not even know what our code is DOING. Unless you have a really good profiler, you can easily lose track of what goes on when your game runs, and we likely have zero idea what happens after our instructions leave our code and disappear into the bowels of the O/S or the graphics API. Decent profilers can open your eyes to this stuff, one that can handle displays of each thread and show situations where threads are stuck waiting is even better. Both AMD and nvidia provide us with tools that let us step through the rendering of individual frames to see how each pixel is rendered, then re-rendered and re-rendered many times per frame.

If you want to consider yourself not just a hacker but an ENGINEER, then you owe it to yourself, as a programmer to get familiar with profilers and code analysis tools. Some are free, most are not, but they are a worthy investment. Personally I use AQTime, and occasionally intel XE Amplifier, plus the built-in visual C++ tools (which are a bit ‘meh’ apart from the concurrency visualizer). I also use nvidias nsight tools to keep an eye on graphics performance. None of these tools are perfect, and I am by no means an especially good programmer, but I am at the very least, fully aware that the code I write, despite my efforts, is nowhere REMOTELY close to as efficient as it could be, and that there is plenty of room to do better.  If Production Line currently runs at 60FPS for you (average speed across all players is 58.14) then eventually I should be able to get it so you can play with a factory at least 10x that size for the same frame rate. I just need to keep at it.

I’ll never be the Elon Musk of software, but I’m trying.