
Something I learned today about STL and Z-sorting…

So here is a thing you might be interested in if you use STL. If you don’t…well sorry :D

If you use the sort() function that’s built into an STL list, it guarantees to preserve the order of identical objects in the list. If you use the vector version, all bets are off.

Bloody hell.

So if you have a bunch of asteroids with these Z values

5,9,3,0,0,0,-2,-5

And you use a list to sort them, all is good in the world. If you use a vector, those 3 asteroids at 0 are going to Z-fight like crazy things.

The solution?

use stable_sort()

Well, call me Mr. Picky, but I think I’d be happier if stable_sort() was the default, and we actually renamed sort() to be take_your_chances_and_do_random_crap_sort().

I presume stable_sort is slower… Luckily I’m not sorting asteroids every frame (that would be NUTS), and I only sort things when I have to, so it isn’t mega critical. It led to a bug where the biggest hulk chunks from spaceships did Z-fighting if they weighed enough to all have a Z-speed of zero, and thus a (relative) Z position of 0, so when other objects spinning away caused a Z-sort, their order got scrambled. If you are a non-coder and don’t know what Z-fighting is, it’s a flickering effect you get in 3D games where two images seem to be undecided about which one is in front. You often see it on ‘decals’ such as blood splats on the floor or posters on a wall. It’s annoying…
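For anyone who wants to see it spelled out, this is roughly what the fix looks like. It's a minimal sketch with a made-up Sprite struct, not my actual asteroid code:

#include <algorithm>
#include <vector>

struct Sprite
{
    float Z;   // depth value used for sorting
    int   ID;  // placeholder identifier, just to show order preservation
};

void SortByDepth(std::vector<Sprite>& sprites)
{
    // std::sort makes no guarantee about the relative order of elements that
    // compare equal, so three asteroids all sitting at Z=0 can come out in a
    // different order every time a sort happens.
    // std::stable_sort keeps equal elements in their original order, which is
    // exactly what stops them shuffling and Z-fighting.
    std::stable_sort(sprites.begin(), sprites.end(),
        [](const Sprite& a, const Sprite& b) { return a.Z < b.Z; });
}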

Anatomy of rendering a game dialog box in a custom engine (GSB2)

So I was tweeting that this took forever:

[Screenshot: the dialog box in question]

It’s just a dialog box for Gratuitous Space Battles 2, so why did it take more than twenty minutes to put together? Now… I’ve seen Unity, I know it has all these plug-ins that do stuff like this, and that it’s all very user-friendly etc blah blah. But I’m old school. I’m rocking my own custom-written engine, including all the GUI. That gives me huge advantages (mostly speed) and also some disadvantages. The best advantage is there isn’t anything I can’t make the code do.

The pain with this dialog box came in three flavours.

Flavour one was those circular clock-like indicators. In theory, this is really easy: you can just generate a tri-strip of a lot of polygons and draw a curve that’s as smooth and crisp as you like, as long as you can spare the vertices. I’m not drawing many, so it’s not an issue. The problem is, when you do that, you get a too-blocky, too un-antialiased clunky mess that just doesn’t look ‘right’ when surrounded by lovely antialiased everything. I’m not drawing 3D models, so my game has a nice smooth look to it, and it jarred badly. So I have a sprite of that curve, and I draw a subset of it using a tri-strip arc. It’s a bit fiddly, and took a while to get right.
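In case that’s hard to picture, here is roughly the idea as a simplified sketch, not the actual GSB2 code. The vertex layout, and the assumption that the curve sprite is a ring centred in its texture, are mine:

#include <cmath>
#include <vector>

struct GUIVERTEX
{
    float X, Y;   // screen position
    float U, V;   // texture coordinates into the curve sprite
};

// Build a textured arc as a triangle strip. Rather than flat-shading the
// polygons, the UVs sample the same angular wedge of a pre-antialiased ring
// sprite, so the sprite's baked-in smoothness is what ends up on screen.
std::vector<GUIVERTEX> BuildArcStrip(float cx, float cy,
                                     float inner_radius, float outer_radius,
                                     float start_angle, float end_angle,
                                     int segments)
{
    std::vector<GUIVERTEX> strip;
    strip.reserve((segments + 1) * 2);

    for (int i = 0; i <= segments; i++)
    {
        float t = (float)i / (float)segments;
        float angle = start_angle + t * (end_angle - start_angle);
        float ca = std::cos(angle);
        float sa = std::sin(angle);

        // One outer and one inner vertex per step; together they form the strip.
        // Assumes the ring sprite is centred at UV (0.5, 0.5) with its outer
        // edge touching the texture border.
        GUIVERTEX outer = { cx + ca * outer_radius, cy + sa * outer_radius,
                            0.5f + ca * 0.5f,       0.5f + sa * 0.5f };
        GUIVERTEX inner = { cx + ca * inner_radius, cy + sa * inner_radius,
                            0.5f + ca * 0.5f * (inner_radius / outer_radius),
                            0.5f + sa * 0.5f * (inner_radius / outer_radius) };
        strip.push_back(outer);
        strip.push_back(inner);
    }
    return strip;
}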

Flavour two was the outline of the right-hand part of the window. It’s a bit complex, as it goes in and out and then around the close button and then loops around those circles, and it has to be really slick too. Ironically, in this case it looks better drawn as a crisp 1-pixel line, so there is actual hand-crafted code in there to work out all those positions and curve nice arcs and lines around them.

Flavour three was speed. I like everything in my game to render fast, including the GUI. No point in having a fast engine where 95% of the frame is spent drawing a dialog box. That means ensuring that the outline on the dialog is a single draw call with no fuss, that all those tiny animated bits of fluff in the dialog corners and outside the edges are drawn efficiently, that the calculations on that arc outline are as fast as possible, and that the dialog in general doesn’t use many draw calls.
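To give a concrete idea of what “not many draw calls” means in practice, here is a rough sketch of the kind of quad batching involved. This is not the actual GSB2 GUI code; the vertex format and class names are just for illustration:

#include <d3d9.h>
#include <vector>

struct TLVERTEX
{
    float X, Y, Z, RHW;  // pre-transformed screen-space position
    DWORD Color;
    float U, V;
};
#define TLVERTEX_FVF (D3DFVF_XYZRHW | D3DFVF_DIFFUSE | D3DFVF_TEX1)

// Accumulate every little quad that shares a texture into one array,
// then submit the whole lot with a single draw call.
class QuadBatch
{
public:
    void AddQuad(const TLVERTEX corners[4])
    {
        // Two triangles per quad, kept in a flat triangle list.
        const int idx[6] = { 0, 1, 2, 0, 2, 3 };
        for (int i = 0; i < 6; i++)
            Vertices.push_back(corners[idx[i]]);
    }

    void Flush(IDirect3DDevice9* device)
    {
        if (Vertices.empty())
            return;
        device->SetFVF(TLVERTEX_FVF);
        // One draw call for everything accumulated so far.
        device->DrawPrimitiveUP(D3DPT_TRIANGLELIST,
                                (UINT)Vertices.size() / 3,
                                &Vertices[0],
                                sizeof(TLVERTEX));
        Vertices.clear();
    }

private:
    std::vector<TLVERTEX> Vertices;
};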

It’s all horribly, laughably slow really. I probably have a ‘spare render target’ knocking about that I could use to blap this whole dialog to (BTW they resize depending on the ship, which adds to the complexity), and then only update it when it changed, otherwise just blapping it as a single quad. In practice the window’s various elements update quite a bit… but I’m sure I could speed up the module icon rendering with runtime aliasing onto spare render targets. I love all this stuff.
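For the curious, the “blap it to a spare render target” idea looks roughly like this in DirectX9 terms. It's just a sketch; RedrawDialogContents and DrawCachedQuad are hypothetical stand-ins for the real drawing code:

#include <d3d9.h>

// Draw the whole dialog into a texture only when something changes;
// every other frame the dialog costs one textured quad.
class CachedDialog
{
public:
    bool Dirty;                          // set whenever the dialog contents change
    IDirect3DTexture9* CacheTexture;     // created elsewhere with D3DUSAGE_RENDERTARGET

    CachedDialog() : Dirty(true), CacheTexture(NULL) {}

    void Draw(IDirect3DDevice9* device)
    {
        if (Dirty && CacheTexture)
        {
            IDirect3DSurface9* old_target = NULL;
            IDirect3DSurface9* cache_surface = NULL;
            device->GetRenderTarget(0, &old_target);
            CacheTexture->GetSurfaceLevel(0, &cache_surface);

            // Redirect rendering into the cache texture, draw the dialog as
            // normal, then restore the backbuffer.
            device->SetRenderTarget(0, cache_surface);
            RedrawDialogContents(device);
            device->SetRenderTarget(0, old_target);

            cache_surface->Release();
            old_target->Release();
            Dirty = false;
        }

        DrawCachedQuad(device, CacheTexture);
    }

private:
    void RedrawDialogContents(IDirect3DDevice9* device)
    {
        // ...all the existing per-widget drawing calls would go here...
    }
    void DrawCachedQuad(IDirect3DDevice9* device, IDirect3DTexture9* texture)
    {
        // ...draw one screen-space quad textured with the cached dialog...
    }
};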

But even I know when I’m getting obsessed and need to move on!

Creeping inefficiency

Here is why I reckon that triple-A game runs slow on your PC. The real reason :D

Step 1: Geniuses at Intel / AMD / ARM design an unbelievable processor capable of a bazillion operations per second. Efficiency 100%

Step 2: Someone writes a compiler that converts C++ into assembly language / processor-specific stuff that makes a lot of assumptions and loses a big chunk of efficiency. Efficiency 80%

Step 3: A coder like me waltzes in and writes some code that is as optimized as he can possibly manage, but has deadlines etc and knowledge gaps meaning it’s slightly less efficient than optimal. Efficiency 75%

Step 4: He then writes it to run on a single core, because the headache of smoothly spreading tasks over all the cores is unbelievable, plus game code doesn’t multithread easily so… Efficiency 25%

Step 5: Because writing a new engine for each game is un-trendy these days, the coder decides to use an off the shelf engine that makes even more assumptions and compromises… Efficiency 15%

Step 6: Coder #2, not knowing the assumptions Coder #1 made when he wrote those handy functions, calls them every frame instead of once… Efficiency 5%

Step 7: The game gets run on a typical desktop PC, with 30 different apps fighting for CPU and RAM, IM clients, P2P stuff, web browsers, email, all that crapware that shipped with the PC, anti-virus scanners, cool desktop widgets that tell you the weather, music streaming as you play… Final Efficiency 3%.


My numbers are wild guesses, but I reckon there is some truth to it all. For inexperienced coders, using off-the-shelf engines probably boosts efficiency. Maybe with some engines, under some circumstances, on some hardware, multithreading is more feasible. I can’t help fantasizing about a PC that absolutely locked everything down in a big way when you launched a fullscreen game. Turned off everything that could possibly use some CPU or RAM and let the game run like an Xbox. Maybe that is what the Steambox will become?

That’s more likely than many programmers learning how to optimize, that’s for sure :(

 

GSB2 Multithreading a single frame (so far)

Here is a big battle in GSB2 running at 1920x1200 res, on a GTX 670, quad-core Windows 7 PC. This was taken using the Visual C++ concurrency visualizer. 3732 is the main game thread. Green is busy, red is idle, light blue is sleeping (end of frame, waiting for flip). Click to enlarge.

[Screenshot: concurrency visualizer capture of a single GSB2 frame]

1284 seems to be the thread where DirectX or the NVIDIA driver does its stuff (not sure which).

7596, 2692 and 2788 are my additional threads of GSB2 doing processing. Each of those colored bubbles represents one or more tasks that a thread has grabbed and is working through. The big red stretches are obviously gaps I could potentially fill as I find ways to break apart dependencies between tasks and push more of the main thread’s work onto the other cores. It’s obviously already been worthwhile, as I reckon I’m currently (just about) doubling the framerate thanks to multithreading. Almost all the grey blobs are transformations of particles within particle emitters, packed into arrays. These are too numerous and cause too much thread-scheduling overhead right now, so I might make those arrays bigger, or even dynamically sized.
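For non-coders wondering what “grabbing tasks” means: the underlying pattern is basically a shared queue of jobs that worker threads pull from. Here is a minimal illustrative sketch using standard C++ threading primitives, not the actual GSB2 scheduler:

#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

// The main thread pushes tasks (e.g. "transform this packed array of
// particles"), worker threads pop them off and run them.
class TaskQueue
{
public:
    void Push(std::function<void()> task)
    {
        {
            std::lock_guard<std::mutex> lock(Mutex);
            Tasks.push(std::move(task));
        }
        Ready.notify_one();
    }

    // Workers loop on this; returns false once the queue has been shut down
    // and drained.
    bool Pop(std::function<void()>& task)
    {
        std::unique_lock<std::mutex> lock(Mutex);
        Ready.wait(lock, [this] { return Quit || !Tasks.empty(); });
        if (Tasks.empty())
            return false;
        task = std::move(Tasks.front());
        Tasks.pop();
        return true;
    }

    void Shutdown()
    {
        {
            std::lock_guard<std::mutex> lock(Mutex);
            Quit = true;
        }
        Ready.notify_all();
    }

private:
    std::queue<std::function<void()>> Tasks;
    std::mutex Mutex;
    std::condition_variable Ready;
    bool Quit = false;
};

void WorkerThread(TaskQueue& queue)
{
    std::function<void()> task;
    while (queue.Pop(task))
        task();   // each call here is one of those colored bubbles
}

In the real thing, the main thread pushes a batch of tasks each frame and the red gaps in the capture are where the workers have run out of tasks to grab and are waiting on the main thread again.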

 

My big slowdown function

Well, I feared as much when I wrote this: https://positech.co.uk/cliffsblog/2014/04/14/reading-back-from-gpu-memory-in-directx9 but it turns out that, yup, that function is the slowest in the entire engine, at least until battle is joined. I really need to fix it.

Essentially what I’m doing there is maintaining a depth buffer for objects in the game. I then do some fancy processing on that buffer (all taking place in video card memory). I then really badly need to know the values of the depth buffer at about 100 different points, and based on the outcome of that, I either don’t draw, draw some stuff quite small, or draw it really big. In short, I’m scaling an object based on specific values of the depth buffer.

Right now, my engine does not use vertex shaders at all. I just use pixel shaders, and leave the vertex shader as NULL. I’m pretty sure the solution to my problem is easy if, when I go to draw all of these objects, I write a vertex shader that can scale the object accordingly. The thing is, I’m using DirectX9 and therefore I really do have separate vertex and pixel shaders. This is going to involve me reading up on the most undocumented stuff ever, which is how vertex shaders and pixel shaders can be used in a 2D game under DirectX9.
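For what it’s worth, here is a rough sketch of the direction I mean, on the C++ side only. The names are hypothetical, the shader itself isn’t shown, and the whole idea leans on DirectX9 shader model 3.0 vertex texture fetch (which only supports certain texture formats and hardware), so it may or may not pan out:

#include <d3d9.h>

// Sketch: let the vertex shader do the scaling so depth values never have to
// be read back to the CPU. The (hypothetical) vertex shader would sample the
// depth texture with tex2Dlod and scale the quad's vertices from that value.
void DrawDepthScaledObjects(IDirect3DDevice9* device,
                            IDirect3DVertexShader9* scale_shader,   // compiled elsewhere
                            IDirect3DTexture9* depth_texture)
{
    // Vertex texture fetch uses the dedicated vertex texture samplers,
    // and requires vs_3_0.
    device->SetTexture(D3DVERTEXTEXTURESAMPLER0, depth_texture);
    device->SetVertexShader(scale_shader);

    // ...draw the ~100 objects as normal; each one scales itself on the GPU
    // instead of the CPU doing a slow readback and deciding the scale per
    // object...

    device->SetVertexShader(NULL);
    device->SetTexture(D3DVERTEXTEXTURESAMPLER0, NULL);
}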

Bah.