Game Design, Programming and running a one-man games business…

THE GSB2 engine optimizing post

So Gratuitous Space Battles 2 is running really well (50-60 FPS even at dual-monitor 5120 res mode) on my dev PC. Dev PC specs:

Win 7 64-bit, 8 GB RAM, i7-3770K CPU @ 3.50 GHz, GeForce GTX 670.

However it can dip pretty low on my HD4000 Intel laptop (also an i7, but lower spec). I’ve seen things go to 25 FPS at 1600×900, although that is with all the fancy options on, so maybe it will go higher with them deselected. Ideally I’d get that much better. So what is the problem?

I think it’s too many small draw calls, and sadly, that’s kinda the way my engine works.

The basic algorithm of my engine is this:

Update_Everything() (game simulation, partly multithreaded)
Check_what_is_onscreen()
SetRenderTarget()
DrawBigListOfObjectsToRenderTarget()
SetRenderTarget()
DrawBigListOfObjectsToRenderTarget()
...
CompositeAllTheTargetsIntoFinalImage()
GUI()

The problem is all of those lists of objects being drawn. The solution to this in a conventional 3D game (before you all suggest it) is to use a z-buffer, sort all those objects by render state or texture or both, and blap them in a few draw calls. That’s fab, but it doesn’t work with alpha blending. People who do 3D games think alpha blending means particles, but nope, it also means nice fuzzy edges of complex sprite objects. To do the order-independent Z-buffer rendering method, you have to disable proper alpha-blending, and then everything starts to look sharp, boxy and ugly as hell. 3D games sprinkle antialiasing everywhere to try and cover it. With complex sprites layered on top of each other, this just looks dreadful.

The solution is the good old fashioned painter’s algorithm, meaning drawing in Z order from back to front. This works well and everything looks lovely.
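As a rough illustration of that back-to-front ordering, here is a minimal C++ sketch. The Sprite struct, its fields and the SortBackToFront name are made up for this example (they are not the actual GSB2 types), and it assumes a larger z means further from the camera.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sprite record; field names are illustrative only.
struct Sprite {
    std::string texture;  // name of the texture the sprite uses
    float z;              // assumed convention: larger z = further away
};

// Painter's algorithm ordering: furthest sprites first, so nearer sprites
// are alpha-blended on top of whatever is already behind them.
// stable_sort keeps the submission order of sprites sharing the same depth.
void SortBackToFront(std::vector<Sprite>& sprites) {
    std::stable_sort(sprites.begin(), sprites.end(),
                     [](const Sprite& a, const Sprite& b) { return a.z > b.z; });
}
```

The sorted list is then drawn front of vector first, one sprite after another, which is exactly where the draw call count starts to hurt.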

screen1

The problem is that you end up with 4,000 draw calls in a frame, and then the HD4000 explodes. Why 4,000? Well, to get some of my more l33t effects I need to draw a lot of objects four times, so that’s only really 1,000 objects. To do proper lighting on rotated objects I can’t group objects of a different rotation, so each angle of an identical object means a separate draw call. Some of my render targets let me draw regardless of that angle, but the problem then becomes textures. If you draw with the painter’s algorithm and draw this…

ShipA
ShipB
ShipA
ShipB

Then there is no way to group the ships by texture without screwing it up, if those ships overlap. This is “a pain”. There are some simple things I can do…and I have a system that does them. For example, if I have this big list of sprites to draw and it turns out that occasionally I *do* get ShipA, ShipA, then I identify that, and optimize away the second call by making a single VertexBuffer call for both sprites (or both particle systems, in those cases). I even have a GUI that shows me when this happens….
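The 'spot two neighbours with the same texture' trick could look something like this. It is a simplified sketch rather than the real engine code: Batch and CollapseAdjacent are names invented here, and a texture is reduced to just its name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical batch: one draw call covering spriteCount sprites.
struct Batch {
    std::string texture;
    std::size_t spriteCount;
};

// Walk the already depth-sorted draw order and merge neighbours that share
// a texture. The order is never changed, so overlapping sprites stay safe.
std::vector<Batch> CollapseAdjacent(const std::vector<std::string>& drawOrder) {
    std::vector<Batch> batches;
    for (const std::string& texture : drawOrder) {
        if (!batches.empty() && batches.back().texture == texture) {
            ++batches.back().spriteCount;  // same texture as previous: reuse call
        } else {
            batches.push_back(Batch{texture, 1});  // texture changed: new call
        }
    }
    return batches;
}
```

Feeding it ShipA, ShipA, ShipB saves one call, but the alternating ShipA, ShipB, ShipA, ShipB case from above still comes out as four batches.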

engine1

The trouble is, the majority of the time this is NOT happening. There are to my mind two potential solutions, both of them horribly messy:

1) Go through the list and calculate where I have a ‘ShipA….ShipA’ pair where there is nothing in between them that overlaps either of them, and then re-arrange them so that they are next to each other, thus allowing for a lot more grouping. (This involves some hellish sorting and overlap detection.)

2) Pre-process everything, building up a database at the start of the rendering of which textures seem to naturally follow on from each other, then render those textures to a temporary ‘scratch’ render target atlas, which I can then index into. This would be fun to code, also amusing to watch the render target itself in the debugger :D Adds a lot of ‘changing texture pointers and UVs after the event’ complexity though.
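To make option 1 a bit more concrete, here is one way the re-arranging might be sketched. It is a simplified variant (it only checks the sprite being moved against the ones it jumps over) and not what the engine does today. It assumes each sprite carries a conservative axis-aligned screen rectangle (so a rotated ship needs a box big enough to cover it), and the names Sprite, Overlaps and RegroupByTexture are invented for the example. It is also a naive O(n²) pass, so treat it as a starting point only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sprite with conservative axis-aligned screen bounds.
struct Sprite {
    std::string texture;
    float left, top, right, bottom;
};

bool Overlaps(const Sprite& a, const Sprite& b) {
    return a.left < b.right && b.left < a.right &&
           a.top < b.bottom && b.top < a.bottom;
}

// Takes sprites in back-to-front order and pulls each one back to sit next
// to the most recent sprite with the same texture, but only if it overlaps
// nothing it has to jump over. Overlapping pairs keep their relative order.
std::vector<Sprite> RegroupByTexture(const std::vector<Sprite>& backToFront) {
    std::vector<Sprite> out;
    out.reserve(backToFront.size());
    for (const Sprite& s : backToFront) {
        std::size_t insertAt = out.size();  // default: stay where it was
        for (std::size_t i = out.size(); i-- > 0;) {
            if (out[i].texture == s.texture) {
                insertAt = i + 1;  // join this sprite's batch
                break;
            }
            if (Overlaps(out[i], s)) {
                break;  // s must be drawn after out[i], so stop searching
            }
        }
        out.insert(out.begin() + static_cast<std::ptrdiff_t>(insertAt), s);
    }
    return out;
}
```

With four non-overlapping ships in the order ShipA, ShipB, ShipA, ShipB this comes out as ShipA, ShipA, ShipB, ShipB, which then collapses to two calls. If the ships are stacked on top of each other, the order is left alone.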

Be aware that I’m using DirectX 9, mostly for compatibility reasons, which means that rendering to multiple render targets at once, or doing multithreaded rendering really isn’t an option.

Edit: just spotted a bug with method 2. If I draw 10 instances of ShipA, they may be at different Zooms, so I will only be caching (in my temp atlas) a single image, not the full mip-map chain, meaning the rendering of atlased sprites would lose effective neat mip-mapping and potentially look bad :(

Using Intel’s GPA Frame Analyzer

I love tools like this. Click below. This is the Intel GPA Frame Analyzer rendering a frame on my Intel laptop, but me analyzing it remotely over the LAN from my main desktop :D

frame1

Some random NVidia Nsight stats for GSB2

These DrawPrimitiveUP calls are a pain, must find a way to reduce those…

nsight1

May have to do some reading to work out the significance of some of this, but the draw call count is definitely too high…

nsight2

Why (as a consumer) you should love and support advertising

Not a trendy POV, especially for the younger net-savvy crowd. After all, what kind of dumbass doesn’t have ad-block installed, right? That will ‘stick it to the man’, and make your life easier, right? Well frankly…some sites may as well have a huge banner that says ‘INSTALL ADBLOCK’, because they have flashing strobing monstrosities designed by people who have no idea how proper 21st century ads actually work, but hey…don’t tar all ads with the same brush.

I’d suggest that as a consumer, you should LOVE advertising. Here is why. Like democracy, advertising is a shit system, but it’s better than all the alternatives. There is basically a dilemma for anyone who makes a product, and that is ‘how do I get people to hear about my product’. The most honest answer to the question is that you set aside part of your budget to rent space where people will look, and use that space to inform people that your product exists. This is, of course, simple advertising. It works. It’s also fair. People would say it is biased towards those with money, but that money is simply an expression of faith in the product. Where some people have money, others have time. Time basically is money.

If you block all ads, or worse still, tar product-makers who advertise as being ‘evil’ or ‘corporate’ or, to quote some reddit replies to my ads ‘shilling scum’ (yup, under a ‘promoted link’, clearly paid for openly and proudly by me…’go figure’), then you are simply forcing business to find an alternative solution to promoting the product. The thing is, people WILL find another way to get their product promoted over the opposition. It is the life-blood of business. If you prevent them using ads, then the temptation to go with less obvious, less honest, frankly underhand and shady methods is going to win out.

From a business POV, spending $30k on ads vs handing $30k in a PayPal payment to a famous YouTuber are essentially the same thing. The end result is the same, and the cost is the same. The first choice is very up-front and honest and people know what is going on. The second choice (assuming it’s undeclared to the viewer) is basically subverting what claims to be impartial information and manipulating it to push an agenda. Do people want that?

There is a vast range of businesses (I get emails from them all the time) offering to sell people Twitter followers, or post on forums on your behalf, or up-vote social media posts, and all the rest of it, no doubt linked to click-farms in China or India. This is the dark side of ‘social-network-marketing’. If you want to just ‘buy’ popularity on a site where commercial concerns are banned, then it’s easy, just fill out this form and send the money. Unethical as fuck, clearly, but do you really think that nobody does it? If they didn’t, the spam emails wouldn’t be economic, for starters.

There is a myth, in the ‘anti-corporate, anti-ads’ world, that you can block out all ‘corporate’ influence, but you cannot. Not outside of North Korea, anyway. Even if your site has no ads, and absolute rock-solid captcha stuff to ensure there are no bots, and that nobody from (perish the thought) a games company is posting on your site, then it would still be trivial, trivial, trivial, to completely rig the odds.

Anyone with their own forums knows that preventing spam is almost impossible because a lot of it is ‘human spam’, in other words, accounts created by actual people (paid minimum wage in India/China) who can enter the captcha quite easily, make a few seemingly innocent posts, before (in my experience), spamming your site with links to cheap kitchen fitting. When you see this, it is basically human-marketing agents done really really badly.

Now imagine a situation with a smarter ‘black hat style’ marketing company. Say they have $100k to spend to promote game X. Why spend it on 10,000 Chinese kids who are obvious as hell, when you can just employ 10 full time western ‘social marketing agents’ for 3 months to actually go out there and hustle for the game. They can join dozens, if not hundreds of sites, read loads of threads, make loads of posts, look like any other member of the community, just hanging out, chilling, talking about games, and they all just so happen to have recently picked up a copy of X, and you know what? to be honest, game X is the best damned game they ever played, no seriously.

That is the world of game marketing without ads. It’s not always obvious. There is a spectrum. On the one hand, you have 10,000 Chinese kids spamming the world about Civony, or some other browser-based crap. On the other end of the spectrum, you have just two or three marketing experts who do their job so well you have absolutely no idea they have any connection to a games company whatsoever. What they have in common, is they are trying to subvert a non-commercial arena into being a commercial one.

Ads are different. There is a clear dividing line. When you see an ad for my games, it’s not disguised as anything else. It’s honest. It’s me saying ‘I believe enough in you liking the look of this game, I’m actually paying out money to tell you about it’. I reckon that’s good, that’s fair, that’s what I like, and that’s why I have adblock off for the majority of my surfing.

I could take the hint, realize gamers have decided that ads are evil, that actually ‘let’s players deserve to be paid’, and just say ‘fuck it’, and hand over loads of cash to a PR company to do whatever the fuck they like, and ask no questions, but I’d rather not. I don’t want to be a full time promoter and schmoozer. I’m a game designer and programmer. Don’t let the underhand schmoozers take over.

Too much stuff on screen

This is a screenshot of GSB2 (click to enlarge). Nothing particularly impressive, but when looking at it, and then stepping into code to see what’s going on, it’s clear that the engine is kinda pushing against limits for hardware on laptops etc. (This is with graphical detail at maximum, so that will be less of an issue eventually.)

screenshot_28-12-2014_13-09-31

I think that ultimately, I’m making too many draw calls, and lesser hardware can’t handle it. Because of the nature of the engine, the number of draw calls is vastly higher than the number of actual objects on screen, because there is depth, and lightmaps and other trickery that magnifies the effect of, for example, just rendering a single sprite of a fighter ship. That fighter ship probably involves more like a dozen draw calls :(. In the scene shown, we have 915 3D objects (most are not onscreen), 266 depth objects, 648 lightmap objects, 89 ‘splats’, 372 effects and 216 saturated effects. That’s clearly a lot :(

I’ve seen the game do 5,000 or more draw calls in one frame, and that’s kinda bad, so the way I see it my approach to optimization could take various paths:

1) degrade some less important stuff when we exceed a certain number of calls / drop below 60 FPS. Not ideal, but a brute force way to fix it.

2) Further optimize some stuff that is currently done in single draw calls, like parts of the GUI, to get the general number of calls down.

3) Slot in a layer between my current engine and DirectX, which caches states yada, and collapses draw calls into fewer calls where the texture/shader/render states are the same.

Theoretically 3) is vastly better than the rest, but I fear that I’m adding another ‘layer’ here which could in fact be slower, and also that I’m keeping poorly optimized code and fixing it after the event. After all, the best solutions to speedups are always algorithmic, not close-to-the-metal tweaking. However, another benefit of 3) would be that such an abstraction layer makes the job of porting the game slightly easier. I’m considering implementing it anyway, so I can at least see how often such a ‘collapse’ of draw calls can happen. In other words, would this reduce the count by 5% or 90%?

So that means replacing the DrawPrimitive() calls with a macro, maintaining a cache of the render states (or maybe just letting any RS change flush the buffer?) and (for now) just keeping track of the collapsible draw calls. I’m going to give that a go… Or maybe I should see what the hell all those objects are first…
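A first cut of the 'just count the collapsible calls' idea might look like the sketch below. Nothing here touches Direct3D: DrawState and DrawCallStats are hypothetical names, and boiling the texture, shader and render states down to three integers is an assumption made to keep the example small. The idea is that the macro wrapping each draw call would also call RecordDraw() with whatever state is current.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical snapshot of the state that decides whether two consecutive
// draw calls could be merged. Real code would fill this from the engine.
struct DrawState {
    std::uint32_t textureId;
    std::uint32_t shaderId;
    std::uint32_t renderStateHash;  // assumed: one hash of all render states

    bool operator==(const DrawState& o) const {
        return textureId == o.textureId && shaderId == o.shaderId &&
               renderStateHash == o.renderStateHash;
    }
};

// Counts how many draw calls in a frame share all state with the call
// immediately before them, i.e. how many could in principle be collapsed.
class DrawCallStats {
public:
    void BeginFrame() {
        total_ = 0;
        collapsible_ = 0;
        hasPrevious_ = false;
    }

    // Call once per real draw call, with the state active for that call.
    void RecordDraw(const DrawState& state) {
        if (hasPrevious_ && state == previous_) {
            ++collapsible_;
        }
        previous_ = state;
        hasPrevious_ = true;
        ++total_;
    }

    std::size_t Total() const { return total_; }
    std::size_t Collapsible() const { return collapsible_; }

    // Fraction of this frame's calls that could have been merged away.
    double CollapsibleFraction() const {
        return total_ == 0 ? 0.0
                           : static_cast<double>(collapsible_) /
                                 static_cast<double>(total_);
    }

private:
    DrawState previous_{};
    bool hasPrevious_ = false;
    std::size_t total_ = 0;
    std::size_t collapsible_ = 0;
};
```

Dumping CollapsibleFraction() to the debug GUI once per frame would answer the 5% or 90% question before any actual batching code gets written.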