Well I did it. I am 99.9% sure I finally crushed my random crash bug. Here are the gory details.

It crashed in debug (or release) and seemingly somewhere random. Possibly much later than the actual cause, as is the way with these things. By using Microsoft’s handy CheckHeap functions, I was able to progressively narrow it down to a certain bit of code. (Essentially you check the integrity of the whole heap at two points, then when one of them triggers, you know whether it was inside or outside the enclosed code.

So what I narrowed it down to was some code that command bunkers, hospitals and repair shops run, where they go through their list of ‘supported’ units (the ones they are affecting) and updates them. Essentially this means telling all the units in the list that they now are not supported, then rebuilding a new up-to-date list. For speed reasons this is only done every X frames.

Now the obvious case for a crash is that some unit that was in the list when we added it, has been deleted, and thus when we go to tell them they are no longer being supported, we write to gibberish and cause mayhem. BUT! I know that this is never the case because no units are deleted mid-battle, ever.

Except that it turns out I’m wrong about that :D. They ARE deleted if… I drop a 25 man squad of infantry into a 12-man trench or bunker. At that point 13 of them get deleted. If I dithered when placing them long enough for a nearby command bunker to ‘claim’ them, then when I release them, I set off a time bomb, and after X frames, the command bunker will write to freed memory and hilarity ensues.

All fixed now, obviously. The easy fix would be to not support units until they are placed, but that isn’t nice for the GUI, and also I needed to rewrite the code to use a different system anyway so that multiple bunkers could overlap (they don’t stack though).

2 days. probably 20+ hours of debugging. What a pain. Insert comment about needing to get my act together regarding smart pointers here.

Bah.

3 Responses to “Fixing my heap crash bug”

  1. Byron says:

    I keep nagging you to use a decent memory manager instead of the general purpose heap stuff, would have picked it up immediately :)

  2. John Burton says:

    I’m all for smart pointers in c++, shared_ptr and unique_ptr pretty much replace raw pointers with no real overhead over doing it yourself, and extra features :)

    However it sounds like a crash here was a good thing…. If you’d had a shared_ptr then you’d probably never have deleted those thing. You’d either have a memory leak or some much harder to diagnose problem where *something* didn’t work right some time later on because some data structure didn’t quite contain what you thought it did, but the cause was 15 minutes earlier in a completely unrelated thing.

  3. Ben Hymers says:

    Seriously, shared_ptr and weak_ptr are fantastic. They’re definitely not a replacement for managing your memory properly (since if you just find/replace T* with shared_ptr you could end up leaking memory like John says above), but they would have caught this issue much sooner. Your support unit would go to iterate its weak_ptrs to supported units, fail to lock one of them and you’d either handle that explicitly or crash immediately reading nullptr.