I thought I might try another of those ‘live blog’ entries where I go through my thought processes as I do some code profiling to speedup the route-finding slowdowns on super-huge Production Line factories. Here is the map I’m analyzing usage on.

I’m loading the map, waiting 20 seconds, then making a change to the resource route layout, then letting it run another 20 seconds, and taking a look at the function level profiling snapshot using aqtime. This save gamne has 8,994 slots in it (which is massive) and uses a custom map.

Here is the function breakdown for the GeneratePath() code which seems to be a major cause of any framerate issues when changing routes:

I probably need a refresher in my mind as to how this code works so… Lets look at when I change a route by (for example) placing down a new conveyor. This makes the following calls:

			SIM_GetResourceConveyors()->RefreshExits();
			SIM_GetResourcesRoutes()->PurgeAllImpossible();
			SIM_GetResourcesRoutes()->InvalidateRoutes();
			SIM_GetResourceObjects()->OnRouteAddition();

RefreshExits is trivial, taking on average 0.002ms, so not a problem, PurgeAllImpossible() doesnt show up in the profiler, but it basically sets a lot of flags to be false, and calls a sub function that also is presumably too quick to show up, so likely not a culprit.

InvalidateRoutes() is likely the culprit. It tells every slot, and every stockpile that it needs to begin verifying its current cached route table in case something has changed.

Finally OnRouteAddition goes through every intransit resource object and tells them they need to verify their route(over the next two seconds) in case a newer route is available, or the current one just got deleted. This is also too quick to show up.

So it looks like all that route invalidation is the slowdown. But why? The actual function takes 0.2ms, which is slowish…but not noticeable when a frame is 16.0 ms. Its the delay over the next few seconds that causes the problems… Basically every slot with a stockpile calls PrepareToVerifyBays()…

void SIM_PlaceableSlot::PrepareToVerifyBays()
{
	BayVerificationPending = true;
	int limit = BAY_VERIFICATION_INTERVAL;
	if (APP_GetPerformance()->DoRoutesLimitFramerate())
	{
		limit *= 2;
	}
	BayVerificationTimer = GRandom::RandomChoice(limit);
}

So in English, this code tells the slot that it needs to verify its nearest import bays, and to do so at some random time over the next 60 frames (1 second). A globally calculated value works out if we have a poor framerate, and if we are verifying bays, and thus notes that given this PC, this map etc…we need to allocate twice as much time as usual (4 seconds) to verify those bays…

Thus we have maybe 4 seconds max (4,000ms) over which to spread the effect of all this route re-calculation. That might mean up to 4 seconds of lag, which may seem a short time, but if you are watching a resource item getting delivered in the game, and it travels down the wrong route for more than four seconds…you would be perplexed, so its a reasonable target. How to make it faster?

Bays are verified every frame. a check happens every frame to see if a verification is pending, and if it is, a function gets called. That function is pretty huge and its called SIM_ProductionSlot::RecalculateNearestBay(). In my profiled sample it was called 2,200 times and on average takes 2.8ms. OH MY GOD. Almost all that time is spent inside SIM_ResourceConveyorManager::GetNearestBays(). This function itself does some fancy multithreading, spinning off taks to a lot of threads (8 on my PC) each of which processes a list of routes, eventually calling SIM_PathFinderManager::FindPath(). This gets complex so lets look at a multithreaded analysis of it in vtune:

This actually looks pretty efficient and packed, so i think that the real answer to speeding this up is not speeding up the actual route-verification, but culling the cases in which I actually even bother verifying a route. For example…Here is a simple map with production slot stockpiles at A and D, and import bays at B,C,E,F…

Both A and D keep a track of the shortest route along conveyor belts to the 2 nearest import bays. Note not every tile is a route, only each one marked with conveyors, so we have the following.

Now lets say I add a new conveyor belt tile to improve the routing from A to its two nearest bays…

Now as mere humans, its pretty clear to us that as soon as possible, we need to get A to recalculate those two routes it has to its nearest bays, because there is now a shorter route from A to B (but actually not C). Its also pretty clear that D’s routes (to E and F) are totally unaffected. Right now… my algorithm is dumb as fuck, because it tells A and D ‘OMG something changed! redo everything!’, which is a massive waste of resources. The problem is…How exactly does a better algorithm look?

This is one of those things that sounds simple but actually is not. If I learned computer science at school I’d likely be better at this, but I’m self taught. Here is my current thought process:

The route from D to E is 4 tiles. The distance from D to the new tile (pointed at by the arrow) is greater than 4. Thus there is NO way that my route from D to E can possibly be affected by this change. Thus I can leave D as a slot that keeps its old fashioned routes.

So an algorithm that made this optimization would be something like:

For Each Slot
bool bcheck = false
int max_dist_bay = GetMaxDistToBay()
For Each changed tile
If CrudeDistanceToTile <= max_dist_bay
bcheck = true

So its worth trying to see if that helps…

…And I tried it! I even wrote some debug output to verify that the number of slots that got verified each time was below 100%. Something I didn’t realize was that obviously for changes taking place right in the middle of a map, the majority of the map ends up being covered by this algorithm, as the distance for slots to the center of the map is often about the same as the distance to the nearest importer. However, when making changes nearer the edges and corners of the map, a way smaller percentage of the slots need verifying, sometimes just 2-4% of them.

Even if I just assume that we average 50% of slots being verified instead of 100%, this represents a 2x speedup in the game during these events, which should be super noticeable. Of course the only way to be sure this feeds through to user experience is to stick the new algorithm on a togglable switch and watch the frame rate counter and…

YES

It boost the framerate a LOT. Obviously its still spread out over the same amount of time, but the total amount of recalculating needed is way lower so… obviously we get a big framerate boost.

This is NOT a perfect optimised way to do it, but you can bet its already roughly twice as fast, which is great. I need a better way to work out what slots are needlessly recalculating, AND I need to ensure I can use similar techniques on situations that cause a recalc by other means, such as when stuff is deleted but…its definite progress.

I’ll probably work on the deletion case tomorrow, do some more extensive testing, then try it out in the 1.80 preview build (unstable beta) on steam.

8 Responses to “Speeding up Production Line (large factories)”

  1. Sam says:

    You can increase the lower bound on a route being affected by the new conveyor even further without significant additional work, I think. For, say, the E->D route, the best case distance involving the new conveyor is the Manhattan distance from D to the new conveyor, as you have, plus the Manhattan distance from the new conveyor to E. There’s still a pathological case – if both the slot and the new conveyor are central – but it should be a lot more limited. Not sure if this is necessary – it sounds like you’ve got your performance acceptable already – but might be worth a look.

  2. cliffski says:

    I’m not sure thats true, if I understand you correctly. Note that what I need to check here, is the nearest 2 import slots to each red slot. The closest importer to D are E and F, but that might change and become A and E with change.
    You are correct that the real difference to compare to is newpoint to D PLUS newpoint to the nearest importer to newpoint, not just newpoint to D, but the calculation of newpoint to its nearest importer is a big part of the code I’m trying to avoid calculating.

    There are probably way faster ways to do the whole thing, including a flood fill of the whole map that then stores every distances point to every other point in a massive matrix, but then that results in a hueg amount of stored data and potentially a lot of cache misses…

    I’m pleased with todays speedup anyway, because I thought I’d stopped optimizing ages ago :D

  3. Sam says:

    Ah, yes, I see what you’re saying. You need the closest importer to the new point, not the existing closest importer to the slot – but for the purposes of setting a bound on the routes that need calculating, you can use the Manhattan distance as in your existing optimisation, not a pathfound true distance, so it’s linear in the number of importers and happens once, at the creation of newpoint. If that saves even one actual route calculation it’s likely to be faster, right?

  4. Ryan says:

    I’m probably missing the context that precludes it, but couldn’t newly-placed conveyors trigger a flood-fill over the existing conveyor tiles to find connected production slots?

    This would avoid the (possibly rare?) case where we find ourselves pointlessly recalculating paths for nearby but disconnected conveyor graphs.

    • Cliff Harris says:

      the objective is to ensure routes from slots to importers are the shortest routes to the two nearest importers. Thats all that matters, so if I place down a new conveyor and its in the middle of nowhere, or doesn’t change the shortest routes for any slot, then nothing needs recalculating.

      A flood fill from a new conveyor tells me the shortest dist from it to any slots, but until we route from the slot to each importer i havcent actually found any useful data, which is why I do it from slot –> importer (and where the new conveyor may or may not be part of that route).

      • Ryan says:

        Apologies, I haven’t been clear. The flood-fill is not intended to calculate path lengths, but just as a way to find the production slots for which paths should be recalculated – those that are connected to the newly-placed conveyor. It could obviously be combined with the technique described in the post: flood fill to find connected slots, then distance check to disregard slots whose paths cannot be improved by the new conveyor.

        Given that recomputing paths is the expensive operation, it should be worthwhile to spend a bit more time on working out if it’s actually necessary. Assuming a reasonable conveyor graph I’d be surprised if the floodfill was even particularly expensive, although I’m sure you’d encounter players who would confound that assumption.

        • Cliff Harris says:

          in 95% of cases the flood-fill is not expensive, but you would be amazed just how expensive it gets on a 200×200 map when you delete something (or add something) right in the center, especially when the map is nearly full and half the tiles represent conveyor belt intersections.

  5. Cliff Harris says:

    Yup, I am currently using manhattan distances, I’m not actually working out the real route, I’m using manhattan as the worst case (shortest possible).