Game Design, Programming and running a one-man games business…

Multithreading concurrency bug?

I have a theory; help me out if you know about this stuff. Take the following image from the Visual Studio concurrency profiler for GSB2 pre-draw code…

8156, 6792 and 8404 are my additional worker threads I spawn to help me process stuff.

What I do is basically build up a queue of tasks. The threads are always running, checking for what's next available for them to process. Meanwhile the main thread also does the same thing, ensuring it is not idle while the other threads are busy. Critical sections surround access to the queue stuff to ensure there are no nasty bugs.

I think my problem is illustrated by the red section with the black line connecting above it. This is a thread sat there doing nothing. Here is what I think happens…

  • The main thread builds up the queue of stuff to do.
  • 6792 jumps in and grabs a task to do
  • 8404 jumps in and grabs a task to do
  • The main thread then thinks ‘right then, I’ll do this next task’
  • 8156 wants to jump in now and also grab a task, but the main thread is busy doing actual work. In fact, it seems to ‘miss’ its opportunity to grab a task for ages, even though the other threads do ok getting task after task.

Is this just a problem of my code design, because the allocation of tasks is done by a thread that is not otherwise idle? It seems horribly wasteful to have a whole thread work just as a 99%-idle ‘task allocator’. I thought CPUs were clever enough to allow one thread to interrupt another in these instances?

I know I could queue up the tasks ahead of time, but each task takes a variable amount of time, and that also varies each frame. I *could* work off the last known task timings and write a clever allocator that tried to assign things in the best order, but that seems possibly like overkill, and something the CPU surely handles anyway? Or am I totally misreading this data? I checked a few frames; they all seem to have the same pattern.


9 thoughts on “Multithreading concurrency bug?”

  1. I’m not sure about the details of your case, of course, but when you say:


    It seems horribly wasteful to have a whole thread work just as a 99% idle ‘task allocator’.

    Bear in mind that the thread doesn’t need to run all that time. You can tell it to sleep for a certain period of time (I don’t know your API, but usually in terms of milliseconds), and generally it won’t get scheduled until after that time, and thus won’t keep burning CPU.

    Of course, if by “main thread” you mean the one handling the core sim handling and/or graphics, then you probably don’t want it sleeping.

    Either way, it seems desirable to restructure things so that the thread responsible for queuing work for other threads not also be responsible for part of that queued work. At least, if worker-thread saturation is a major goal.

    Best,
    Keith
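    Keith’s suggestion — a dedicated dispatcher thread that sleeps between checks rather than spinning — might look something like the sketch below. This is an illustrative design, not the blog author’s actual code; the `Dispatcher` class and its millisecond polling interval are invented for the example:

    ```cpp
    #include <atomic>
    #include <chrono>
    #include <cstdio>
    #include <mutex>
    #include <queue>
    #include <thread>

    // A dispatcher that checks for pending work, then sleeps for a
    // millisecond: while idle, the thread is descheduled by the OS
    // and burns essentially no CPU.
    class Dispatcher {
    public:
        void submit(int job) {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_.push(job);
        }
        // Runs until stop() is called; returns jobs dispatched.
        int run() {
            int dispatched = 0;
            while (!stop_.load()) {
                dispatched += drain();
                std::this_thread::sleep_for(std::chrono::milliseconds(1));
            }
            return dispatched + drain();  // final drain on shutdown
        }
        void stop() { stop_.store(true); }
    private:
        int drain() {
            std::lock_guard<std::mutex> lock(mutex_);
            int n = 0;
            // Here a real dispatcher would hand each job to a worker.
            while (!pending_.empty()) { pending_.pop(); ++n; }
            return n;
        }
        std::mutex mutex_;
        std::queue<int> pending_;
        std::atomic<bool> stop_{false};
    };

    int demo() {
        Dispatcher d;
        int dispatched = 0;
        std::thread t([&] { dispatched = d.run(); });
        for (int i = 0; i < 50; ++i) d.submit(i);
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
        d.stop();
        t.join();
        return dispatched;
    }

    int main() {
        std::printf("dispatched: %d\n", demo());
        return 0;
    }
    ```

    The 1 ms polling interval is a trade-off: shorter means lower dispatch latency, longer means fewer wakeups. A condition variable or event would remove the polling entirely.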

  2. Keith has it right. Sleeping threads have minimal CPU overhead, and if your Producer thread is unavailable for a long time (i.e. executing a long-running task), that can introduce bubbles in the pipeline for workers.

    As far as variable-length tasks, I’m not sure how you can get better speedup than “the first worker available executes the highest priority (longest running, most critical) task in the queue available”. But keep in mind you can also create different worker pools (some of size 1) to manually tune task consumption.
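    The “longest running task first” heuristic mentioned above (classically known as longest-processing-time-first scheduling) only needs tasks ordered by their cost from the previous frame. A minimal sketch, with invented task names and timings:

    ```cpp
    #include <cassert>
    #include <cstdio>
    #include <queue>
    #include <vector>

    // A task tagged with how long it took last frame.
    struct Task {
        int id;
        double last_cost_ms;
    };

    // Order so the most expensive task comes out first: workers grab
    // long tasks early, and short tasks fill the gaps at the end of
    // the frame (the longest-processing-time heuristic).
    struct ByCost {
        bool operator()(const Task& a, const Task& b) const {
            return a.last_cost_ms < b.last_cost_ms;
        }
    };

    using CostQueue = std::priority_queue<Task, std::vector<Task>, ByCost>;

    int next_task(CostQueue& q) {
        Task t = q.top();
        q.pop();
        return t.id;
    }

    int main() {
        CostQueue q;
        q.push({1, 2.5});
        q.push({2, 9.0});   // most expensive, so scheduled first
        q.push({3, 4.1});
        std::printf("first task: %d\n", next_task(q));
        return 0;
    }
    ```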

  3. Very interesting replies. Stupidly it had never occurred to me to have a non-main thread as the dispensing thread. Just a mental block on my part, I shall definitely give it a try. Thanks

  4. Maybe I’m getting the wrong end of the stick here, but there should be no reason that doing work on the main thread interferes with worker threads pulling work off a thread-safe queue, unless the main thread has some kind of lock on the queue’s data?

    When I’ve written job queues there is minimal lock time.

    – main thread locks when adding items to the job queue (very quick, i.e. adding an element to an array of jobs)
    – worker threads lock to check if the queue has items to do, and then pull items off the queue (also quick, simply removing an item from the array)
    – main thread and worker threads run doing whatever
    – worker thread adds to another separate queue, with a DIFFERENT lock, which holds the results (quick, adds results to another array)
    – main thread periodically checks the results queue when appropriate to see if data has been stored, locks, and retrieves the results, clearing the queue (should be relatively quick)

    I.e. queuing jobs on one lock object and retrieving results on another. This also means one job can be recording results at the same time as the main thread is queuing another job.

    Maybe this is too simplistic for your requirements though? *shrug*

    -wolfman
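    The steps wolfman lists can be sketched as follows — a rough illustration under the stated design (job queue under one lock, results under a different lock), with invented names and a squaring function standing in for real work:

    ```cpp
    #include <cstdio>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    // Jobs go in under one lock; results come back under a DIFFERENT
    // lock, so a worker can store a result while the main thread is
    // queuing the next job.
    std::mutex job_mutex;
    std::queue<int> jobs;
    std::mutex result_mutex;
    std::vector<int> results;

    void worker() {
        for (;;) {
            int job;
            {   // quick lock: just pull one item off the job queue
                std::lock_guard<std::mutex> lock(job_mutex);
                if (jobs.empty()) return;
                job = jobs.front();
                jobs.pop();
            }
            int result = job * job;     // the actual work, unlocked
            {   // quick lock on the separate results queue
                std::lock_guard<std::mutex> lock(result_mutex);
                results.push_back(result);
            }
        }
    }

    int run_jobs(int n, int num_workers) {
        {   // main thread locks while adding jobs
            std::lock_guard<std::mutex> lock(job_mutex);
            for (int i = 0; i < n; ++i) jobs.push(i);
        }
        std::vector<std::thread> pool;
        for (int i = 0; i < num_workers; ++i) pool.emplace_back(worker);
        for (auto& t : pool) t.join();
        // main thread retrieves results under the results lock
        std::lock_guard<std::mutex> lock(result_mutex);
        return (int)results.size();
    }

    int main() {
        std::printf("results: %d\n", run_jobs(100, 4));
        return 0;
    }
    ```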

  5. No, I think you understand fine. I thought it should work OK too. The system currently has a bunch of event HANDLEs, and each thread waits on its own single object. The main thread allocates tasks to the next available thread and notifies it by setting the handle, to prevent constant checking by each thread.
    Something in the code must be wrong. In my head I’m sure it should be fine with one handle as a global ‘there are tasks in the queue’ notifier, but trying it that way caused bugs. arggghhh….
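    The bugs seen with a single global ‘there are tasks’ notifier are usually lost or spurious wakeups: a woken thread finds the queue already emptied by someone else. The standard fix is to treat the notification only as a hint and re-check the queue under the lock in a loop. A sketch of that pattern using a `std::condition_variable` in place of a Win32 event (the predicate-checked `wait` below is the portable equivalent of re-testing the queue after the event fires; all names are illustrative):

    ```cpp
    #include <atomic>
    #include <condition_variable>
    #include <cstdio>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    std::mutex m;
    std::condition_variable cv;   // one global "tasks available" notifier
    std::queue<int> tasks;
    bool finished = false;
    std::atomic<int> processed{0};

    void worker() {
        for (;;) {
            int task;
            {
                std::unique_lock<std::mutex> lock(m);
                // The crucial part: wait on a predicate re-checked under
                // the lock, so a missed or spurious wakeup is harmless.
                cv.wait(lock, [] { return !tasks.empty() || finished; });
                if (tasks.empty()) return;   // finished and drained
                task = tasks.front();
                tasks.pop();
            }
            (void)task;                      // do the work here
            processed.fetch_add(1);
        }
    }

    int run(int n, int num_workers) {
        std::vector<std::thread> pool;
        for (int i = 0; i < num_workers; ++i) pool.emplace_back(worker);
        for (int i = 0; i < n; ++i) {
            { std::lock_guard<std::mutex> lock(m); tasks.push(i); }
            cv.notify_one();                 // wake one waiter per task
        }
        { std::lock_guard<std::mutex> lock(m); finished = true; }
        cv.notify_all();                     // release everyone to exit
        for (auto& t : pool) t.join();
        return processed.load();
    }

    int main() {
        std::printf("processed: %d\n", run(200, 3));
        return 0;
    }
    ```

    With Win32 events the same caveat applies: an auto-reset event wakes only one waiter per `SetEvent`, so each thread still has to verify the queue’s state under the critical section rather than trusting the wakeup.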

  6. 2 things.

    1. I agree that the main thread shouldn’t have any effect on the worker threads being able to grab jobs, assuming that the main thread can produce enough jobs to keep the queue non-empty. Once the jobs are in the queue, it shouldn’t matter what the main thread is doing. This points to either the queue not having enough jobs, or the locking code in the main thread being messed up.
    2. I would say that having extra threads is pretty cheap (until you get to huge numbers), so unless there’s a good reason for the main thread to be producing, I would probably try to put the producer on its own thread. It keeps the mental model much easier to deal with. The only case I’ve seen where adding threads causes a true performance issue on modern systems is if locking isn’t being done intelligently, or if the work done per job is so small that getting the job takes as much time as running the job.

    My first reaction would be to put the producer on its own thread, and only put it on the main thread if there is proof that having an extra thread causes performance issues. If that is the case, I would love to see a follow-up article on it. It would be interesting to see why. Although I’m sure you have better things to do with your time, so it might not be as interesting to you.

  7. Summary of the link:

    “During the design process the team concluded that recent directions in high-performance concurrency models using queues are fundamentally at odds with modern CPU design.”

    “At a crude level you can think of a Disruptor as a multicast graph of queues where producers put objects on it that are sent to all the consumers for parallel consumption through separate downstream queues. When you look inside you see that this network of queues is really a single data structure – a ring buffer.

    Each producer and consumer has a sequence counter to indicate which slot in the buffer it’s currently working on. Each producer/consumer writes its own sequence counter but can read the others’ sequence counters. This way the producer can read the consumers’ counters to ensure the slot it wants to write in is available without any locks on the counters. Similarly a consumer can ensure it only processes messages once another consumer is done with it by watching the counters.”

    “A more conventional approach might use a Producer Queue and a Consumer Queue, each using locks as concurrency mechanisms. In practice, what happens with producer and consumer queues is that the queues are either completely empty or completely full most of the time, which causes lock contention and wasted clock cycles. The disruptor alleviates this, in part, by having all of the producers and consumers use the same queue mechanism, coordinating with each other by watching the sequence counters rather than using locking mechanisms.”

    I found this while puzzling over similar issues in my simulation engine. I haven’t actually implemented it as I think that in my case I can get better results just by cutting down the amount of processing I need to do through filtering out useless data, so the motivation for spending time figuring out how to use the disruptor library is not there right now.
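    The sequence-counter idea quoted above can be illustrated in miniature with a single-producer/single-consumer ring buffer: each side owns its own counter, writes only that one, and reads the other’s, so no locks are needed. This is a drastically simplified sketch of the concept (the real Disruptor handles multiple producers and consumers, batching, and smarter waiting); all names and sizes are invented for the example:

    ```cpp
    #include <atomic>
    #include <cstdio>
    #include <thread>

    const int SIZE = 1024;        // ring capacity, power of two
    int slots[SIZE];
    std::atomic<long> head{0};    // written by the producer only
    std::atomic<long> tail{0};    // written by the consumer only

    void produce(int value) {
        long h = head.load(std::memory_order_relaxed);
        // The consumer's counter tells us when the slot is free.
        while (h - tail.load(std::memory_order_acquire) >= SIZE) { /* spin */ }
        slots[h % SIZE] = value;
        head.store(h + 1, std::memory_order_release);  // publish the slot
    }

    int consume() {
        long t = tail.load(std::memory_order_relaxed);
        // The producer's counter tells us when the slot is published.
        while (head.load(std::memory_order_acquire) <= t) { /* spin */ }
        int value = slots[t % SIZE];
        tail.store(t + 1, std::memory_order_release);  // free the slot
        return value;
    }

    long run(int n) {
        long sum = 0;
        std::thread producer([n] { for (int i = 1; i <= n; ++i) produce(i); });
        for (int i = 0; i < n; ++i) sum += consume();  // main thread consumes
        producer.join();
        return sum;                                    // sum of 1..n
    }

    int main() {
        std::printf("sum: %ld\n", run(10000));
        return 0;
    }
    ```

    The acquire/release pairing is what replaces the lock: the producer’s release-store of `head` makes the slot write visible to the consumer’s acquire-load, and vice versa for `tail`.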
