Generic Collections and Thread Safety
Recently I was involved in a discussion about generic collections and thread-safety. I was under the impression the framework had thread-safe versions of the generic collections, but I was unable to locate them and resorted to Google. I was wrong. There aren't any.
I thought the subject would make an interesting post, since there seems to be a lot of bad information on the subject.
Why do we need thread-safe collections?
Imagine you're running multiple threads that take an item from a queue and perform some processing. Your code might look like this:
// Get the next request from the queue var request = infoPackRequests.Dequeue(); // Perform some processing SendInfoPack(request);
At first glance, you might think this code is pretty thread-safe. The first thread will get the first item, and the next thread will get the following one, right? Wrong. Sometimes you might be lucky and it will work out that way. Other times you might find your application does strange things, such as processing the same item twice.
If you take a peek at the Dequeue method, you'll find some code that looks like this:
Object removed = _array[_head]; _array[_head] = null; _head = (_head + 1) % _array.Length; _size--; _version++; return removed;
This seems fairly straightforward. It grabs the item, nulls it out from the array and decreases the size. Now consider what would happen if both threads executed the first line at the same time. They'll both return the same item. And that's not the only thing that could go wrong.
So, how do we fix it?
Fortunately, this problem has been considered, and Microsoft provided the SyncRoot property on ICollection to help us synchronise access. Usage is simple - we take a lock on SyncRoot before doing anything that's not thread-safe:
// Get the next request from the queue
var request;
lock(infoPackRequests.SyncRoot)
{
request = infoPackRequests.Dequeue();
}
// Perform some processing
SendInfoPack(request);
Now we can be sure our Dequeue methods won't be called at the same time. Note that we don't need to (and shouldn't) include the SendInfoPack call inside the lock, because it does not touch the collection. Including it in the lock would have a negative impact on performance, because other threads would block for the entire duration of SendInfoPack on any other thread.
Problem solved! Why the long face?
Well, this all works great when you're using a Queue or ICollection. Most of us however are now using Queue<T> or ICollection<T>. These classes don't expose a SyncRoot property. The BCL Team Blog explains why this decision was taken. Developers were using "thread-safe" wrappers as if they didn't have to care about the threading issues, eg:
if (myQueue.Count > 0)
{
DoSomething(myQueue.Dequeue());
}
The problem here is that the locks provided by the wrapper within each method/property are released between the calls to Count and Dequeue, so there's no guarantee an item still exists when the Dequeue method is called.
So what's the preferred method?
Like most things, that depends on exactly what you're doing. The reason for taking the SyncRoot property away was to encourage developers to think about the synchronisation between threads rather than just blindly using synchronised wrappers and assuming it'll all be fine. You'll want to hold a lock for the duration of any operation that could cause issues if another thread was doing something at the same time. Using the above example, you'd want to ensure nothing was removed from the queue after you checked the Count property, so you would probably have something like this:
var item = null;
lock (lockObject)
{
if (myQueue.Count > 0)
{
item = myQueue.Dequeue();
}
}
if (item != null)
DoSomething(item);
This method makes it easier to synchronise multiple objects (imagine you were moving items to a "processed items" collection after processing) since you have a central lock object, rather than two independent SyncRoots.
If you found this article interesting, why not Subscribe to my RSS feed or follow @DanTup on Twitter for more? :)