Today I wanted to talk a bit about a deceptive (and yet blatantly obvious once you see it) little bug we had at work the other day. We were awaiting some Tasks in a .NET Core Web API, and for some reason, the tasks seemed to be running twice.
What are tasks, again?
If you can already answer this in an interview, then feel free to skip this part. For everyone else though, a Task (in .NET land) is an abstraction of work that needs to be done, expressed and managed using the TPL (Task Parallel Library). Tasks are often simplified as threads, but you shouldn’t just assume that a task is a thread. Instead, think of tasks as units of work that get scheduled onto threads, usually ones from the thread pool. Whenever a task that hasn’t finished yet is awaited, the compiler-generated state machine records how far the method got, along with its local variables, and the current SynchronizationContext (if there is one) is captured. The thread which was executing the method then returns to the thread pool, where it can be used to process other work while the framework waits for the I/O operation to complete. At this point, the rest of the method is picked back up again, potentially by the same thread, or perhaps a different one. Where it resumes depends on the value passed to ConfigureAwait(bool continueOnCapturedContext): awaiting a task directly behaves as if true had been passed and resumes on the captured context, while false lets the continuation run on any thread pool thread. (ASP.NET Core has no SynchronizationContext, so there the continuation runs on a thread pool thread either way.) There are helper functions in the framework to wait for one of many tasks, or to wait for a whole collection of tasks, which might be useful if you’re batching work that can be executed concurrently - which is what we were doing at work.
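As a rough sketch of those helpers: the framework's Task.WhenAny completes when the first task of a batch finishes, and Task.WhenAll completes when all of them have. SaveAsync below is a made-up stand-in for an I/O-bound call, and the delays are arbitrary.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class Program
{
    // Hypothetical stand-in for an I/O-bound call such as a database write.
    private static async Task<int> SaveAsync(int id)
    {
        // ConfigureAwait(false): this method does not need to resume on a captured context.
        await Task.Delay(50 * id).ConfigureAwait(false);
        return id;
    }

    public static async Task Main()
    {
        // Start three saves and hold on to the running tasks in an array.
        Task<int>[] saves = Enumerable.Range(1, 3).Select(id => SaveAsync(id)).ToArray();

        // WhenAny hands back whichever task completed first.
        Task<int> first = await Task.WhenAny(saves);
        Console.WriteLine($"First to finish: {await first}");

        // WhenAll completes once every task has, with results in the order the tasks were passed in.
        int[] ids = await Task.WhenAll(saves);
        Console.WriteLine($"All done: {string.Join(", ", ids)}"); // All done: 1, 2, 3
    }
}
```

Which save finishes first depends on timing, so the first line of output is not guaranteed; the result order from WhenAll is.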
Anyway, this is just the tip of the iceberg - but the general takeaway is that tasks and the TPL make async code much easier to read, and help a great deal in taking advantage of concurrency and multi-core hardware. Having said that, they do introduce some new (sometimes subtle) bugs - we’ll look at one of them today.
Okay, so what happened?
Long story short (and without giving away any sensitive intellectual property), we had a bunch of tasks that would be saving records, and a bunch of tasks that would be deleting records - and we wanted to await all of them. I put together a little snippet in dotnetfiddle which simulates the conditions.
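For anyone reading without the fiddle to hand, here is a minimal sketch of the kind of code involved. It is not the production code, and not necessarily identical to the fiddle: SaveAsync and saves are names made up for illustration, and the real work is replaced by a Task.Delay.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class Program
{
    // Hypothetical stand-in for saving a record: prints the id, then pretends to do I/O.
    private static async Task<int> SaveAsync(int id)
    {
        Console.WriteLine(id);
        await Task.Delay(10);
        return id;
    }

    public static async Task Main()
    {
        // Build the batch of tasks with LINQ (assumed shape of the original code).
        IEnumerable<Task<int>> saves = Enumerable.Range(1, 5).Select(id => SaveAsync(id));

        // Await the whole batch.
        await Task.WhenAll(saves);

        // Report how many tasks were in the batch.
        Console.WriteLine($"Count: {saves.Count()}");

        // Await the batch again, e.g. from another code path that receives the same sequence.
        await Task.WhenAll(saves);
    }
}
```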
Just looking at the code, one might naively assume (as I did, admittedly) that it should just output the numbers 1 through 5, followed by a count. What we actually get, though, is the numbers 1 through 5 TWICE, followed by a count, followed by the numbers 1 through 5 again. I imagine most of you have found the problem by this point, but for everyone else and for those of you who just want to enjoy the show, here it is.
Enumerating a lazily evaluated collection of tasks (re)runs the work
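This means that whenever you’re calling ToList() or even Count() on a lazily evaluated collection of tasks (one built with Select, say), the work is being re-run. LINQ queries use deferred execution: the lambda that starts each task runs again every time the sequence is enumerated, so each enumeration kicks off a fresh batch of tasks. A collection that has already been materialised, such as a List<Task<T>>, does not have this problem, because it holds the tasks that were already started. It sounds obvious now, but it may not be so when the ToList() is hidden behind an extension method and you’re wondering what is going on. Moving forward, I think I’ll treat an IEnumerable<Task<T>> with the same caution as an IQueryable<T>, and always consider whether the operation I’m about to invoke can cause extra work to run.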
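One way to guard against this, sketched under the assumption that the tasks come from a LINQ query, is to materialise the sequence exactly once and only ever touch the resulting list afterwards. DeleteAsync and the counter are hypothetical, added purely so the number of times the work starts can be observed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class Program
{
    private static int _started; // how many times the "work" has been kicked off

    // Hypothetical stand-in for deleting a record.
    private static async Task<int> DeleteAsync(int id)
    {
        Interlocked.Increment(ref _started);
        await Task.Delay(10);
        return id;
    }

    public static async Task Main()
    {
        // Lazy query: nothing has started yet, and each enumeration would start five new tasks.
        IEnumerable<Task<int>> lazy = Enumerable.Range(1, 5).Select(id => DeleteAsync(id));

        // Materialise once: the five tasks start here, and the list holds those same five tasks.
        List<Task<int>> deletes = lazy.ToList();

        await Task.WhenAll(deletes);
        Console.WriteLine($"Tasks in batch: {deletes.Count}"); // List<T>.Count property, the query is not re-run
        await Task.WhenAll(deletes);                            // tasks already completed, nothing starts again

        Console.WriteLine($"Work started {_started} times");   // Work started 5 times
    }
}
```

Swapping deletes for lazy in the last three statements brings the extra runs back, which is a quick way to see the difference.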
Anyway, that’s all from me today. Do you have any other examples of tasks coming back to bite you? Let me know in the comments.