Concurrency in Python

When high performance is required

There are several ways to achieve concurrency in Python, and each of them has its own trade-offs: async, multithreading and multiprocessing.

First of all, let's break the myth: Python is NOT a single-threaded language! You're probably confused because you heard Python has a GIL, a Global Interpreter Lock mechanism that blocks it from running multiple threads at exactly the same time.

Well, that's correct, but it doesn't mean Python cannot run multiple threads, as I'll explain later.

Multi-Thread vs Multi-Process

Let's leave the fun stuff for later and start with a question: what is the difference between multithreading and multiprocessing, and when should we use each of them?

Multi-Thread

When we need to run multiple tasks that take a significant amount of time to complete, for example I/O tasks (storage access, network access, database queries, etc.), we don't want to wait for each task to complete sequentially, so we prefer to run them all concurrently. In this case, multithreading will match our needs.

Multi-Process

When we need to run multiple computational tasks, additional threads won't help. We will need separate processes to utilize more cores of our CPU, thus gaining more computing power. In this case, multiprocessing will be our friend.
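As a rough sketch (the heavy_sum function, the input sizes and the process count here are made up for illustration), this is how a CPU-bound job could be spread across processes with multiprocessing.Pool:

```python
# Hypothetical CPU-bound example: the work and the numbers are illustrative.
import multiprocessing


def heavy_sum(n):
    # A pure-Python loop that keeps one CPU core busy.
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    inputs = [10_000_000] * 4
    # Each input is handled in its own process, so it can run on a separate core.
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(heavy_sum, inputs)
    print(results)
```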

So why don't we use multiprocessing all the time?

We can, but it would be a wasteful strategy and we would soon "eat up" our system resources, as multiprocessing is more demanding, both system-wise and memory-wise.

The recommended way is to consider multithreading first, leaving multiprocessing only for the computationally demanding tasks. Multithreading is more efficient and has a smaller memory footprint relative to multiprocessing.

What about the GIL?

The GIL prevents the threads from running Python code at exactly the same time. Instead, it switches between them very quickly, and the result is quite similar to regular multithreading (with a small cost).

Let’s use the following simple example to demonstrate it:
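Here is a minimal sketch of such a comparison (the io_task function, the one-second duration and the thread count are illustrative): three I/O-like tasks run sequentially and then with threads.

```python
import threading
import time


def io_task():
    # time.sleep stands in for real I/O; like real I/O waits, it releases the GIL.
    time.sleep(1)


# Sequential run: roughly 3 seconds.
start = time.perf_counter()
for _ in range(3):
    io_task()
print(f"sequential: {time.perf_counter() - start:.2f}s")

# Multithreaded run: roughly 1 second, plus a small scheduling overhead.
start = time.perf_counter()
threads = [threading.Thread(target=io_task) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"threaded:   {time.perf_counter() - start:.2f}s")
```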

So, as we can see from the above code, there's a small addition to the run time (~1.5% on average) when using multithreading, which is the cost of the GIL, but it's not significantly high.

Async

Async functions are another way to achieve concurrency, but within a single thread, so the overall cost is even smaller than with multithreading.

So why don't we use it all the time?

We can, but it's not as simple as using the other methods. The code becomes more complicated and less easy for everyone to understand. Also, async code has some disadvantages, like cooperating well only with other async code: if the other modules we use are blocking (not async), then the whole code won't run asynchronously for us and we lose the concurrency advantage.

New terms and structures are added to the code

To achieve async functionality, the code has to change and adopt a new way of thinking, with new keywords and constructs such as async, await and the event loop. For example, we can't just call an async function directly; we need to run it through the asyncio module or await it from another async function (see the short sketch after the list of terms below).

  • async — a prefix added to a function definition to tell the interpreter it is an async function.

  • await — a marker telling the interpreter it may suspend the current function here and switch to another async function.

  • loop — the event loop, the manager that schedules and runs the async functions.

  • coroutine — the async function itself (more precisely, the object returned when calling it).

  • task — a higher-level wrapper that schedules a coroutine to run on the event loop.
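To make these terms concrete, here is a tiny sketch (the say_hello coroutine is made up for illustration) showing that calling an async function directly only creates a coroutine object, and that asyncio is needed to actually run it:

```python
import asyncio


async def say_hello():
    await asyncio.sleep(1)
    print("hello")


coro = say_hello()        # nothing runs yet: this only creates a coroutine object
print(type(coro))         # <class 'coroutine'>
coro.close()              # discard it to avoid a "never awaited" warning

asyncio.run(say_hello())  # the event loop actually drives the coroutine to completion
```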

Let's see an example of simple async code:
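A minimal sketch along those lines (the io_task coroutine, the one-second waits and the task count are illustrative):

```python
import asyncio
import time


async def io_task(name):
    # asyncio.sleep is non-blocking: while this coroutine waits,
    # the event loop is free to run the other coroutines.
    await asyncio.sleep(1)
    print(f"{name} done")


async def main():
    # Run the three coroutines concurrently and wait for all of them.
    await asyncio.gather(io_task("a"), io_task("b"), io_task("c"))


start = time.perf_counter()
asyncio.run(main())
print(f"total: {time.perf_counter() - start:.2f}s")  # ~1 second, not 3
```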

So, as we can see from the above code, the async code runs faster than the multithreaded code, without the additional memory usage of more threads (and without the GIL). But there are no free lunches: the code is significantly harder to manage, and you may see some variations in the documentation across Python versions. There are many chances along the way to get an error or end up with non-async code. For example, forgetting to put await before a call to an async function will trigger the warning: coroutine '<function_name>' was never awaited. Also, you can cause the code to run in blocking mode, which would increase the time from 1 second to 3 seconds in our example, as the sketch below shows.
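As a sketch of that blocking pitfall (same illustrative setup as above), replacing asyncio.sleep with the blocking time.sleep makes the three "concurrent" tasks take about 3 seconds instead of 1:

```python
import asyncio
import time


async def blocking_task(name):
    time.sleep(1)  # blocks the single thread, so no other coroutine can run meanwhile
    print(f"{name} done")


async def main():
    await asyncio.gather(blocking_task("a"), blocking_task("b"), blocking_task("c"))


start = time.perf_counter()
asyncio.run(main())
print(f"total: {time.perf_counter() - start:.2f}s")  # ~3 seconds
```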

Also, there's a need to master the async machinery (coroutines, loops and tasks) and to avoid confusing errors like: "DeprecationWarning: There is no current event loop".

Multithreading vs Async

Adding multithreading functionality to the code is much easier to understand and maintain than converting the code into async code. However, the trade-off is the memory consumption of each thread and the time it takes each thread to spawn. So if we want to avoid the time and memory footprint of spawning many threads, we may consider the async path.

Summary

When we want to use concurrency in Python, we can consider the following, based on the task we want to achieve:

  • Async code — the "cheapest" method system-wise, but it's relatively new in Python and the implementation may be bumpy.

  • Multithreading — a well-known method that works well and without too many hassles. However, it is targeted mainly at I/O-bound tasks, not heavy CPU tasks.

  • Multiprocessing — the hard-core method to rule all the heavy CPU tasks; its only bottlenecks are the number and speed of the CPU cores and the amount of available RAM.