The GIL and Thread Safety

What are your thoughts on multi-threading in Python?

A common interview question

Like many veteran Python programmers, you probably already know the answer: because of the Global Interpreter Lock (GIL), the CPython interpreter never executes Python code in more than one thread at any given moment.

You’ll probably also go on to explain why multi-threading in Python is still relevant: it works well for I/O-bound applications. A thread that is blocked waiting on I/O releases the GIL, so giving up its turn costs little. Meanwhile, other threads that are not waiting on I/O get the chance to run, which gives us concurrency even though only one thread executes Python code at a time.
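A minimal sketch of this effect, assuming time.sleep() is a fair stand-in for a blocking I/O call (CPython releases the GIL while sleeping, as it does around blocking socket or file calls). The helper names fake_io and run_io_tasks are hypothetical. Five threads that each "wait" 0.2 seconds finish in roughly 0.2 seconds of wall-clock time rather than 1.0, because the waits overlap.

```python
import time
from threading import Thread


def fake_io(seconds):
    # Hypothetical stand-in for blocking I/O; the GIL is released during the sleep
    time.sleep(seconds)


def run_io_tasks(num_threads=5, seconds=0.2):
    """Run num_threads simulated I/O waits in parallel threads; return elapsed seconds."""
    threads = [Thread(target=fake_io, args=(seconds,)) for _ in range(num_threads)]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    # Run one after another, five 0.2 s waits would take about 1.0 s
    print("elapsed: {:.2f}s".format(run_io_tasks()))
```

Replacing fake_io with a CPU-bound loop would remove most of the speedup, since a thread doing pure Python computation holds the GIL until the interpreter forces a switch.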

All of these are good answers, and they would usually satisfy even the most demanding interviewer. But here’s a follow-up question that often catches people off guard:

So because of the GIL, Python essentially runs one thread at a time. Does this mean Python is by default thread safe?

A follow-up question.

On the surface, it seems to make sense. But it is also easy to give a counter-example with a poorly designed multi-threaded application:

Imagine two threads updating a shared variable V. Say V currently holds the value 5. Thread 1 reads V, but before it can update it, the GIL suspends its execution. Thread 2 now reads V, does its calculation and decrements it by 3, so V becomes 2. At this point the GIL hands execution back to thread 1, which increments V by 1. But thread 1 is still working from the value it read earlier, 5, so it sets V to 6.
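The interleaving above can be replayed step by step without any real threads. This is only a sketch: the function names replay_lost_update and replay_serialized are hypothetical, and each "thread" is just a local variable holding the value that thread read.

```python
def replay_lost_update(initial=5):
    """Replay the interleaving described above, one step at a time, with no real threads."""
    v = initial

    # Thread 1 reads V, then is suspended before it can write
    thread1_seen = v

    # Thread 2 runs its whole read-modify-write: V becomes initial - 3
    thread2_seen = v
    v = thread2_seen - 3

    # Thread 1 resumes and writes a result based on its stale read
    v = thread1_seen + 1
    return v


def replay_serialized(initial=5):
    """The same two updates, applied one after the other with no interleaving."""
    v = initial
    v = v - 3  # thread 2's update
    v = v + 1  # thread 1's update
    return v


print(replay_lost_update())  # 6: thread 2's update is lost
print(replay_serialized())   # 3: both updates are applied
```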

We have now lost thread 2’s update to V. Python is not thread safe by default, even with the GIL.

To demonstrate this clearly, let’s consider the following example, which involves a simple shared variable update:

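from threading import Thread

class WillThisWork:
    a = 0  # shared counter

    def add_by_one_many_times(self):
        for _ in range(1000000):
            self.a += 1  # read self.a, add 1, write it back: not a single step

# One shared object, five threads incrementing the same attribute
test = WillThisWork()
threads = [Thread(target=test.add_by_one_many_times) for _ in range(5)]

for thread in threads:
    thread.start()

# Wait for every thread to finish before reading the result
for thread in threads:
    thread.join()

print("The value of a is: {}".format(test.a))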

If += were thread safe, we would expect the final value of a to be 5000000.

$ The value of a is: 2044547
$ The value of a is: 2514763
$ The value of a is: 1417187

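After running the code multiple times, we never get the full 5000000 in the output, which shows that += is not thread safe. (The exact numbers depend on the interpreter version and the machine. On CPython 3.10 and later this particular loop may well print 5000000, because the interpreter checks for thread switches at fewer points; that is an implementation detail rather than a guarantee, and += is still not atomic.)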
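One common way to make the counter correct is to guard the update with a threading.Lock. The sketch below is an assumption about how you might fix this example, not the only option; the class name ThisWillWork and the helper run_locked are hypothetical. Holding the lock across the whole read-modify-write means no other thread can read a stale value in between.

```python
from threading import Lock, Thread


class ThisWillWork:
    """Same counter as WillThisWork, but each += runs while holding a lock."""

    def __init__(self):
        self.a = 0
        self._lock = Lock()

    def add_by_one_many_times(self, n=1000000):
        for _ in range(n):
            # Only one thread at a time can be inside this block, so the
            # read, add and write of self.a cannot be interleaved
            with self._lock:
                self.a += 1


def run_locked(num_threads=5, n=1000000):
    """Start num_threads threads sharing one counter; return its final value."""
    test = ThisWillWork()
    threads = [Thread(target=test.add_by_one_many_times, args=(n,)) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return test.a


if __name__ == "__main__":
    print("The value of a is: {}".format(run_locked()))  # The value of a is: 5000000
```

The lock adds overhead to every iteration, so in a real program you would usually keep the locked section as small as possible or have each thread accumulate locally and combine the results at the end.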

It seems the GIL was able to interrupt a thread right in the middle of a +=. How does that happen? To answer this, we’ll use the standard library’s disassembler module, dis. Disassembling add_by_one_many_times() into Python bytecode gives the following:

  8           0 LOAD_GLOBAL              0 (range)
              2 LOAD_CONST               1 (1000000)
              4 CALL_FUNCTION            1
              6 GET_ITER
        >>    8 FOR_ITER                18 (to 28)
             10 STORE_FAST               1 (_)

  9          12 LOAD_FAST                0 (self)
             14 DUP_TOP
             16 LOAD_ATTR                1 (a)
             18 LOAD_CONST               2 (1)
             20 INPLACE_ADD
             22 ROT_TWO
             24 STORE_ATTR               1 (a)
             26 JUMP_ABSOLUTE            8
        >>   28 LOAD_CONST               0 (None)
             30 RETURN_VALUE
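To produce a listing like this yourself, pass the method to dis.dis(). The exact opcodes depend on your CPython version (newer versions show BINARY_OP instead of INPLACE_ADD, for example), so expect differences from the listing above. dis.get_instructions() returns the same instructions as objects, which makes it easy to confirm that reading a (LOAD_ATTR) and writing it back (STORE_ATTR) are separate steps.

```python
import dis


class WillThisWork:
    a = 0

    def add_by_one_many_times(self):
        for _ in range(1000000):
            self.a += 1


# Full human-readable listing; opcode names vary between CPython versions
dis.dis(WillThisWork.add_by_one_many_times)

# Just the opcode names, in execution order
opnames = [ins.opname for ins in dis.get_instructions(WillThisWork.add_by_one_many_times)]
print(opnames)
```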

Here we can clearly see that the += operation, simple at the source level, is actually several bytecode instructions: LOAD_ATTR at offset 16 reads a, INPLACE_ADD at offset 20 computes the new value, and STORE_ATTR at offset 24 writes it back. The interpreter switches threads between bytecode instructions, not in the middle of one, so a thread can be suspended after reading a but before writing the result. When it resumes, it writes back a value computed from a stale read.
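To make that window easy to hit on any CPython version, we can split the read and the write into separate statements and deliberately sleep between them. This is an exaggerated sketch, not how the original loop behaves: the short time.sleep() releases the GIL and stands in for a thread switch landing between LOAD_ATTR and STORE_ATTR. The names Counter, racy_add and run_racy are hypothetical.

```python
import time
from threading import Thread


class Counter:
    def __init__(self):
        self.a = 0


def racy_add(counter, n):
    for _ in range(n):
        value = counter.a       # read, like LOAD_ATTR
        time.sleep(0.001)       # release the GIL here, mimicking a switch mid-update
        counter.a = value + 1   # write back, possibly based on a stale read


def run_racy(num_threads=5, n=200):
    """Start num_threads threads doing racy increments; return the final value."""
    counter = Counter()
    threads = [Thread(target=racy_add, args=(counter, n)) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.a


if __name__ == "__main__":
    # Expected 1000 if the updates were safe; usually prints far less
    print(run_racy())
```

Because every thread sleeps between its read and its write, the threads tend to read the same value and overwrite each other’s updates, so most increments are lost.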

In fact, very little of what we write executes as a single bytecode. One exception is a call to a function implemented in C, such as the built-in sorted():

>>> import dis
>>> def sort_wrapper(array):
...     return sorted(array)
... 
>>> dis.dis(sort_wrapper)
  2           0 LOAD_GLOBAL              0 (sorted)
              2 LOAD_FAST                0 (array)
              4 CALL_FUNCTION            1
              6 RETURN_VALUE

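The call to sorted() is a single bytecode, CALL_FUNCTION at offset 4, so no other thread can run while the sort is in progress, as long as the sort itself does not call back into Python code (for example, a key function or a __lt__ method written in Python).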

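Python even lets developers release the GIL inside C functions, so that other threads can run while long-running C code executes; C extensions do this with the Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS macros.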
