Using Poetry to Manage Your Python Packages

Python is a great language for tinkering around. It’s quick and easy to implement your ideas in code, but it’s not always easy to turn that code into shareable packages.

One of the most popular package managers is pip, and it relies on the PyPI repository. If you want to share your package with an easy-to-use pip install command, you have to publish your code as a package on PyPI.

But the process of publishing a package is usually not that straightforward. You can take a look at PyPI’s official guide on publishing a package. It’s a little overwhelming to say the least, despite the guide’s authors’ attempt to ease the tension a little by incorporating everyone’s favorite: emoji.

Congratulations, you’ve packaged and distributed a Python project! ✨ 🍰 ✨.

Python Packaging User Guide, Tutorials

“There must be an easier way to do this,” you might think. And you are absolutely right to think so.

Poetry

Introducing Poetry, a pip alternative that’s on steroids.

What does Poetry do? In short, it manages your packages for you: whether that’s managing your project’s dependencies or packaging up your own code.

That means Poetry can help you package up your project and publish it to PyPI, with minimal effort.
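
Dependency management, for example, is typically a single command. The package name below (requests) is just an illustration; Poetry resolves a compatible version, adds it to pyproject.toml, and pins the exact version in poetry.lock:

# poetry add requests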

Installation of Poetry is relatively straightforward; however, keep in mind that Poetry is not installed using pip, and it is installed as a global tool. When you are developing inside virtualenvs, you might want to instruct your virtualenv to copy global packages into the environment.
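
At the time of writing, the documented way to install Poetry is via its official installer script; since this has changed between releases, it’s worth double-checking Poetry’s own installation docs:

# curl -sSL https://install.python-poetry.org | python3 -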

Poetry tries to automatically make use of the current Python version in the virtualenv, if there is one. So you generally don’t have to worry about weird compatibility issues or missing site-packages.

Publish Your Packages

Publishing your package is easy. If you already have a project that you have been working on for a while, you can initialize it by running an interactive command in the project’s root directory:

# poetry init

Poetry will ask you several questions relevant to the project. Examples include the type of license the project uses, the name of the authors, etc.

However, what Poetry asks for here is the bare minimum. Oftentimes you want to include more information. You can add it by editing the pyproject.toml file Poetry just created.

Here is a sample file that includes some more configuration options; only the parts relevant to the project information are shown:

...

name = "project-name"
version = "0.1.0"
description = "Some short descrption about the project."
authors = ["Your Name", "Second Name"]
license = "MIT License"
readme = "README.md"
homepage = "https://link-to-home-page.com"
repository = "https://link-to-repository.com"
keywords = ["Keyword1", "Keyword2"]
classifiers = ["Development Status :: 4 - Beta"]

...

You can find a more detailed explanation of the options in Poetry’s documentation. It’s worth noting that some of the options only accept specific values from a list of allowed values, so consulting the documentation is a good idea whenever you are in doubt.
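
Besides project metadata, pyproject.toml is also where Poetry records your dependencies (this is the section poetry add writes to). A minimal sketch might look like the following; the version constraints are only examples:

[tool.poetry.dependencies]
python = "^3.8"
requests = "^2.28"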

Now we have all the information needed to publish a package. But before we can actually publish it, we need to build the package first. To do that, use this simple command:

# poetry build

It’s important that your Python code is all inside a folder whose name matches the project name you specified when you initialized the project with Poetry, with any hyphens replaced by underscores so the folder is importable.
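
Assuming the project is named project-name as in the sample above, the layout Poetry expects looks roughly like this (module.py is just a placeholder for your own modules):

project-name/
├── pyproject.toml
├── README.md
└── project_name/
    ├── __init__.py
    └── module.py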

And after building, we can publish the package to PyPI (by default) using another simple command:

# poetry publish

Poetry will ask for your PyPI username and password, and after some uploading, your package is now published!
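
If you’d rather not type credentials on every upload (PyPI has been steadily moving toward API tokens instead of passwords), you can store a token once and even combine the build and publish steps; <your-token> is a placeholder for a token you generate on PyPI:

# poetry config pypi-token.pypi <your-token>
# poetry publish --build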

It’s just that easy.

Hopefully now you no longer have any excuses to not publish your projects on PyPI. Happy coding!

The GIL and Thread Safety

What are your thoughts on multi-threading in Python?

A common interview question

Like many veteran Python programmers, you probably already know the answer: because of the Global Interpreter Lock (in the context of the CPython interpreter), Python actually doesn’t run more than one thread at any given moment.

You’ll probably also go on to talk about why Python’s multi-threading is still relevant: it is good for IO-intensive applications. When a thread is frequently blocked on IO, being suspended by the GIL doesn’t actually hurt performance much. Rather, other threads that are not waiting on IO get the chance to execute, giving us the illusion of concurrent execution.
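
Here is a minimal sketch of that effect, using time.sleep() as a stand-in for a blocking IO call (sleeping, like real IO, releases the GIL):

import time
from threading import Thread

def fake_io_task():
    # time.sleep() releases the GIL, standing in for a blocking network or disk call
    time.sleep(1)

start = time.time()
threads = [Thread(target=fake_io_task) for _ in range(5)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

# finishes in roughly 1 second rather than 5, because the waits overlap
print("Five 1-second waits took {:.1f}s".format(time.time() - start))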

All of these are good answers, and would usually satisfy even the most serious interviewers. But here’s a follow-up question that often takes people off guard:

So because of the GIL, Python essentially runs one thread at a time. Does this mean Python is by default thread safe?

A follow-up question.

On the surface, it seems to make sense. But it is also easy to give a counter-example with a poorly designed multi-threaded application:

Imagine we have two threads updating a shared variable V. Let’s say V currently has the value 5, and thread 1 reads the value of V, but before it can update V, it is suspended and the GIL is handed to thread 2. Thread 2 reads the value of V, does its calculations, and decides to decrement it by 3. Now V has the value 2. At this point the GIL is handed back to thread 1, which decides to increment V by 1. Based on the value it read earlier, 5, thread 1 sets the value of V to 6.

We have now lost the update made to V by thread 2. Python is not thread safe by default, even with the GIL.

To demonstrate this clearly, let’s consider the following example, which involves a simple shared variable update:

from threading import Thread

class WillThisWork:
    a = 0  # shared counter, read and written by every thread

    def add_by_one_many_times(self):
        for _ in range(1000000):
            self.a += 1  # a read-modify-write that is not atomic

test = WillThisWork()
threads = [Thread(target=test.add_by_one_many_times) for _ in range(5)]

# start all five threads, then wait for every one of them to finish
for thread in threads:
    thread.start()

for thread in threads:
    thread.join()

print("The value of a is: {}".format(test.a))

If the += operation were thread safe, we would expect the final value of a to be 5000000.

The value of a is: 2044547
The value of a is: 2514763
The value of a is: 1417187

After running the code multiple times, it seems that we never get the full 5000000 in the output, suggesting the += operation is not thread safe.
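
The usual remedy is to protect the shared update with a lock. Below is a minimal sketch of the same experiment with a threading.Lock added; the class name NowThisWorks is just illustrative:

from threading import Thread, Lock

class NowThisWorks:
    a = 0

    def __init__(self):
        self.lock = Lock()  # one lock shared by every thread using this instance

    def add_by_one_many_times(self):
        for _ in range(1000000):
            with self.lock:  # only one thread at a time may execute this block
                self.a += 1

test = NowThisWorks()
threads = [Thread(target=test.add_by_one_many_times) for _ in range(5)]

for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

# now reliably prints 5000000
print("The value of a is: {}".format(test.a))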

It seems that the GIL was able to suspend a thread right in the middle of a +=. How does it manage to do that? To answer the question we’ll make use of the disassembler module dis. If we disassemble the add_by_one_many_times() method into its corresponding Python bytecode, we get the following:

  8           0 LOAD_GLOBAL              0 (range)
              2 LOAD_CONST               1 (1000000)
              4 CALL_FUNCTION            1
              6 GET_ITER
        >>    8 FOR_ITER                18 (to 28)
             10 STORE_FAST               1 (_)

  9          12 LOAD_FAST                0 (self)
             14 DUP_TOP
             16 LOAD_ATTR                1 (a)
             18 LOAD_CONST               2 (1)
             20 INPLACE_ADD
             22 ROT_TWO
             24 STORE_ATTR               1 (a)
             26 JUMP_ABSOLUTE            8
        >>   28 LOAD_CONST               0 (None)
             30 RETURN_VALUE
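
The listing above can be reproduced with the dis module; the exact opcodes vary between CPython versions (this output comes from a version before Python 3.11, which reworked the bytecode):

>>> import dis
>>> dis.dis(WillThisWork.add_by_one_many_times)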

Here we can clearly see that the (at the source code level) simple += operation is actually multiple bytecode instructions, spanning from offset 16 to 24. When the GIL stops the execution of a thread, it stops it at the bytecode level, between instructions. So it is entirely possible to find your thread loaded with old values that are no longer relevant after the GIL releases control back to it.

And in fact, very few things we write execute as a single bytecode instruction. One exception is a call to a C function wrapped in Python; the built-in sorted() function is one such example.

>>> def sort_wrapper(array):
...     return sorted(array)
... 
>>> dis.dis(sort_wrapper)
  2           0 LOAD_GLOBAL              0 (sorted)
              2 LOAD_FAST                0 (array)
              4 CALL_FUNCTION            1
              6 RETURN_VALUE

The call to sorted() is done in a single bytecode instruction at offset 4, meaning the sort itself cannot be interrupted partway through by a thread switch.

In fact, CPython even gives developers the ability to release the GIL to other threads inside their C functions (that is what macros like Py_BEGIN_ALLOW_THREADS are for).