Python’s module and package system has many features. Steve Love explores some more advanced ones.
This is the second instalment, following on from the introduction to Python modules [ Love20 ]. In that article, we looked at how to create your own modules, a little on how to split your program into modules to make sharing of the code easier, and how to structure packages to make testing them easier. In this article, we will take a more detailed look at making the packages you create easier to import and use. We will explore more ways to share your packages with others, and some ways of ensuring you can always have a dependable environment in which your code runs.
A little more on the import statement
In the previous article, we described a simple package with code to take input in one structured format, e.g. JSON or CSV, and turn it into another format, perhaps performing simple transformations on the way.
Listing 1 shows the basic usage of the code in our own package. For the sake of keeping the package contents tidy, we created some sub-packages so that the code to perform transformations was separate from the main package, and the tests for the package were all in one place, also separate. The package structure we ended up with is shown below.
<project root>/
|__ main.py
|__ textfilters/
    |__ __init__.py
    |__ csv.py
    |__ json.py
    |__ transformers/
    |   |__ __init__.py
    |   |__ change.py
    |   |__ choose.py
    |__ tests/
        |__ __init__.py
        |__ test_filters.py
        |__ test_change.py
        |__ test_choose.py
This structure explains the two import statements in Listing 1: the first such import brings in the main filters for taking (in this case) CSV input and turning it into JSON output. The second import is pulling a single function, change_keys, from a module called change. This module is in a package named transformers, which is a sub-package of the textfilters package.
from textfilters import csv, json
from textfilters.transformers.change import change_keys
import sys

if __name__ == '__main__':
    def key_toupper( k ):
        return k.upper()

    data = csv.input( sys.stdin )
    result = [ change_keys( row, key_toupper ) for row in data ]
    print( json.output( result, sort_keys=True, indent=2 ) )
Listing 1
As we mentioned in the previous article [ Love20 ], there are a few ways we could arrange the import statements, with alterations to the usage. The portion of the import line after the import keyword effectively defines the namespace, so that first import line could be:
import textfilters.csv
And the corresponding use of the csv object would become:
data = textfilters.csv.input( sys.stdin )
This demonstrates why namespaces are so important. Python already has a built-in module named csv (which our package’s csv module uses), and it’s not unimaginable that you would want to import both of those. Explicitly fully naming the textfilters.csv module allows Python’s csv module to also be used alongside it.
Python provides a shortcut to import all the names from a module. Consider the following:
from textfilters.csv import *
data = input( sys.stdin )
The import statement here requests that all the names from the textfilters.csv module are imported into the current namespace. On the face of it, this seems great: we get to use the input function unadorned! However, there are pitfalls to this approach. Programming is more than a typing exercise, and names matter.
Whilst the import * directive did indeed bring the name of the function we wanted into the current scope, it also brought in every other name exported by the textfilters.csv module (we will return to what ‘exported’ means later). This may, or may not, be what you intended. To see why it’s important, create a file called namespace.py with the code below (a cut-down version of the textfilters.csv module):
import csv

def input( data ):
    return list( csv.DictReader( data ) )
Now run a Python interpreter session in the same directory, and try the following:
>>> csv = '1,2,3'
>>> csv
'1,2,3'
>>> from namespace import *
>>> csv
<module 'csv' from '...'>
Here, we’re creating a variable called csv, and assigning it a value. Importing everything from namespace overwrites that value. I’m sure you can guess why, but to make this completely clear, when the namespace module imports csv, it’s bringing the name csv into its own scope as an exported name along with the name input. When you import * from the namespace module, any exported names are brought into your scope, over-writing your own variable names where they clash.
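The same effect can be reproduced without creating any files. The following is only a sketch: the module name namespace_demo is made up for the illustration, and a plain dictionary stands in for the importing scope.

```python
import sys
import types

# Build an in-memory stand-in for the namespace.py file described above.
# It imports the standard csv module, so 'csv' becomes one of its public names.
source = (
    "import csv\n"
    "def input(data):\n"
    "    return list(csv.DictReader(data))\n"
)
module = types.ModuleType('namespace_demo')  # hypothetical module name
exec(source, module.__dict__)
sys.modules['namespace_demo'] = module

# A separate dictionary plays the part of the importing scope.
scope = {'csv': '1,2,3'}
before = scope['csv']
exec('from namespace_demo import *', scope)
after = scope['csv']

print(type(before).__name__)  # str
print(type(after).__name__)   # module
```

The variable that held a string now refers to the standard csv module, exactly as in the interpreter session.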
Of course, while you can be disciplined and always avoid the use of import *, you can’t very well impose that on others who might use your package. There are ways of helping to prevent your users from shooting themselves in their own feet.
Explicit is better than implicit
Only public names are imported when you use the import * form. Python has a convention for making names private to a module (or indeed, a class – the mechanism is the same) by prefixing them with an underscore. Consider the code in Listing 2.
import csv as _stdcsv
from io import StringIO as _StringIO

def input( data ):
    parser = _stdcsv.DictReader( data )
    return list( parser )
...
Listing 2
The import statement allows you to alter the names of things you import, and by renaming csv to _stdcsv (and StringIO to _StringIO), we make those names private to the module. If a user of this module now invokes from textfilters.csv import *, those names are not brought into scope. Note how this affects the usage within the module’s code. You can still explicitly request private names when you import from a module, because in Python, private doesn’t mean inaccessible, it just means you have to try a little harder to get access to it.
Define a public API
You can also limit the set of names brought into local scope when using import * by defining a module-level list of strings called __all__. If this value exists when import * is encountered, it is taken to mean ‘this is the list of all public names in the module’. It’s just a list of the names from the module you wish to be public. In the instance of the code in Listing 2, this would be defined as:
__all__ = [ 'input', 'output' ]
Adding this line to csv.py will change the behaviour of import * for everyone so that only the names you listed will get imported.
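A small experiment shows the effect of __all__. This is a sketch under assumptions: the module name all_demo, the body of output, and the helper function are invented purely to have something that should be left out.

```python
import sys
import types

# An in-memory module that declares its public API with __all__.
# 'helper' and the imported 'csv' are public by naming convention,
# but are not listed, so import * should leave them behind.
source = (
    "import csv\n"
    "__all__ = ['input', 'output']\n"
    "def input(data):\n"
    "    return list(csv.DictReader(data))\n"
    "def output(rows):\n"
    "    return list(rows)\n"
    "def helper():\n"
    "    return None\n"
)
module = types.ModuleType('all_demo')  # hypothetical module name
exec(source, module.__dict__)
sys.modules['all_demo'] = module

scope = {}
exec('from all_demo import *', scope)
star_names = sorted(name for name in scope if not name.startswith('__'))
print(star_names)  # ['input', 'output']
```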
What have we learned?
- import * imports all the public names from a module.
- You can rename imported things in the import statement.
- Prefixing names with an underscore makes them ‘private’, so import * does not import them.
- As the author of a module, you can also limit the names that import * imports by defining a value for the special __all__ list.
- As the user of a module, avoid using import *, as it can bring in unexpected names that may hide names in your code.
In the previous instalment [ Love20 ], we explored how packages are a special kind of Python module which can have sub-modules – some of which may also be packages. Python identifies a package by the existence of a file named __init__.py . What we didn’t mention was that this file gets ‘run’ by the Python interpreter when the package is imported, in much the same way that the top-level code of a simple module is run when imported.
This file can contain any Python code you like, but it’s useful for bringing sub-module names into a narrower scope. Consider again the directory layout of our package:
|__ textfilters/
    |__ __init__.py
    |__ csv.py
    |__ json.py
    |__ transformers/
        |__ __init__.py
        |__ change.py
        |__ choose.py
Functions inside the change.py sub-module of the sub-package transformers need full qualification when they’re imported:
from textfilters.transformers.change import change_keys
This is a bit unwieldy, but arises from the physical separation of the change module from the rest of the package. That physical separation helps us as the package author to structure the code for ease of maintenance, but imposes some unnecessary complexity on the users of our package. Listing 3 shows how I’d prefer to present the API to users.
from textfilters import csv, json
from textfilters import reshape
import sys

if __name__ == '__main__':
    def key_toupper( k ):
        return k.upper()

    data = csv.input( sys.stdin )
    result = [ reshape.change_keys( row, key_toupper ) for row in data ]
    print( json.output( result, sort_keys=True, indent=2 ) )
Listing 3
I’ve already mentioned there is more to programming than typing, but there is more to this than reducing key-presses. Your public API needn’t be constrained by the physical structure of the code, and how you choose to lay out your package needn’t be limited by how you wish your users to use it. We can take advantage of the fact that Python, by default, exports all public names from a module – including the modules it imports.
In order to achieve my desired result, a couple of changes are required. The first is to the transformers/__init__.py file:
from .change import change_keys
This brings the name change_keys into the scope of the transformers namespace, and removes the need for users to explicitly name the intermediate change module.
The second alteration is to the top-level package __init__.py:
from . import transformers as reshape
This effectively renames the namespace of transformers to reshape. Naturally, you could just rename the transformers folder, but one reason you might not want to do that could be if you already have a version ‘in the wild’, but you’d like new users to have a new API, while still supporting existing users on the ‘old’ API.
We can streamline the API even further. A common pattern when using complex modules is to import the whole package and have access to its contents, as in Listing 4.
import textfilters as tf
import sys

if __name__ == '__main__':
    def key_toupper( k ):
        return k.upper()

    data = tf.csv.input( sys.stdin )
    result = [ tf.reshape.change_keys( row, key_toupper ) for row in data ]
    print( tf.json.output( result, sort_keys=True, indent=2 ) )
Listing 4
As things stand, however, this will not work. You’ll get an error:
AttributeError: module 'textfilters' has no attribute 'csv'.
A common mistake is to presume that importing a package causes Python to go and find all of its sub-modules and import the published names from them all. Such behaviour could be quite expensive! This is why the __init__.py file is so important – it is how a package defines all of its published names. In order to achieve what we want in Listing 4, we just need to bring the names csv and json into the package scope, using the top-level package’s __init__.py file:
from . import transformers as reshape
from . import csv, json
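To see the difference an __init__.py makes, the following sketch writes a tiny package into a temporary directory and imports it twice. The package name demo_filters and its contents are assumptions made for the illustration; they are not the article's package.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package written to a temporary directory:
#   demo_filters/__init__.py   (empty to begin with)
#   demo_filters/csv.py
root = Path(tempfile.mkdtemp())
package = root / 'demo_filters'
package.mkdir()
(package / '__init__.py').write_text('')
(package / 'csv.py').write_text('def input(data):\n    return list(data)\n')
sys.path.insert(0, str(root))

import demo_filters

# With an empty __init__.py the sub-module is not loaded.
empty_init = hasattr(demo_filters, 'csv')
print(empty_init)  # False

# Publish the sub-module from __init__.py and re-run the package code.
(package / '__init__.py').write_text('from . import csv\n')
importlib.reload(demo_filters)
populated_init = hasattr(demo_filters, 'csv')
print(populated_init)  # True
```

The reload is only there so that both cases fit in one script; in normal use the __init__.py is run once, on first import.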
A similar mistake is to presume that from textfilters import * would cause Python to automatically load all the sub-modules. For the same reason as above, it does not. Not even the top-level modules (csv and json). The documented behaviour is that this imports the textfilters package, but in our case, textfilters is ‘just’ a directory. It does, however, run the __init__.py, and import any published names that result from that.
As with simple modules, packages also recognise the special __all__ value as a list of strings naming the sub-modules to import. It’s crucial to note, however, that using __all__ isn’t transitive. Suppose you have the following:
# textfilters/__init__.py
__all__ = [ 'transformers' ]

# textfilters/transformers/__init__.py
__all__ = [ 'change', 'choose' ]
If you invoke from textfilters import *, it will import the transformers sub-package, but the sub-modules named by the transformers package’s __all__ will not be loaded. You would also need to invoke from textfilters.transformers import * to bring in those names.
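The lack of transitivity is easy to observe by checking sys.modules after each import. This sketch uses made-up package names (demo_outer and demo_inner) written to a temporary directory, with the same __all__ arrangement as above.

```python
import sys
import tempfile
from pathlib import Path

# Hypothetical layout mirroring the example above:
#   demo_outer/__init__.py             __all__ = ['demo_inner']
#   demo_outer/demo_inner/__init__.py  __all__ = ['change', 'choose']
root = Path(tempfile.mkdtemp())
inner = root / 'demo_outer' / 'demo_inner'
inner.mkdir(parents=True)
(root / 'demo_outer' / '__init__.py').write_text("__all__ = ['demo_inner']\n")
(inner / '__init__.py').write_text("__all__ = ['change', 'choose']\n")
(inner / 'change.py').write_text('')
(inner / 'choose.py').write_text('')
sys.path.insert(0, str(root))

scope = {}
exec('from demo_outer import *', scope)
# The sub-package is loaded, but its own __all__ has not been acted on.
loaded_after_outer = 'demo_outer.demo_inner.change' in sys.modules
print(loaded_after_outer)  # False

exec('from demo_outer.demo_inner import *', scope)
loaded_after_inner = 'demo_outer.demo_inner.change' in sys.modules
print(loaded_after_inner)  # True
```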
You can’t use the top-level __all__ value to import the sub-modules of a sub-package, either. For example, the following will not work:
__all__ = [ 'transformers', 'transformers.change' ]
The consequence of this is that defining the public API for a package is best done by importing or defining the names you want in __init__.py. It’s not necessary to also specify __all__, since importing * from a package won’t bring any unexpected names into scope, as it might with a simple module.
What have we learned?
- A package’s __init__.py file gets run when it’s imported, and this file can contain Python code.
- You can use the __init__.py to alter the public API of your package.
- Using import * from a package does not automatically bring in any of the public names, only what is defined in the __init__.py.
Creating an installable package
Sharing a package directly by copying the package directory, or even better, including it in a shared version control system, is sufficient in most cases. There can be benefits to having a cleaner separation between application and library code, however. One example might be that a package is used across multiple applications. In such a case, it is wasteful and error-prone to have the package sources duplicated in different repositories. It makes more sense to have the shared code separately version-controlled in its own shared repository.
Most modern version control systems have the facility to build a working copy from multiple repositories, so this shouldn’t present a problem. However, you can avoid the need for that by creating your own installable package. If you’ve used Python for anything more sophisticated than simple scripts, you’ll almost certainly have come across pip: the standard Python package installer 1. In this section we’ll explore how to create a package that can be installed using pip.
The very simplest installable package just needs a file named setup.py , located in the parent directory of the package itself (i.e. in the same directory as main.py in the example). Listing 5 shows the bare minimum contents.
from setuptools import setup, find_packages

setup(
    name = 'TextFilters',
    version = '0.0.0.dev1',
    packages = find_packages(),
)
Listing 5
The name and version properties are used to create the file name of the package. The version number here follows the recommended practice that is based on Semantic Versioning (see [ SemVer ] and [ PEP440 ]). The pre-release specifier (.dev1 in this case) departs from the Semantic Version spec, and is the format understood by pip, which (when installing from a shared package repository like PyPI) ignores pre-releases unless they’re explicitly requested.
The last line uses a tool which automatically detects and includes any sub-packages (directories containing __init__.py ). The packages property is merely a list of package and module names to be included, so you could explicitly name them:
packages = [ 'textfilters', 'textfilters.transformers' ]
This invocation would exclude the tests sub-package, which might be what you intend. Note that sub-packages have to be explicitly named. If you have a large package with several sub-packages, the find_packages utility is much more convenient. Note also that the file main.py will not be included. In our case, that’s intentional, because it’s not inside a package.
There are many more parameters accepted by the setup function; we’ll examine a few of the common ones here, but a complete description, along with recommendations on version numbering schemes, and restrictions on things like the name property, can be found in the Python Packaging Guide [ PPG ]. Many of those properties are used by the Python Package Index, PyPI.
For now, we have the bare essentials needed to create an installable package. To build it, run this command within the directory containing setup.py :
python setup.py bdist_wheel
This invocation creates a ‘binary distribution’, also known in Python circles as a wheel (see [ PEP427 ] for all the gory details). If all went well 2 , you will see a couple of new directories: build and dist , and the dist folder should have your installable package in it, named TextFilters-0.0.0.dev1-py3-none-any.whl . You can create ‘source distributions’, too, if the package is pure Python code, but it doesn’t have any real benefit over a wheel format package.
The components of the file name are partly taken from the name and version parameters given to the setup function (refer back to Listing 5). The last 3 parts identify the targeted Python language version (py3), the ABI (none, in this case) and the required platform (which we didn’t specify, and so is any). You can control these with other parameters to the setup function, but for our purposes, the code in the package is indeed intended for Python 3, and is pure Python code, with no ABI or platform requirements, so the defaults are appropriate.
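The parts of the file name can be pulled apart with a few lines of code. This is a simplified sketch, not a full implementation of the naming rules in [ PEP427 ]: it assumes a file name with exactly five dash-separated parts and no optional build tag, and the function name is made up.

```python
from collections import namedtuple

WheelName = namedtuple('WheelName', 'name version python_tag abi_tag platform_tag')

def parse_wheel_name(filename):
    """Split a simple wheel file name into its five components."""
    stem = filename[:-len('.whl')]
    return WheelName(*stem.split('-'))

parts = parse_wheel_name('TextFilters-0.0.0.dev1-py3-none-any.whl')
print(parts.name, parts.version)                           # TextFilters 0.0.0.dev1
print(parts.python_tag, parts.abi_tag, parts.platform_tag) # py3 none any
```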
The file itself is just a normal Zip file with a .whl extension, so you can examine the contents for yourself (I find 7-zip especially useful).
Before we install our shiny new package, however, we should talk about segregation.
Partitioning and separation
Python comes with a rich standard library of tools, some of which our example package is using, such as csv and json. You can also install 3rd party modules, and our package is using pytest. In [ Love20 ], we looked at how Python locates modules when they’re imported. As a reminder, here is the basic Python algorithm for finding modules:
1. The directory containing the script being invoked, or an empty string to indicate the current working directory in the case where Python is invoked with no script, i.e. interactively.
2. The contents of the environment variable PYTHONPATH. You can alter this to change how modules are located when they’re imported.
3. System defined search paths for built-in modules.
4. The root of the site module.
It’s number 4 we’re interested in now: the site module.
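You can inspect the resulting search path for yourself, since Python exposes it as sys.path. The sketch below simply prints each entry and picks out those that look like site directories; the directory-name test is an assumption that holds for common layouts, not a rule.

```python
import sys

# Each entry is searched in order when a module is imported.
for entry in sys.path:
    print(repr(entry))

# Entries that look like site directories. The names checked here are the
# usual ones, but they can differ between platforms and distributions.
site_dirs = [entry for entry in sys.path
             if entry.endswith(('site-packages', 'dist-packages'))]
print(site_dirs)
```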
When you install a 3rd party package (such as pytest), it is installed into a directory named site-packages , which is a well-known location for the Python interpreter (the location may differ, depending on your platform). Whilst it is obviously convenient to have all the packages you want in one place, easily available for use in your Python programs, it can easily become cluttered. In particular, you might not want (or be able) to install the packages you create to the global site location, especially when they’re in early development.
One way to handle this might be to have multiple installations of Python, but this is wasteful unless you genuinely need multiple versions of Python available. A more light-weight way of handling it is to take advantage of Python’s virtual environments. Each of these is a fully-featured Python environment, but cut back to the bare minimum needed. They don’t contain the 3rd party modules installed in the global Python install location (but you can choose to give a virtual environment access to those libraries) except for a few necessities, including the pip installer module. The important thing is that a virtual environment is entirely independent of all other virtual environments, with its own site module.
The implication of this is that you can create Python virtual environments with different libraries for different needs. This is useful now as a way of quarantining our custom package so that it doesn’t interfere with either the installed Python instance, or anyone else’s virtual environments. You should consider creating your environment somewhere outside of your code folders, maybe by putting the code beneath a new directory (named something like src , for example), and using the parent to hold the new environment.
python -m venv localpy
On some platforms you may be prompted to install a package for venv to work. For example, on my Ubuntu-based Mint distribution, I had to install a separate system package (python3-venv on Ubuntu-based systems).
This creates a new Python environment in a directory named localpy as a child of the current directory. You can choose wherever you like for it. If all’s gone to plan, you should now have a directory structure like this:
<project root>/
|__src/
|  |__main.py
|  |__setup.py
|  |__textfilters/
|     |__ ...
|__localpy/
   |__ ...
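The venv module can also be driven from Python code, which is handy in build scripts. The sketch below creates an environment in a temporary directory rather than in the project. Note one difference from the command line, which installs pip into the new environment by default: with_pip is left as False here to keep the example quick.

```python
import tempfile
import venv
from pathlib import Path

# Create the environment somewhere disposable for the demonstration.
env_dir = Path(tempfile.mkdtemp()) / 'localpy'
venv.create(env_dir, with_pip=False)

# Every virtual environment has a pyvenv.cfg file at its root.
config = env_dir / 'pyvenv.cfg'
print(config.exists())
print(sorted(item.name for item in env_dir.iterdir()))
```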
The structure of the environment will differ, depending on your platform, but will contain Python itself (on Windows, in Scripts, on *nix it’s in bin), along with pip to install more libraries, and a script named activate. The activate script ensures that the virtual environment’s Python and pip are at the front of the current session’s path. It’s not necessary to always activate a virtual environment, however: you can invoke the Python interpreter by fully-qualifying the directory name, and it will ‘just work’. This extends to using pip to install packages.
.\localpy\Scripts\pip.exe install [package name]
./localpy/bin/pip install [package name]
Python internally keeps track of where to find the platform-independent and platform-dependent files it needs in order to run, and where to find installed libraries. These include:
- sys.prefix
- sys.exec_prefix
When a virtual environment is in use (either by activation, or by running the virtual environment’s Python program), these values will point to the respective locations within the virtual environment. When no virtual environment is in use, these values point to the respective locations within the Python installation. Furthermore, when a virtual environment is in use, two more values can be used to find the location of the Python install from which the virtual environment was created:
- sys.base_prefix
- sys.base_exec_prefix
These values enable the virtual environment to operate independently of the main Python installation(s), as well as any other virtual environments. You can find much more detailed information on how these things work in [ venv ] and [ site ], but for our purposes, all that remains is to install our local package into the independent environment. It’s as simple as (on Windows):
.\localpy\Scripts\pip install src\dist\TextFilters-0.0.0.dev1-py3-none-any.whl
If you now run a Python session using the virtual environment’s Python, you can import the textfilters package, and see from where it was imported:
>>> import textfilters
>>> textfilters
<module 'textfilters' from '\\path\\to\\localpy\\lib\\site-packages\\textfilters\\__init__.py'>
(This will look slightly different on non-Windows platforms, but the idea is the same).
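The prefix values kept by the interpreter also give a quick way to tell whether the running Python belongs to a virtual environment. This is a small sketch of the comparison described in [ venv ]; the function name is made up.

```python
import sys

def in_virtual_environment():
    """True when the running interpreter belongs to a virtual environment."""
    # Inside a virtual environment sys.prefix points at the environment,
    # while sys.base_prefix still points at the original installation.
    return sys.prefix != sys.base_prefix

print(sys.prefix)
print(sys.base_prefix)
print(in_virtual_environment())
```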
What have we learned?
- You can create your own installable package to make sharing code even easier.
- Python wheels are zip-files.
- The site module is where Python looks for installed packages for use in code.
- Python virtual environments are a powerful way of segregating requirements, each with its own independent site module.
Know your dependencies
Sometimes, a package you create will require other packages to be installed. In the case of our package, it can be used without anything other than Python’s standard libraries, but it does have some tests. Whilst they don’t depend exclusively on pytest, which is the testing package we used in [ Love20 ] (other frameworks are available, such as Nose2 [ Nose2 ], which would also work just fine), we can use it to explore another feature of package creation.
In the setup.py file we created for our package, we can indicate that our package requires other libraries. In this case, we can tell the setup tools that the package pytest should also be installed when our package is installed.
Listing 6 shows a change to setup.py with the addition of a parameter to the setup function: install_requires. This is a list of packages, which in this case has only one item, but you can specify as many as you need here.
from setuptools import setup, find_packages

setup(
    name = 'TextFilters',
    version = '0.0.0.dev1',
    packages = find_packages(),
    install_requires = [ 'pytest' ],
)
Listing 6
Now re-create the package, and re-install it with an upgrade:
localpy\scripts\python src\setup.py bdist_wheel
localpy\scripts\pip install --upgrade dist\TextFilters-0.0.0.dev1-py3-none-any.whl
You will see that pytest, along with its own requirements, is also automatically installed.
Sometimes you need a particular version of a dependent package, or perhaps you’ve tested on a particular stable release, and wish to constrain the versions of your dependencies. This is also specified in setup.py 3 :
install_requires = [ 'pytest>=5.0' ],
You can also depend on specific versions of Python itself in the setup.py parameters. In the case of our package, we may well want to ensure our users are on Python v3 or above. There are many reasons to do this, but chief among them is that the code in a package depends on some feature that was introduced in a specific Python release.
python_requires = '>=3',
There is much more you can specify, and describe, about your package in the setup.py file, but you can find a wealth of documentation on that in the Python packaging guide ([ PPG ]). We do need to revisit one aspect we’ve already looked at briefly – the version number.
As we’ve already seen, the version number specified in setup.py gets used to generate the file name of the resulting package wheel. In our example, we marked the version with a trailing .dev1, which marks the package as a pre-release (specifically, still in development), which is used by pip when performing upgrades.
Given a package with a version number indicating it’s stable, and a later version that’s marked as a pre-release, when performing an upgrade, pip will by default give you the latest applicable stable release. You can explicitly request that pre-releases are considered by passing the --pre flag on the command line, or by specifically requesting a pre-release version.
Whilst we’re in development mode, and installing specific locally-created wheels, this isn’t an issue for our package, of course, but it does make a difference for the dependent packages in the install_requires list.
It also makes a difference in a file that’s normally named requirements.txt (but needn’t be, necessarily), which is a file you can use alongside a virtual environment to have pip install a whole collection of packages. This is a useful technique for specifying the library contents of a virtual environment, with needed packages at specific versions. It’s common to want this to ensure, for example, that different developers on a team have consistent environments; if one person is developing against version 1 of some package, and someone else is using version 2, chaos is bound to ensue! The requirements.txt file provides a way of creating a coherent environment that the whole team can use.
The simplest way to create the requirements file is to have pip itself create one:
localpy\scripts\pip freeze > requirements.txt
The requirements file should contain something similar to this (truncated here for brevity):
...
pytest==5.4.1
six==1.14.0
TextFilters==0.0.0.dev1
...
Here, the file requires a specific version of each installed package. You can modify the version numbers if you need versions after a particular one, or within a range of versions, for example. Note that our own package, TextFilters, is explicitly naming the pre-release version. Suppose we had been working on the package for a while, and had a few releases available in our dist directory:
TextFilters-0.0.0.dev1-py3-none-any.whl
TextFilters-0.0.1-py3-none-any.whl
TextFilters-0.0.2.dev1-py3-none-any.whl
TextFilters-0.0.2a1-py3-none-any.whl
TextFilters-0.0.3a1-py3-none-any.whl
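We have a stable 0.0.1 version, but only pre-releases for 0.0.2 and 0.0.3. Our requirements.txt file might name TextFilters without pinning it to one of those pre-release versions.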
We might create our virtual environment from scratch as follows:
python -m venv localpy
localpy\scripts\pip install -r requirements.txt -f src\dist
The -r argument to pip instructs it to read the list of packages to install from the indicated file. By default, pip looks on PyPI [ PyPI ] for packages, but we haven’t published our package there yet, so the -f argument tells pip to look for packages in the specified location (which might, for example, be a file share available to the team), and look in PyPI for packages not found there.
This would result in our new environment having version 0.0.1 of our TextFilters package, because it’s the latest stable version available. If we had also added the parameter --pre to the pip command line, the latest pre-release, 0.0.3a1, would have been installed.
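The selection rule can be sketched in a few lines. This is a deliberately simplified illustration and not pip’s actual algorithm: it treats any version with a suffix after the numeric part as a pre-release, and it does not order different pre-releases of the same release against each other. The function names are made up.

```python
import re

# The versions of the wheels in the dist directory listed above.
available = ['0.0.0.dev1', '0.0.1', '0.0.2.dev1', '0.0.2a1', '0.0.3a1']

def split_version(version):
    """Return the numeric release as a tuple, plus any trailing suffix."""
    match = re.match(r'(\d+(?:\.\d+)*)(.*)', version)
    release = tuple(int(part) for part in match.group(1).split('.'))
    return release, match.group(2)

def is_prerelease(version):
    return split_version(version)[1] != ''

def choose_version(versions, allow_prereleases=False):
    """Pick the highest release, skipping pre-releases unless asked."""
    candidates = [v for v in versions
                  if allow_prereleases or not is_prerelease(v)]
    # A stable release ranks above pre-releases of the same number.
    return max(candidates,
               key=lambda v: (split_version(v)[0], not is_prerelease(v)))

print(choose_version(available))                          # 0.0.1
print(choose_version(available, allow_prereleases=True))  # 0.0.3a1
```

For real version handling, the rules in [ PEP440 ] cover many more cases than this sketch does.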
What have we learned?
- An installable package can explicitly define other packages upon which it depends.
- The pip installer makes sophisticated use of the version numbers exposed by a package to determine how to install requirements.
- You can easily create a canned fully-working virtual environment by using a library requirements file.
A wider audience
In this article we’ve explored in more detail the idea of Python ‘namespaces’, and how you can take advantage of package initialization to make using your package easier for your users. We’ve looked at some of the pitfalls of wild-card imports, and highlighted the benefits of creating a public API for your modules that might not match its physical structure. We also explored virtual environments, and how to create and install your own package ‘wheels’, and looked at why this segregation is important. Finally we looked at package dependencies, and how to manage them in concert with virtual environments and the pip installer.
Taken all together, these things will help you structure your packages so they can be shared easily, and your users will find your packages easier to install and use as a result.
There is more you can do with your own packages. For example, in the previous article we looked at the pytest unit-testing framework, and in this article we’ve looked at Python’s venv module. Both of these are installable modules that can be run directly from the command line, for example:
python -m venv
This is achieved by adding another special file to the package: __main__.py , which is executed when the package is run in this way 4 .
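As a taste of how that works, the sketch below writes a throwaway package containing a __main__.py into a temporary directory and runs it with -m. The package name demo_tool and its message are invented for the example.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical package with a __main__.py, written to a temporary directory.
root = Path(tempfile.mkdtemp())
package = root / 'demo_tool'
package.mkdir()
(package / '__init__.py').write_text('')
(package / '__main__.py').write_text("print('running demo_tool as a program')\n")

# Run 'python -m demo_tool' with the temporary directory on the module path.
env = dict(os.environ, PYTHONPATH=str(root))
result = subprocess.run(
    [sys.executable, '-m', 'demo_tool'],
    cwd=root, env=env, capture_output=True, text=True, check=True,
)
print(result.stdout.strip())  # running demo_tool as a program
```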
The ultimate sharing of packages with the wider community means publishing them to the Python Package Index ([ PyPI ]). There is excellent documentation on this in the Python packaging guide ([ PPG ]). Taking this extra step involves some extra responsibility, of course, in maintaining and documenting your package.
These things – and more! – I leave for you to discover.
[Love20] Steve Love (2020) ‘The path of least resistance’ in Overload 155, February 2020, https://accu.org/index.php/journals/2749
[Nose2] Nose2: https://docs.nose2.io/en/latest/
[PEP440] ‘Python Version Identification and Dependency Specification’, https://www.python.org/dev/peps/pep-0440/
[PEP427] ‘The Wheel Binary Package Format’ (PEP 427), https://www.python.org/dev/peps/pep-0427/
[PPG] The Python packaging guide, ‘Packaging and distributing projects’ at https://packaging.python.org/guides/distributing-packages-using-setuptools/
[PyPI] The Python Package Index, https://pypi.org/
[SemVer] ‘Semantic Versioning Scheme Specification’, https://semver.org/
[site] Python Documentation – Site specific configuration hook, https://docs.python.org/3/library/site.html
[venv] Python Documentation – Creation of virtual environments, https://docs.python.org/3/library/venv.html
‘Packaging a Python library’, https://blog.ionelmc.ro/2014/05/25/python-packaging/#the-structure
1. pip comes as part of the Python install for versions later than 3.4.
2. You may need to install the wheel package from PyPI.
3. Setting an upper limit on the version is possible too, but be careful of that. If you tie down your requirements too tightly, it might make your package unusable.
4. I wanted to explore this a bit more in the example package, but was defeated by the fact I’d (deliberately) used names that clashed with built-in Python modules. Another example of why not to do that!
Steve Love is an independent developer constantly searching for new ways to be more productive without endangering his inherent laziness.