Python project maturity checklist
Did you write a cool and useful Python script? Would you like to share it with the community, but you're not sure how to go about that? If so, then this is the article for you. We'll go over a list of simple steps which can turn your script into a fully fledged open-source project.
The Python community has created a rich ecosystem of tools, which can help you during the development and upkeep of your project. Complete the steps in this checklist, and your project will be easier to maintain and you'll be ready to take contributions from the community.
This is an opinionated article. I will run through a long list of tools and practices I've had good experience with. Some of your favorite tools may be left out, and you may find some of my choices unnecessary. Feel free to adapt the list to your liking and leave a comment below.
I tried to complete the entire checklist in my small open-source project named gym-demo. Feel free to use it as a reference and submit PRs if you find room for improvement.
☑ Define your command-line interface (CLI)
If you're going to provide a command-line utility, then you need to define a friendly command-line user interface. Your interface will be more intuitive for users if it follows the GNU conventions for command line arguments.
There are many ways to parse command-line arguments, but my favorite by far is to use the docopt module developed by Vladimir Keleshev. It allows you to define your entire interface in the form of a docstring at the beginning of your script, like so:
"""Usage: gym_demo.py [--steps=NN --no-render --observations] ENV_NAME
Show a random agent playing in a given OpenAI environment.
Arguments:
ENV_NAME Name of the Gym environment to run
Options:
-h --help
--steps=<STEPS> How many iteration to run for. [default: 5000]
--no-render Don't render the environment graphically.
--observations Print environment observations.
"""
Later you can just call docopt(__doc__) and use the parsed argument values:
from docopt import docopt
arguments = docopt(__doc__)
steps = int(arguments.get("--steps"))
render_env = not arguments.get("--no-render")
print_observations = arguments.get("--observations")
env_name = arguments.get("ENV_NAME")
Note
I usually start with Docopt by copying one of its examples and modifying it to suit my needs.
☑ Structure your code
Python has established conventions for most things, including the layout of your code directory and the naming of certain files and directories. Follow these conventions to make your project easier for other Python developers to understand.
Code layout
The basic directory structure of your project should resemble this:
package-name
├── LICENSE
├── README.md
├── main_module_name
│   ├── __init__.py
│   └── main.py
├── tests
│   └── test_main.py
├── requirements.txt
└── setup.py
The package-name directory contains all of the sources of your package. Usually this is the root directory of your project repository, containing all other files. Choose your package name wisely and check if it's available on PyPI, as this will be the name people use to install your package:
pip install package-name
The main_module_name directory is the directory which will be copied into your user's site-packages when your package is installed. You can define more than one module if you need to, but it's good practice to nest them under a single module with an identifiable name.
According to the Python style guide PEP8:
Modules should have short, all-lowercase names. Underscores can be used in the module name if it improves readability. Python packages should also have short, all-lowercase names, although the use of underscores is discouraged.
If possible, the name of your package and the name of its main module should be the same. Since underscores are discouraged in package names, you can use my-project as the package name and my_project as the main module name.
Code structure
Whether you're writing code in Python or any other language, you should follow Clean Code principles. One of the most important ideas behind clean code is to split up your logic into short functions, each with a single responsibility.
Your functions should take zero, one, or at most two arguments. If a function has more than two parameters, that's a well-known code smell. It indicates that your function is probably trying to do more than one thing and should be split up into smaller sub-functions.
Note
Still need more than two parameters? Perhaps your function parameters are related and should come into your function as a single data structure? Or perhaps you should refactor your code so your functions become methods of an object?
In Python, you can sometimes get away with more than two parameters if you specify default values for the extra ones. This is better, but you should still consider whether the function should be split.
Small functions with a single responsibility and few parameters are easy to write unit-tests for. We'll come back to this.
Define a __main__ function
If you're writing a command-line utility, you should create a separate function which handles the parsing of user input and initiates the logic of your utility. You can call this function main() or anything else you think fits.
This logic should be placed in the __main__ block of your script:
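A minimal sketch of what this might look like (assuming your entry-point function is named main()):
if __name__ == "__main__":
    main()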
Note
The condition __name__ == "__main__"
is only true if you're calling the script directly. It's not true if you include the same Python file as a library module: from my_module import main
.
The advantage of splitting the main logic into a separate main() function is that you'll be able to use the main function as an entry point. We'll come back to this when talking about entry_points.
☑ Write a setup.py file
Python has a mature and well-maintained packaging utility called setuptools. A setup.py file is the build script for setuptools, and every Python project should have one.
Writing a basic setup.py file is very easy: all the file has to do is call the setup() function with appropriate information about your project.
This example comes from my gym-demo project:
#!/usr/bin/env python
import os
from setuptools import setup
setup(
    name="gym-demo",
    version="0.2.1",
    description="Explore OpenAI Gym environments.",
    long_description=open(
        os.path.join(os.path.abspath(os.path.dirname(__file__)), "README.md")
    ).read(),
    long_description_content_type="text/markdown",
    author="Michal Karzynski",
    author_email="[email protected]",
    packages=["gym_demo"],
    install_requires=["setuptools", "docopt"],
)
The example above assumes you have a long_description of your project in a Markdown README.md file in the same directory. If you don't, you can specify long_description as a string.
More information can be found in the Python packaging tutorial.
Using a setup.py file
Once you have a setup.py file, you can use it to build your Python project's distribution packages like so:
$ pip install setuptools wheel
$ python setup.py sdist
$ python setup.py bdist_wheel
The sdist command creates a source distribution (such as a tarball or zip file with Python source code). bdist_wheel creates a binary Wheel distribution file which your users may download from PyPI in the future. Both distribution files will be placed in the dist subdirectory.
Note
The bdist_wheel command comes from the wheel package.
During development, another setup.py command is even more useful:
$ source my_venv/bin/activate
(my_venv)$ python setup.py develop
This command installs your project inside a virtual environment named my_venv, but it does so without copying any files. It links your source directory with your site-packages directory by creating a link file (such as my-project.egg-link). This is very useful, because you can work on your source code directly and test it in your virtual env without reinstalling the project after each change.
You can find out about other setup.py commands by running:
$ python setup.py --help-commands
Note
If you're not using virtual environments, you're missing out. I would also recommend using the virtualenvwrapper tools. Alternatively, you can switch to the newer pipenv tool.
☑ Add entry_points for your script command
If you're writing a command-line utility, you should create a console script entry point for your command. This will create an executable launcher for your script, which users can easily call at the command line.
To do this, just add an entry_points argument to the setup() call in your setup.py file. For example, the following console_scripts entry will create an executable named my-command (or my-command.exe on Windows) and place it in the bin path of your environment. This means your users can just use my-command after they install your package.
setup(
    # other arguments here...
    entry_points={"console_scripts": ["my-command = my_module.main:main"]},
)
my_module.main:main specifies which function to call and where to find it. my_module.main specifies the path to the Python file main.py in my_module. And :main denotes the main() function inside main.py. This is the "Python path" syntax and if you know which PEP it's defined in, leave me a note in the comments. Thanks.
Note
There are other cool things entry_points can do. You can use them to customize the build commands of setup.py and even to distribute discoverable services for other tools (such as parsers for a specific file format, etc.). Read more about automated script creation in the setuptools docs.
☑ Create a requirements.txt file
You should provide your users with information about which other packages your package will require to work properly. The right place to put this information is inside setup.py as an install_requires list.
It's also very useful to inform your users which versions of each dependency you tested your package with. A good way to do this is to put a requirements file in your repository. The file is usually named requirements.txt and should contain the list of your dependencies along with version numbers, for example:
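The exact contents depend on your project; a pinned requirements.txt might look like this (the packages and versions below are only illustrative):
docopt==0.6.2
requests==2.22.0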
Users can then install these precise versions of your dependencies by running:
$ pip install -r requirements.txt
Note
It may be useful to create a separate requirements_test.txt file for dependencies used only during testing and development.
The easiest way to generate a requirements.txt file is to run the pip freeze command. Be careful with this though, as it will list all installed packages, whether they are dependencies of your package, dependencies of those dependencies, or simply unrelated packages you installed in your environment.
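For example, you can capture the current environment into a file and then prune it by hand (redirecting the output is just one way to do it):
(my_venv) $ pip freeze > requirements.txt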
☑ Set up a Git repo
It's time to put your code under source-control. Everyone is using Git these days, so let's roll with it.
Let's start by adding a Python-specific .gitignore file to the root of your project.
$ curl https://raw.githubusercontent.com/github/gitignore/master/Python.gitignore > .gitignore
You can now create your repo and add all files:
$ git init
$ git add --all
Verify that only files you want are being added to the repo with git status and create your initial commit.
$ git commit -m 'Initial commit'
More sample .gitignore files may be found in the GitHub gitignore repo.
☑ Use Black to format your code
The Python community is very lucky for many reasons, and one of them is the early adoption of a common code-style guide: PEP8. This is a great blessing, because we don't have to argue about which coding style is better, and we don't have to define a different style for each project or each company. We have PEP8 and we should all just stick to it.
To that end, Łukasz Langa created Black - the uncompromising code formatter. You should install it, run it over your code, and then re-run it before every commit. Using Black is as easy as:
(my_venv) $ pip install black
(my_venv) $ black my_module
All done! ✨ 🍰 ✨
1 file reformatted, 7 files left unchanged.
You may disagree with some of the formatting decisions Black makes. I would say that it's better to have a consistent style than a prettier but inconsistent one. Let's just all use Black and get along. ☺
☑ Set up pre-commit hooks
The best way to run Black and any other code formatters is to use pre-commit. This is a tool which is triggered every time you git commit and runs code linters and formatters on any modified files.
Install pre-commit as usual:
(my_venv) $ pip install pre-commit
You configure pre-commit by creating a file named .pre-commit-config.yaml in the root directory of your project. A simple configuration which only runs Black would look like this:
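A minimal sketch of such a configuration (the repository URL and rev below are only examples; pin whichever Black release you use):
repos:
  - repo: https://github.com/psf/black
    rev: 19.3b0
    hooks:
      - id: black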
You can generate a sample config by calling pre-commit sample-config. Set up a Git pre-commit hook by calling pre-commit install.
From now on, each time you run git commit, Black will be called to check your style. If your style is off, pre-commit will prevent you from committing your code and black will reformat it.
(my_venv) $ git commit
black....................................................................Failed
hookid: black
Files were modified by this hook. Additional output:
reformatted gym_demo/demo.py
All done! ✨ 🍰 ✨
1 file reformatted.
Now simply re-add the reformatted file with git add and commit again.
☑ Code linters
Python has a great set of code linters, which can help you avoid making common mistakes and keep your style in line with PEP8 and other standard conventions. Many of these tools are maintained by the Python Code Quality Authority.
My favorite Python linting tool is Flake8, which checks for compliance with PEP8. Its base functionality can be extended by installing some of its many plugins. My favorite Flake8 plugins are listed below.
flake8
flake8-blind-except
flake8-bugbear
flake8-builtins
flake8-comprehensions
flake8-debugger
flake8-docstrings
flake8-isort
flake8-quotes
flake8-string-format
Once you install all those packages, you can simply run flake8 to check your code.
You can configure Flake8 by adding a [flake8] configuration section to setup.cfg, tox.ini, or a .flake8 file in your project's root directory.
[flake8]
max-line-length=88
max-complexity=6
inline-quotes=double
; ignore:
; C812 - Missing trailing comma
; C813 - Missing trailing comma in Python 3
; D104 - Missing docstring in public package
ignore=C812,C813,D104
There are other code linters you may find interesting. For example, Bugbear finds common sources of bugs, while Bandit finds common security issues in Python code. You can use them both as Flake8 plugins, of course.
☑ Create a tox.ini config
tox is a great tool which aims to standardize testing in Python. You can use it to set up a virtual environment for testing your project, create a package, install the package along with its dependencies, and then run tests and linters. All of this is automated, so you just need to type one tox command.
$ tox
GLOB sdist-make: my-project/setup.py
py3 create: my-project/.tox/py3
py3 installdeps: -Urrequirements.txt, -Urrequirements_test.txt
py3 inst: my-project/.tox/.tmp/package/1/my-project-0.0.1.zip
py3 installed: flake8==3.7.7,flake8-comprehensions==2.1.0,flake8-debugger==3.1.0,flake8-docstrings==1.3.0,my-project==0.0.1
py3 run-test-pre: PYTHONHASHSEED='1354964057'
py3 run-test: commands[0] | flake8 my_project
_______________________ summary _______________________
py3: commands succeeded
congratulations :)
tox is quite configurable, so you can decide which commands are executed or use your requirements.txt by creating a tox.ini configuration file. The following simple example runs flake8 and pytest in a Python 3 venv.
[tox]
envlist=py3
[testenv]
deps=
    -Urrequirements.txt
    -Urrequirements_test.txt
commands=
    flake8
    pytest tests/
[pytest]
timeout=300
[flake8]
max-line-length=88
max-complexity=6
inline-quotes=double
; ignore:
; C812 - Missing trailing comma
; C813 - Missing trailing comma in Python 3
; D104 - Missing docstring in public package
ignore=C812,C813,D104,D400,E203
Note
You can use tox to easily run tests on multiple Python versions if they are installed in your system. Just extend the envlist, e.g. envlist=py35,py36,py37.
If you automate testing using tox, you will be able to just run that one command in your continuous integration environment. Make sure you run tox on every commit you want to merge.
☑ Refactor your code to be unit-testable and add tests
Using unit tests is one of the best practices you can adopt. Writing unit tests for your functions gives you a chance to take one more look at your code. Perhaps the function is too complex and should be simplified? Perhaps there's a bug you didn't notice before, or an edge case you didn't consider?
Writing good unit tests is an art and it takes time, but it's an investment which pays off many times over, especially on a large project which you maintain over a long period. For one, unit-tests make refactoring much easier and less scary. Also, you can learn to write your tests before you write your program (test-driven development), which is a very satisfying way to code.
I would recommend using the PyTest framework for writing your unit tests. It's easy to get started with, and it's very powerful and configurable. Writing a simple unit test is as simple as creating a tests directory with test_*.py files. A simple test looks like this:
"""Test suite for my-project."""
import pytest
from my_project import my_function
def test_my_function():
    result = my_function()
    assert result == "Hello World!"
Running the tests is as simple as typing pytest:
(my_venv) $ pip install pytest
(my_venv) $ pytest
=========================== test session starts ===========================
platform darwin -- Python 3.7.2, pytest-4.5.0, py-1.7.0, pluggy-0.11.0
rootdir: my-project, inifile: tox.ini
plugins: timeout-1.3.3, cov-2.7.1
timeout: 300.0s
timeout method: signal
timeout func_only: False
collected 7 items
tests/test_main.py ....... [100%]
======================= 7 passed in 0.35 seconds ========================
Make sure to add the pytest command to your tox.ini file.
☑ Add docstrings and documentation
Writing good documentation is very important for your users. You should start by making sure each function and module is described by a docstring. The docstring should describe what the function does in an imperative-mood sentence. For example:
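A minimal sketch of a one-line, imperative-mood docstring (the function itself is just a placeholder):
def print_observations(observations):
    """Print environment observations to the screen."""
    ...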
The parameters and return values of your functions should also be included in docstrings:
def get_columns_count_and_width(strings: List[Text]) -> Tuple[int, int]:
    """Calculate how to break a list of strings into multiple columns.

    Calculate the optimal column width and number of columns
    to display a list of strings on screen.

    :param strings: list of strings
    :return: a tuple with the number of columns and column width
    """
    ...
Note
Notice that I'm also using Python 3 type annotations to specify parameter and return types.
Use the flake8-docstrings plugin to verify all your functions have a docstring.
If your project grows larger, you will probably want to create a full-fledged documentation site. You can use Sphinx or the simpler MkDocs to generate the documentation and host the site on Read the Docs or GitHub Pages.
☑ Add type annotations and a MyPy verification step
Python 3.5 added the option to annotate your code with type information. This is a very useful and clean type of documentation, and you should use it.
For example, my_function below takes a unicode string as an argument and returns a dict of strings mapping to numeric or textual values.
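A sketch of what such a signature could look like (the exact annotations depend on your code; the body here is a placeholder):
from typing import Dict, Text, Union

def my_function(text: Text) -> Dict[Text, Union[int, Text]]:
    """Map each word of the given text to a numeric or textual value."""
    ...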
Mypy is a static type checker for Python. If you type-annotate your code, mypy will run through it and make sure that you're using the right parameter types when calling functions.
(my_venv) $ mypy --config-file=tox.ini my_module
my_module/main.py:43:27: error: Argument 1 to "my_function" has incompatible type "int"; expected "List[str]"
You can add a call to mypy to your tox configuration to verify that you're not introducing any type-related mistakes in your commits.
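For example, extending the tox.ini shown earlier might look like this (add mypy to your test requirements first; the [mypy] option is only a suggestion):
[testenv]
commands=
    flake8
    mypy --config-file=tox.ini my_module
    pytest tests/

[mypy]
ignore_missing_imports=True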
☑ Upload to GitHub
Alright. If you completed all the previous steps and checked all the boxes, your code is ready to be shared with the world!
Most open-source projects are hosted on GitHub, so your project should probably join them. Follow these instructions to set up a repo on GitHub and push your project there.
Note
Microsoft recently acquired GitHub, which makes some people sceptical about whether it should remain the canonical place for open-source projects online. You can consider GitLab as an alternative. So far, however, Microsoft have been good stewards of GitHub.
☑ Add README and LICENSE files
The first thing people see when they visit your project's repository is the contents of the README.md file. GitHub and GitLab do a good job of rendering Markdown-formatted text, so you can include links, tables, pictures, etc.
Make sure you have a README file and that it contains information about:
- what does your project do?
- how to use it (with examples)?
- how can people contribute code to your project?
- what's the license of the code?
- links to other relevant documentation
More tips on writing a README here.
The other critically important file you should include is LICENSE. Without this file, no one will be able to legally use your code.
If you're not sure what license to choose, use the MIT license. It's just 160 words, read it. It's simple and permissive and lets everyone use your code however they want.
More info about choosing a license here.
☑ Add a continuous integration service
OK, now that your project is online and you prepared a tox configuration, it's time to set up a continuous integration service. This will run your style-checking, static code analysis and unit-tests on every pull request (PR) made to your repository.
There are many CI services available for free for open-source projects. I'm partial to Travis myself, but Circle CI or AppVeyor are commonly used alternatives.
Setting up Travis CI for your repository is as simple as adding a hidden YAML configuration file named .travis.yml. For example, the following installs and runs tox on your project in a Linux virtual machine:
language: python
os: linux
install:
  - pip install tox
script:
  - tox
git:
  depth: false
branches:
  only:
    - "master"
cache:
  directories:
    - $HOME/.cache/pip
All you need to do after committing .travis.yml to your repo is to log into Travis and activate the CI service for your project.
Note
You can set up branch protection on master to require status checks to pass before a PR can be merged.
If you'd like to run your CI on multiple versions of Python or multiple operating systems, you can set up a test matrix like so:
matrix:
  include:
    - os: linux
      sudo: false
      python: '3.6'
      script: tox -e py36
    - os: linux
      dist: xenial
      python: '3.7'
      sudo: true
      script: tox -e py37
Travis has fairly good documentation, which explains its many settings with configuration examples. You can test your configuration on a PR where you modify the .travis.yml file. Travis will rerun its CI job on every change, so you can tweak settings to your liking.
A completed Travis run will look like this example.
☑ Add a requirements updater
Breaking changes in dependencies are a common problem in all software development. Your code was working just fine a while ago, but if you try to build it today, it fails, because some package it uses changed in an unforeseen way.
One way of working around this is to freeze all the dependency versions in your requirements.txt files, but this just puts the problem off into the future.
The best way to deal with changing dependencies is to use a service which periodically bumps versions in your requirements.txt files and creates a pull request for each version change. Your automated CI can test your code against the new dependencies and let you know if you're running into problems.
Single package version changes are usually relatively easy to deal with, so you can fix your code, if needed, before updating the dependency version. This allows you to painlessly keep track of the changes in all the projects you depend on.
I use the PyUp service for this. The service requires no configuration; you just need to sign up using your GitHub credentials and activate it for your repository. PyUp will detect your requirements.txt files and start issuing PRs to keep dependencies up to date with PyPI.
Note
There are alternative services, which also do a good job of updating dependencies. GitHub recently acquired Dependabot, which works with Python and other languages and is free for all projects (not only open-source).
☑ Add test coverage checker
Python unit-testing frameworks have the ability to determine which lines and branches of code were hit when running unit tests. This coverage report is very useful, as it lets you know how much of your code is being exercised by tests and which parts are not.
If you install the pytest-cov module, you can use the --cov argument to pytest to generate a coverage report.
(my_venv) $ pip install pytest pytest-cov
(my_venv) $ pytest --cov=my_module tests/
========================== test session starts ==========================
tests/test_main.py ................................... [100%]
------------ coverage: platform darwin, python 3.7.2-final-0 ------------
Name Stmts Miss Cover
--------------------------------------------
my_module/__init__.py 0 0 100%
my_module/main.py 77 17 78%
my_module/utils.py 41 0 100%
--------------------------------------------
TOTAL 118 17 86%
====================== 255 passed in 1.25 seconds =======================
If you add the --cov-report=html argument, you can generate an HTML version of the coverage report, which you can find in the htmlcov/index.html file after running tests.
(my_venv) $ pytest --cov=my_module --cov-report=html tests/
Track test coverage over time
Online services, such as Coveralls or Codecov can track your code coverage with every commit and on every pull request. You can decide not to accept PRs which decrease your code coverage. See an example report here.
In order to start using Coveralls, sign up using your GitHub credentials and set up tracking for your repository.
You can report your coverage using the coveralls-python package, which provides the coveralls command. You can test it manually by specifying the COVERALLS_REPO_TOKEN environment variable. You can find your token by going to your repository's settings on the Coveralls site.
(my_venv) $ pip install coveralls
(my_venv) $ pytest --cov=my_module --cov-report=html tests/
(my_venv) $ COVERALLS_REPO_TOKEN=__my_repo_token__ coveralls
Submitting coverage to coveralls.io...
Coverage submitted!
Job #167.2
https://coveralls.io/jobs/49180746
When running on Travis, coveralls will be able to detect which repository is being tested, so you don't have to (and shouldn't) put COVERALLS_REPO_TOKEN into your tox.ini file. Instead, use the - prefix for the command to allow it to fail if you are running tox locally.
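In the tox.ini shown earlier, that might look like this (coveralls must also be listed in your test requirements):
[testenv]
commands=
    flake8
    pytest --cov=my_module tests/
    - coveralls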
☑ Automated code review
The best thing you can do when working as a team is to thoroughly review each other's code. You should point out any mistakes, parts of code which are difficult to understand or badly documented, or anything else which doesn't quite smell right.
If you're working alone, or would like another pair of eyes, you can set up one of the services providing automated code review. These services are still evolving and are not providing a huge value yet, but sometimes they catch something your code linters missed.
Setting up a service like Code Climate Quality or Codacy is very simple. Just set up an account using your GitHub credentials, add your repository and configure your preferences.
A report can look like this example.
☑ Publish your project on PyPI
So, now you're ready to publish your project on PyPI. This is quite a simple operation, unless your package is larger than 60MB or you selected a name which is already taken.
Before you publish a package, create a release version. Start by bumping your version number to a higher value. Make sure you follow the semantic versioning rules and add the version number to setup.py.
The next step is to create a release on GitHub. This will create a tag you can use to look up the code associated with a specific version of your package.
Now you'll need to set up an account on the Test version of PyPI.
Note
You should always start by uploading your package to the test version of PyPI. You should then test your package from test PyPI on multiple environments to make sure it works, before posting it on the official PyPI.
Use the following instructions to create your packages and upload them to test PyPI:
(my_venv) $ pip install twine
(my_venv) $ rm -rf dist # remove previously built packages, if any exist
(my_venv) $ python setup.py bdist_wheel
(my_venv) $ python setup.py sdist
(my_venv) $ twine check dist/*
Checking distribution dist/my-package-0.2.2-py3-none-any.whl: Passed
Checking distribution dist/my-package-0.2.2.tar.gz: Passed
(my_venv) $ twine upload --repository-url https://test.pypi.org/legacy/ dist/*
Enter your username: my_username
Enter your password:
Uploading distributions to https://test.pypi.org/legacy/
Uploading my_package-0.2.2-py3-none-any.whl
Uploading my-package-0.2.2.tar.gz
You can now visit your project page on test PyPI under the URL: https://test.pypi.org/project/my-package/
Once you test your package thoroughly, you can repeat the same steps for the official version of PyPI. Just change the upload command to:
(my_venv) $ twine upload dist/*
Congratulations, your project is now online and fully ready to be used by the community!
☑ Advertise your project
OK, you're done. Take to Twitter, Facebook, LinkedIn or wherever else your potential users and contributors may be and let them know about your project.
Congratulations and good luck!