With over 120 talks and 6 parallel tracks, it's not possible to see all the best talks. The conference app allows attendees to rate each talk, and I compiled these ratings to make a list of the 20 talks which were best received at the conference.
It’s impossible to pick the great talks from EuroPython without leaving some cool talks out. The list above includes the top-rated talks, which had at least 15 ratings in the conference app. Unfortunately this means that some great, but less attended talks did not make the list.
If you would like to watch more, here are a few other highly-rated talks in no particular order.
Talk | Speaker | Watch |
---|---|---|
Zen of Python Dependency Management | Justin Mayer | YouTube |
How Thinking in Python Made Me a Better Software Engineer | Johnny Dude | YouTube |
Python Performance: Past, Present and Future | Victor Stinner | YouTube |
Getting Your Data Joie De Vivre Back! | Lynn Cherny | YouTube |
Why You Should Pursue Public Speaking and How to Get There | Yenny Cheung | YouTube |
Python’s Parallel Programming Possibilities – 4 levels of concurrency | Samuel Colvin | YouTube |
Static typing: beyond the basics of def foo(x: int) -> str: | Vita Smid | YouTube |
AI in Contemporary Art | Luba Elliott | YouTube |
Is it me, or the GIL? | Christoph Heer | YouTube |
How to write a JIT compiler in 30 minutes | Antonio Cuni | YouTube |
Which talk did you like best? Leave a note in the comments below.
The Python community has created a rich ecosystem of tools, which can help you during the development and upkeep of your project. Complete the steps in this checklist, and your project will be easier to maintain and you'll be ready to take contributions from the community.
This is an opinionated article. I will run through a long list of tools and practices I've had good experience with. Some of your favorite tools may be left out, and some of my choices you may find unnecessary. Feel free to adapt the list to your liking and leave a comment below.
You can download a printable PDF version of the checklist.
I tried to complete the entire checklist in my small open-source project named gym-demo
. Feel free to use it as a reference and submit PRs if you find room for improvement.
If you’re going to provide a command-line utility, then you need to define a friendly command-line user interface. Your interface will be more intuitive for users if it follows the GNU conventions for command line arguments.
There are many ways to parse command line arguments, but my favorite by far is to use the docopt
module developed by Vladimir Keleshev. It allows you to define your entire interface in the form of a docstring at the beginning of your script, like so:
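As a rough illustration (the program name and options below are just placeholders), such a docstring might look like this:

```python
"""Navigator.

Usage:
  navigator.py ship move <x> <y> [--speed=<kn>]
  navigator.py (-h | --help)
  navigator.py --version

Options:
  -h --help      Show this screen.
  --version      Show version.
  --speed=<kn>   Speed in knots [default: 10].
"""
```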
Later you can just call the docopt(__doc__)
command and use the argument values:
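Continuing the made-up example above, the parsing step could be sketched like this:

```python
from docopt import docopt

if __name__ == "__main__":
    arguments = docopt(__doc__, version="Navigator 1.0")
    if arguments["move"]:
        print("Moving ship to x={}, y={} at {} knots".format(
            arguments["<x>"], arguments["<y>"], arguments["--speed"]))
```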
I usually start with Docopt by copying one of the examples and modifying it to my needs.
Python has established conventions for most things, including the layout of your code directory and the naming of some files and directories. Follow these conventions to make your project easier to understand for other Python developers.
The basic directory structure of your project should resemble this:
package-name
├── LICENSE
├── README.md
├── main_module_name
│   ├── __init__.py
│   └── main.py
├── tests
│   └── test_main.py
├── requirements.txt
└── setup.py
The package-name
directory contains all of the sources of your package. Usually this is the root directory of your project repository, containing all other files. Choose your package name wisely and check if it’s available on PyPI, as this will be the name people will use to install your package using:
pip install package-name
The main_module_name
directory is the directory which will be copied
into your user’s site-packages
when your package is installed. You can
define more than one module if you need to, but it’s good practice to
nest them under a single module with an identifiable name.
According to the Python style guide PEP8:
Modules should have short, all-lowercase names. Underscores can be used in the module name if it improves readability. Python packages should also have short, all-lowercase names, although the use of underscores is discouraged.
If possible, the name of your package and the name of its main module should be the same. Since underscores are discouraged in package names, you can use my-project as the package name and my_project as the main module name.
Whether you're writing code in Python or any other language, you should follow Clean Code principles. One of the most important ideas behind clean code is to split your logic into short functions, each with a single responsibility.
Your functions should take zero, one or at most two arguments. If your functions have more than two parameters, that's a well-known code smell. It indicates that your function is probably trying to do more than one thing and you should split it up into smaller sub-functions.
Still need more than two parameters? Perhaps your function parameters are related and should come into your function as a single data structure? Or perhaps you should refactor your code so your functions become methods of an object?
In Python, you can sometimes get away with more than two parameters, if you specify default values for the extra ones. This is better, but you should still consider if the function shouldn’t be split.
Small functions with a single responsibility and few parameters are easy to write unit-tests for. We’ll come back to this.
Create a __main__ function
If you're writing a command-line utility, you should create a separate function which handles the parsing of user input and initiates the logic of your utility. You can call this function main() or anything else you think fits.
This logic should be placed in the __main__
block of your script:
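A minimal sketch of that block, assuming your entry function is called main():

```python
if __name__ == "__main__":
    main()
```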
The condition __name__ == "__main__"
is only true if you’re calling the script directly. It’s not true if you include the same Python file as a library module: from my_module import main
.
The advantage of splitting the main logic into a separate main()
function is that you’ll be able to use the main
function as an entry point. We’ll come back to this when talking about entry_points
.
Create a setup.py file
Python has a mature and well-maintained packaging utility called setuptools
. A setup.py
file is the build script for setuptools
and every Python project should have one.
Writing a basic setup.py
file is very easy; all the file has to do is call the setup
method with appropriate information about your project.
This example comes from my gym-demo
project:
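A hedged sketch of such a file is shown below; the package name, author details and dependencies are placeholders to replace with your own:

```python
from setuptools import find_packages, setup

with open("README.md") as readme_file:
    long_description = readme_file.read()

setup(
    name="my-package",                      # placeholder package name
    version="0.1.0",
    description="A short description of the project",
    long_description=long_description,
    long_description_content_type="text/markdown",
    author="Your Name",
    url="https://github.com/your-user/my-package",
    packages=find_packages(exclude=["tests"]),
    install_requires=["docopt"],            # illustrative dependency
)
```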
The example above assumes you have a long_description
of your project in a markdown README.md
file in the same directory. If you don’t, you can specify long_description
as a string.
More information can be found in the Python packaging tutorial.
Use your setup.py file
Once you have a setup.py
file, you can use it to build your Python project’s distribution packages like so:
$ pip install setuptools wheel
$ python setup.py sdist
$ python setup.py bdist_wheel
The sdist
command creates a source distribution (such as a tarball or zip file with Python source code). bdist_wheel
creates a binary Wheel distribution file which your users may download from PyPI in the future. Both distribution files will be placed in the dist
subdirectory.
The bdist_wheel
command comes from the wheel
package.
During development another setup.py
command is even more useful:
$ source my_venv/bin/activate
(my_venv)$ python setup.py develop
This command installs your project inside a virtual environment named my_venv
, but it does so without copying any files. It links your source directory with your site-packages
directory by creating a link file (such as my-project.egg-link
). This is very useful, because you can work on your source code directly and test it in your virtual env without reinstalling the project after each change.
You can find out about other setup.py
commands by running:
$ python setup.py --help-commands
If you're not using virtual environments, you're missing out.
I would also recommend using virtualenvwrapper
tools.
Alternatively you can switch to the new pipenv
tool.
Create entry_points for your script command
If you're writing a command-line utility, you should create a console script entry point for your command. This will create an executable launcher for your script, which users can easily call at the command line.
To do this, just add an entry_points
argument to the setup()
call in your setup.py
file. For example, the following console_scripts
entry will create an executable named my-command
(or my-command.exe
on Windows) and place it in the bin
path of your environment. This means your users can just use my-command
after they install your package.
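A sketch of that setup() argument, using the names mentioned above:

```python
setup(
    # ... other arguments as before ...
    entry_points={
        "console_scripts": ["my-command=my_module.main:main"],
    },
)
```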
my_module.main:main
specifies which function to call and where to find it. my_module.main
specifies the path to the Python file main.py
in my_module
. And :main
denotes the main()
function inside main.py
. This is the “Python path” syntax and if you know which PEP it’s defined in, leave me a note in the comments. Thanks.
There are other cool things entry_points
can do. You can use it to customize build commands of setup.py
and even to distribute discoverable services for other tools (such as parsers for a specific file format, etc.).
Read more about automated script creation in the setuptools
docs.
Create a requirements.txt file
You should provide your users with information about which other packages your package will require to work properly. The right place to put this information is inside setup.py
as an install_requires
list.
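For example (the dependencies listed here are placeholders):

```python
setup(
    # ... other arguments as before ...
    install_requires=["docopt", "requests"],
)
```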
It’s also very useful to inform your users which versions of each dependency you tested your package with. A good way to do this is to put a requirements file in your repository. The file is usually named requirements.txt
and should contain the list of your dependencies along with version numbers, for example:
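A two-line example, with illustrative version numbers:

```
docopt==0.6.2
requests==2.21.0
```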
Users can then install these precise versions of your dependencies by running:
$ pip install -r requirements.txt
It may be useful to create a separate requirements_test.txt
file for dependencies used only during testing and development.
The easiest way to generate a requirements.txt
file is to run the pip freeze
command. Be careful with this though, as it will list all installed packages, whether they are dependencies of your package, the dependencies of these dependencies, or simply unrelated packages you installed in your environment.
It’s time to put your code under source-control. Everyone is using Git these days, so let’s roll with it.
Let’s start by adding a Python-specific .gitignore
file to the root of your project.
$ curl https://raw.githubusercontent.com/github/gitignore/master/Python.gitignore > .gitignore
You can now create your repo and add all files:
$ git init
$ git add --all
Verify that only files you want are being added to the repo with git status
and create your initial commit.
$ git commit -m 'Initial commit'
More sample .gitignore
files may be found in the GitHub gitignore repo.
The Python community is very lucky for many reasons, and one of them is the early adoption of a common code-style guide: PEP8. This is a great blessing, because we don't have to argue about which coding style is better or define a different style for each project or company. We have PEP8 and we should all just stick to PEP8.
To that end, Łukasz Langa created Black – the uncompromising code formatter. You should install it, run it over your code and then re-run it before every commit. Using Black is as easy as:
(my_venv) $ pip install black
(my_venv) $ black my_module
All done! ✨ 🍰 ✨
1 file reformatted, 7 files left unchanged.
You may disagree with some of the formatting decisions Black makes. I would say that it's better to have a consistent style than a prettier but inconsistent one. Let's just all use Black and get along. ☺
The best way to run Black and any other code formatters is to use pre-commit
. This is a tool which is triggered every time you git commit
and runs code-linters and formatters on any modified files.
Install pre-commit
as usual:
(my_venv) $ pip install pre-commit
You configure pre-commit
by creating a file named .pre-commit-config.yaml
in the root directory of your project. A simple configuration, which only runs Black, would look like this:
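A sketch of such a configuration; check the Black documentation for the current repository URL and the release tag you want to pin in rev:

```yaml
repos:
  - repo: https://github.com/psf/black
    rev: 19.3b0        # pin to a released Black version
    hooks:
      - id: black
```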
You can generate a sample config by calling pre-commit sample-config
.
Set up a Git pre-commit hook by calling pre-commit install
.
From now on, each time you run git commit
Black will be called to check your style. If your style is off, pre-commit
will prevent you from committing your code and black
will reformat it.
(my_venv) $ git commit
black....................................................................Failed
hookid: black
Files were modified by this hook. Additional output:
reformatted gym_demo/demo.py
All done! ✨ 🍰 ✨
1 file reformatted.
Now simply re-add the reformatted file with git add
and commit again.
Python has a great set of code linters, which can help you avoid making common mistakes and keep your style in line with PEP8 and other standard conventions. Many of these tools are maintained by the Python Code Quality Authority.
My favorite Python linting tool is Flake8, which checks for compliance with PEP8. Its base functionality can be extended by installing some of its many plugins. My favorite Flake8 plugins are listed below.
Once you install all those packages, you can simply run flake8
to check your code.
You can configure Flake8 by adding a [flake8]
configuration section to setup.cfg
, tox.ini
, or .flake8
files in your project’s root directory.
There are other code linters you may find interesting. For example Bugbear finds common sources of bugs, while Bandit finds common security issues in Python code. You can use them both as a Flake8 plugins of course.
Create a tox.ini config
tox is a great tool which aims to standardize testing in Python. You can use it to set up a virtual environment for testing your project, create a package, install the package along with its dependencies and then run tests and linters. All of this is automated, so you just need to type one tox command.
tox
is quite configurable, so you can decide which commands are executed or use your requirements.txt
by creating a tox.ini
configuration file. The following simple example runs flake8
and pytest
in a Python3 venv.
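A sketch of such a tox.ini, assuming your package is named my_module and your tests live in tests/:

```ini
[tox]
envlist = py36

[testenv]
deps =
    -rrequirements.txt
    flake8
    pytest
commands =
    flake8 my_module tests
    pytest
```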
You can use tox
to easily run tests on multiple Python versions if they are installed in your system. Just extend the envlist
, e.g. envlist=py35,py36,py37
.
If you automate testing using tox
, you will be able to just run that one command in your continuous integration environment. Make sure you run tox
on every commit you want to merge.
Using unit tests is one of the best practices you can adopt. Writing unit tests for your functions gives you a chance to take one more look at your code. Perhaps the function is too complex and should be simplified? Perhaps there's a bug you didn't notice before or an edge case you didn't consider?
Writing good unit tests is an art and it takes time, but it’s an investment which pays off many times over, especially on a large project which you maintain over a long period. For one, unit-tests make refactoring much easier and less scary. Also, you can learn to write your tests before you write your program (test-driven development), which is a very satisfying way to code.
I would recommend using the PyTest framework for writing your unit tests. It's easy to get started with and it's very powerful and configurable. Writing a unit test is as simple as creating a test directory with test_*.py files. A simple test looks like this:
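For illustration, assuming a hypothetical add_numbers() function in your main module:

```python
from my_module.main import add_numbers


def test_add_numbers():
    assert add_numbers(2, 3) == 5
```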
Running the tests is as simple as typing pytest
:
Make sure to add the pytest
command to your tox.ini
file.
Write docstrings and documentation
Writing good documentation is very important for your users. You should start by making sure each function and module is described by a docstring. The docstring should describe what the function does, phrased in the imperative mood. For example:
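A tiny illustration, with a made-up function:

```python
def send_invoice(customer):
    """Send an invoice to the customer by email."""
    ...
```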
The parameters and return values of your functions should also be included in docstrings:
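A sketch of such a docstring, again using a made-up function:

```python
def count_words(text: str, min_length: int = 1) -> int:
    """Count the words in a piece of text.

    :param text: the text to analyze
    :param min_length: only count words at least this many characters long
    :return: the number of words found
    """
    return sum(1 for word in text.split() if len(word) >= min_length)
```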
Notice that I'm also using Python 3 type annotations to specify parameter and return types.
Use the flake8-docstrings
plugin to verify all your functions have a docstring.
If your project grows larger, you will probably want to create a full-fledged documentation site. You can use Sphinx or the simpler MkDocs to generate the documentation and host the site on Read the docs or GitHub Pages.
Python 3.5 added the option to annotate your code with type information. This is a very useful and clean form of documentation, and you should use it.
For example, my_function
below takes a unicode string as an argument and returns a dict
of strings mapping to numeric or textual values.
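One way the annotated signature might look:

```python
from typing import Dict, Union


def my_function(text: str) -> Dict[str, Union[int, float, str]]:
    ...
```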
Mypy is the static type checker for Python. If you type-annotate your code, mypy
will run through it and make sure that you’re using the right parameter types when calling functions.
You can add a call to mypy
to your tox
configuration to verify that you’re not introducing any type-related mistakes in your commits.
Alright. If you completed all the previous steps and checked all the boxes, your code is ready to be shared with the world!
Most open-source projects are hosted on GitHub, so your project should probably join them. Follow these instructions to setup a repo on GitHub and push your project there.
Microsoft recently acquired GitHub, which makes some people sceptical about whether it should remain the canonical place for open-source projects online. You can consider GitLab as an alternative. So far, however, Microsoft has been a good steward of GitHub.
The first thing people see when they visit your project’s repository is the contents of the README.md
file. GitHub and GitLab do a good job of rendering Markdown-formatted text, so you can include links, tables, pictures, etc.
Make sure you have a README file and that it contains information about:
More tips on writing a README here.
The other critically important file you should include is LICENSE
. Without this file, no one will be able to legally use your code.
If you’re not sure what license to choose, use the MIT license. It’s just 160 words, read it. It’s simple and permissive and lets everyone use your code however they want.
More info about choosing a license here.
OK, now that your project is online and you prepared a tox
configuration, it’s time to set up a continuous integration service. This will run your style-checking, static code analysis and unit-tests on every pull request (PR) made to your repository.
There are many CI services available for free for open-source projects. I’m partial to Travis myself, but Circle CI or AppVeyor are commonly used alternatives.
Setting up Travis CI for your repository is as simple as adding a hidden YAML configuration file named .travis.yml
. For example, the following installs and runs tox
on your project in a Linux virtual machine:
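A minimal sketch of such a configuration; the Python version is illustrative, and tox-travis is an optional helper that selects the matching tox environment:

```yaml
language: python
python:
  - "3.6"
install:
  - pip install tox tox-travis
script:
  - tox
```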
All you need to do after committing .travis.yml
to your repo, is to log into Travis and activate the CI service for your project.
You can set up branch protection on master
to require status checks to pass before a PR can be merged.
If you’d like to run your CI on multiple versions of Python or multiple operating systems, you can set up a test matrix like so:
Travis has fairly good documentation, which explains its many settings with configuration examples. You can test your configuration on a PR, where you modify the .travis.yml
file. Travis will rerun its CI job on every change, so you can tweak settings to your liking.
A completed Travis run will look like this example.
Breaking changes in dependencies are a common problem in all software development. Your code was working just fine a while ago, but if you try to build it today, it fails, because some package it uses changed in an unforeseen way.
One way of working around this is to freeze all the dependency versions in your requirements.txt
files, but this just puts the problem off into the future.
The best way to deal with changing dependencies is to use a service which periodically bumps versions in your requirements.txt
files and creates a pull request with each version change. Your automated CI can test your code against the new dependencies and let you know if you’re running into problems.
Single package version changes are usually relatively easy to deal with, so you can fix your code, if needed before updating the dependency version. This allows you to painlessly keep track of the changes in all the projects you depend on.
I use the PyUp service for this. The service requires no configuration; you just need to sign up using your GitHub credentials and activate it for your repository. PyUp will detect your requirements.txt
files and start issuing PRs to keep dependencies up to date with PyPI.
There are alternative services, which also do a good job of updating dependencies. GitHub recently acquired Dependabot, which works with Python and other languages and is free for all projects (not only open-source).
Python unit-testing frameworks have the ability to determine which lines and branches of code were hit when running unit tests. This coverage report is very useful, as it lets you know how much of your code is being exercised by tests and which parts are not.
If you install the pytest-cov
module, you can use the --cov
argument to pytest
to generate a coverage report.
If you add the --cov-report=html
argument, you can generate an HTML version of the coverage report, which you can find in the htmlcov/index.html
file after running tests.
(my_venv) $ pytest --cov=my_module --cov-report=html tests/
Online services, such as Coveralls or Codecov can track your code coverage with every commit and on every pull request. You can decide not to accept PRs which decrease your code coverage. See an example report here.
In order to start using Coveralls, sign up using your GitHub credentials and set up tracking for your repository.
You can report your coverage using the coveralls-python
package, which provides the coveralls
command. You can test it manually by specifying the COVERALLS_REPO_TOKEN
environment variable. You can find your token by going to your repository’s settings on the Coveralls site.
When running on Travis, coveralls
will be able to detect which repository is being tested, so you don’t have to (and shouldn’t) put COVERALLS_REPO_TOKEN
into your tox.ini
file. Instead, use the - prefix for the command to allow it to fail if you are running tox locally.
The best thing you can do when working as a team is to thoroughly review each other’s code. You should point out any mistakes, parts of code which are difficult to understand or badly documented, or anything else which doesn’t quite smell right.
If you’re working alone, or would like another pair of eyes, you can set up one of the services providing automated code review. These services are still evolving and are not providing a huge value yet, but sometimes they catch something your code linters missed.
Setting up a service like Code Climate Quality or Codacy is very simple. Just set up an account using your GitHub credentials, add your repository and configure your preferences.
A report can look like this example.
So, now you're ready to publish your project on PyPI. This is quite a simple operation, unless your package is larger than 60 MB or you selected a name which is already taken.
Before you publish a package, create a release version. Start by bumping your version number to a higher value. Make sure you follow the semantic versioning rules and add the version number to setup.py
.
The next step is to create a release on GitHub. This will create a tag you can use to look up the code associated with a specific version of your package.
Now you’ll need to set up an account on the Test version of PyPI.
You should always start by uploading your package to the test version of PyPI. You should then test your package from test PyPI on multiple environments to make sure it works, before posting it on the official PyPI.
Use the following instructions to create your packages and upload them to test PyPI:
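One plausible sequence of commands, assuming you have an account on test.pypi.org:
(my_venv) $ pip install twine
(my_venv) $ python setup.py sdist bdist_wheel
(my_venv) $ twine upload --repository-url https://test.pypi.org/legacy/ dist/*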
You can now visit your project page on test PyPI under the URL: https://test.pypi.org/project/my-package/
Once you test your package thoroughly, you can repeat the same steps for the official version of PyPI. Just change the upload command to:
(my_venv) $ twine upload dist/*
Congratulations, your project is now online and fully ready to be used by the community!
OK, you’re done. Take to Twitter, Facebook, LinkedIn or wherever else your potential users and contributors may be and let them know about your project.
Congratulations and good luck!
An Airflow workflow is designed as a directed acyclic graph (DAG). That means that when authoring a workflow, you should think about how it can be divided into tasks which can be executed independently. You can then merge these tasks into a logical whole by combining them into a graph.
The shape of the graph decides the overall logic of your workflow. An Airflow DAG can include multiple branches and you can decide which of them to follow and which to skip at the time of workflow execution.
This creates a very resilient design, because each task can be retried multiple times if an error occurs. Airflow can even be stopped entirely and running workflows will resume by restarting the last unfinished task.
When designing Airflow operators, it’s important to keep in mind that they may be executed more than once. Each task should be idempotent, i.e. have the ability to be applied multiple times without producing unintended consequences.
Here is a brief overview of some terms used when designing Airflow workflows:
my_task = MyOperator(...).
AIRFLOW_HOME is the directory where you store your DAG definition files and Airflow plugins.

When? | DAG | Task | Info about other tasks |
---|---|---|---|
During definition | DAG | Task | get_flat_relatives |
During a run | DAG Run | Task Instance | xcom_pull |
Base class | DAG | BaseOperator | |
Airflow documentation provides more information about these and other concepts.
Airflow is written in Python, so I will assume you have it installed on your machine. I’m using Python 3 (because it’s 2017, come on people!), but Airflow is supported on Python 2 as well. I will also assume that you have virtualenv installed.
$ python3 --version
Python 3.6.0
$ virtualenv --version
15.1.0
Let’s create a workspace directory for this tutorial, and inside it a Python 3 virtualenv directory:
$ cd /path/to/my/airflow/workspace
$ virtualenv -p `which python3` venv
$ source venv/bin/activate
(venv) $
Now let’s install Airflow 1.8:
(venv) $ pip install airflow==1.8.0
Now we’ll need to create the AIRFLOW_HOME
directory where your DAG definition files and Airflow plugins will be stored. Once the directory is created, set the AIRFLOW_HOME
environment variable:
(venv) $ cd /path/to/my/airflow/workspace
(venv) $ mkdir airflow_home
(venv) $ export AIRFLOW_HOME=`pwd`/airflow_home
You should now be able to run Airflow commands. Let’s try by issuing the following:
(venv) $ airflow version
____________ _____________
____ |__( )_________ __/__ /________ __
____ /| |_ /__ ___/_ /_ __ /_ __ \_ | /| / /
___ ___ | / _ / _ __/ _ / / /_/ /_ |/ |/ /
_/_/ |_/_/ /_/ /_/ /_/ \____/____/|__/
v1.8.0rc5+apache.incubating
If the airflow version
command worked, then Airflow also created its default configuration file airflow.cfg
in AIRFLOW_HOME
:
airflow_home
├── airflow.cfg
└── unittests.cfg
Default configuration values stored in airflow.cfg
will be fine for this tutorial, but in case you want to tweak any Airflow settings, this is the file to change. Take a look at the docs for more information about configuring Airflow.
The next step is to issue the following command, which will create and initialize the Airflow SQLite database:
(venv) $ airflow initdb
The database will be created in airflow.db
by default.
airflow_home
├── airflow.cfg
├── airflow.db        <- Airflow SQLite DB
└── unittests.cfg
Using SQLite is an adequate solution for local testing and development, but it does not support concurrent access. In a production environment you will most certainly want to use a more robust database solution such as Postgres or MySQL.
Airflow’s UI is provided in the form of a Flask web application. You can start it by issuing the command:
(venv) $ airflow webserver
You can now visit the Airflow UI by navigating your browser to port 8080
on the host where Airflow was started, for example: http://localhost:8080/admin/
Airflow comes with a number of example DAGs. Note that these examples may not work until you have at least one DAG definition file in your own dags_folder
. You can hide the example DAGs by changing the load_examples
setting in airflow.cfg
.
OK, if everything is ready, let's start writing some code. We'll start by creating a Hello World workflow, which does nothing other than sending “Hello world!” to the log.
Create your dags_folder
, that is the directory where your DAG definition files will be stored in AIRFLOW_HOME/dags
. Inside that directory create a file named hello_world.py
.
airflow_home
├── airflow.cfg
├── airflow.db
├── dags                    <- Your DAGs directory
│   └── hello_world.py      <- Your DAG definition file
└── unittests.cfg
Add the following code to dags/hello_world.py
:
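A sketch of what dags/hello_world.py can contain; the scheduling values are illustrative:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import PythonOperator


def print_hello():
    return "Hello world!"


dag = DAG("hello_world", description="Simple tutorial DAG",
          schedule_interval="0 12 * * *",
          start_date=datetime(2017, 3, 20))

dummy_operator = DummyOperator(task_id="dummy_task", retries=3, dag=dag)

hello_operator = PythonOperator(task_id="hello_task",
                                python_callable=print_hello, dag=dag)

# dummy_task runs first, then hello_task
dummy_operator >> hello_operator
```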
This file creates a simple DAG with just two operators, the DummyOperator
, which does nothing, and a PythonOperator
which calls the print_hello
function when its task is executed.
In order to run your DAG, open a second terminal and start the Airflow scheduler by issuing the following commands:
$ cd /path/to/my/airflow/workspace
$ export AIRFLOW_HOME=`pwd`/airflow_home
$ source venv/bin/activate
(venv) $ airflow scheduler
The scheduler will send tasks for execution. The default Airflow settings rely on an executor named SequentialExecutor
, which is started automatically by the scheduler. In production you would probably want to use a more robust executor, such as the CeleryExecutor
.
When you reload the Airflow UI in your browser, you should see your hello_world
DAG listed in Airflow UI.
In order to start a DAG Run, first turn the workflow on (arrow 1), then click the Trigger Dag button (arrow 2) and finally, click on the Graph View (arrow 3) to see the progress of the run.
You can reload the graph view until both tasks reach the status Success. When they are done, you can click on the hello_task
and then click View Log. If everything worked as expected, the log should show a number of lines and among them something like this:
[2017-03-19 13:49:58,789] {base_task_runner.py:95} INFO - Subtask: --------------------------------------------------------------------------------
[2017-03-19 13:49:58,789] {base_task_runner.py:95} INFO - Subtask: Starting attempt 1 of 1
[2017-03-19 13:49:58,789] {base_task_runner.py:95} INFO - Subtask: --------------------------------------------------------------------------------
[2017-03-19 13:49:58,790] {base_task_runner.py:95} INFO - Subtask:
[2017-03-19 13:49:58,800] {base_task_runner.py:95} INFO - Subtask: [2017-03-19 13:49:58,800] {models.py:1342} INFO - Executing <Task(PythonOperator): hello_task> on 2017-03-19 13:49:44.775843
[2017-03-19 13:49:58,818] {base_task_runner.py:95} INFO - Subtask: [2017-03-19 13:49:58,818] {python_operator.py:81} INFO - Done. Returned value was: Hello world!
The code you should have at this stage is available in this commit on GitHub.
Let’s start writing our own Airflow operators. An Operator is an atomic block of workflow logic, which performs a single action. Operators are written as Python classes (subclasses of BaseOperator
), where the __init__
function can be used to configure settings for the task and a method named execute
is called when the task instance is executed.
Any value that the execute
method returns is saved as an Xcom message under the key return_value
. We’ll cover this topic later.
The execute
method may also raise the AirflowSkipException
from airflow.exceptions
. In such a case the task instance would transition to the Skipped status.
If another exception is raised, the task will be retried until the maximum number of retries
is reached.
Remember that since the execute
method can retry many times, it should be idempotent.
We’ll create your first operator in an Airflow plugin file named plugins/my_operators.py
. First create the airflow_home/plugins
directory, then add the my_operators.py
file with the following content:
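A sketch of such a plugin file, following the class and parameter names used in the rest of this section:

```python
import logging

from airflow.models import BaseOperator
from airflow.plugins_manager import AirflowPlugin
from airflow.utils.decorators import apply_defaults

log = logging.getLogger(__name__)


class MyFirstOperator(BaseOperator):

    @apply_defaults
    def __init__(self, my_operator_param, *args, **kwargs):
        self.operator_param = my_operator_param
        super(MyFirstOperator, self).__init__(*args, **kwargs)

    def execute(self, context):
        log.info("Hello World!")
        log.info("operator_param: %s", self.operator_param)


class MyFirstPlugin(AirflowPlugin):
    name = "my_first_plugin"
    operators = [MyFirstOperator]
```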
In this file we are defining a new operator named MyFirstOperator
. Its execute
method is very simple, all it does is log “Hello World!” and the value of its own single parameter. The parameter is set in the __init__
function.
We are also defining an Airflow plugin named MyFirstPlugin
. By defining a plugin in a file stored in the airflow_home/plugins
directory, we’re providing Airflow the ability to pick up our plugin and all the operators it defines. We’ll be able to import these operators later using the line from airflow.operators import MyFirstOperator
.
In the docs, you can read more about Airflow plugins.
Make sure your PYTHONPATH
is set to include directories where your custom modules are stored.
Now, we’ll need to create a new DAG to test our operator. Create a dags/test_operators.py
file and fill it with the following content:
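A sketch of dags/test_operators.py, reusing the names referenced below:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators import MyFirstOperator
from airflow.operators.dummy_operator import DummyOperator

dag = DAG("my_test_dag", description="Another tutorial DAG",
          schedule_interval="0 12 * * *",
          start_date=datetime(2017, 3, 20))

dummy_task = DummyOperator(task_id="dummy_task", dag=dag)

operator_task = MyFirstOperator(my_operator_param="This is a test.",
                                task_id="my_first_operator_task", dag=dag)

dummy_task >> operator_task
```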
Here we just created a simple DAG named my_test_dag
with a DummyOperator
task and another task using our new MyFirstOperator
. Notice how we pass the configuration value for my_operator_param
here during DAG definition.
At this stage your source tree will look like this:
airflow_home
├── airflow.cfg
├── airflow.db
├── dags
│   ├── hello_world.py
│   └── test_operators.py   <- Second DAG definition file
├── plugins
│   └── my_operators.py     <- Your plugin file
└── unittests.cfg
All the code you should have at this stage is available in this commit on GitHub.
To test your new operator, you should stop (CTRL-C) and restart your Airflow web server and scheduler. Afterwards, go back to the Airflow UI, turn on the my_test_dag
DAG and trigger a run. Take a look at the logs for my_first_operator_task
.
Debugging would quickly get tedious if you had to trigger a DAG run and wait for all upstream tasks to finish before you could retry your new operator. Thankfully Airflow has the airflow test
command, which you can use to manually start a single operator in the context of a specific DAG run.
The command takes 3 arguments: the name of the dag, the name of a task and a date associated with a particular DAG Run.
(venv) $ airflow test my_test_dag my_first_operator_task 2017-03-18T18:00:00.0
You can use this command to restart your task as many times as needed, while tweaking your operator code.
If you want to test a task from a particular DAG run, you can find the needed date value in the logs of a failing task instance.
There is a cool trick you can use to debug your operator code. If you install IPython in your venv:
(venv) $ pip install ipython
You can then place IPython’s embed()
command in your code, for example in the execute
method of an operator, like so:
Now when you run the airflow test
command again:
(venv) $ airflow test my_test_dag my_first_operator_task 2017-03-18T18:00:00.0
the task will run, but execution will stop and you will be dropped into an IPython shell, from which you can explore the place in the code where you placed embed()
:
You could of course also drop into Python’s interactive debugger pdb
(import pdb; pdb.set_trace()
) or the IPython enhanced version ipdb
(import ipdb; ipdb.set_trace()
). Alternatively, you can also use an airflow test
based run configuration to set breakpoints in IDEs such as PyCharm.
Code is in this commit on GitHub.
An Airflow Sensor is a special type of Operator, typically used to monitor a long running task on another system.
To create a Sensor, we define a subclass of BaseSensorOperator
and override its poke
function. The poke
function will be called over and over every poke_interval
seconds until one of the following happens:
poke returns True – if it returns False it will be called again.
poke raises an AirflowSkipException from airflow.exceptions – the Sensor task instance's status will be set to Skipped.
poke raises another exception, in which case it will be retried until the maximum number of retries is reached.

There are many predefined sensors, which can be found in Airflow's codebase:
To add a new Sensor to your my_operators.py
file, add the following code:
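A sketch of the sensor, added to the same my_operators.py file (which already defines the log object used here):

```python
from datetime import datetime

from airflow.operators.sensors import BaseSensorOperator


class MyFirstSensor(BaseSensorOperator):

    def poke(self, context):
        current_minute = datetime.now().minute
        if current_minute % 3 != 0:
            log.info("Current minute (%s) not is divisible by 3, "
                     "sensor will retry.", current_minute)
            return False

        log.info("Current minute (%s) is divisible by 3, "
                 "sensor finishing.", current_minute)
        return True
```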
Here we created a very simple sensor, which will wait until the current minute is a number divisible by 3. When this happens, the sensor's condition will be satisfied and it will exit. This is a contrived example; in a real case you would probably check something more unpredictable than just the time.
Remember to also change the plugin class to add the new sensor to the operators
it exports:
You can now place the operator in your DAG:
Restart your webserver and scheduler and try out your new workflow.
If you click View log of the my_sensor_task
task, you should see something similar to this:
[2017-03-19 14:13:28,719] {base_task_runner.py:95} INFO - Subtask: --------------------------------------------------------------------------------
[2017-03-19 14:13:28,719] {base_task_runner.py:95} INFO - Subtask: Starting attempt 1 of 1
[2017-03-19 14:13:28,720] {base_task_runner.py:95} INFO - Subtask: --------------------------------------------------------------------------------
[2017-03-19 14:13:28,720] {base_task_runner.py:95} INFO - Subtask:
[2017-03-19 14:13:28,728] {base_task_runner.py:95} INFO - Subtask: [2017-03-19 14:13:28,728] {models.py:1342} INFO - Executing <Task(MyFirstSensor): my_sensor_task> on 2017-03-19 14:13:05.651721
[2017-03-19 14:13:28,743] {base_task_runner.py:95} INFO - Subtask: [2017-03-19 14:13:28,743] {my_operators.py:34} INFO - Current minute (13) not is divisible by 3, sensor will retry.
[2017-03-19 14:13:58,747] {base_task_runner.py:95} INFO - Subtask: [2017-03-19 14:13:58,747] {my_operators.py:34} INFO - Current minute (13) not is divisible by 3, sensor will retry.
[2017-03-19 14:14:28,750] {base_task_runner.py:95} INFO - Subtask: [2017-03-19 14:14:28,750] {my_operators.py:34} INFO - Current minute (14) not is divisible by 3, sensor will retry.
[2017-03-19 14:14:58,752] {base_task_runner.py:95} INFO - Subtask: [2017-03-19 14:14:58,752] {my_operators.py:34} INFO - Current minute (14) not is divisible by 3, sensor will retry.
[2017-03-19 14:15:28,756] {base_task_runner.py:95} INFO - Subtask: [2017-03-19 14:15:28,756] {my_operators.py:37} INFO - Current minute (15) is divisible by 3, sensor finishing.
[2017-03-19 14:15:28,757] {base_task_runner.py:95} INFO - Subtask: [2017-03-19 14:15:28,756] {sensors.py:83} INFO - Success criteria met. Exiting.
Code is in this commit on GitHub.
In most workflow scenarios downstream tasks will have to use some information from an upstream task. Since each task instance will run in a different process, perhaps on a different machine, Airflow provides a communication mechanism called Xcom for this purpose.
Each task instance can store some information in Xcom using the xcom_push
function and another task instance can retrieve this information using xcom_pull
. The information passed using Xcoms will be pickled and stored in the Airflow database (xcom
table), so it's better to save only small bits of information, rather than large objects.
Let’s enhance our Sensor, so that it saves a value to Xcom. We’re using the xcom_push()
function which takes two arguments – a key under which the value will be saved and the value itself.
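A sketch of the updated poke method:

```python
    def poke(self, context):
        current_minute = datetime.now().minute
        if current_minute % 3 != 0:
            log.info("Current minute (%s) not is divisible by 3, "
                     "sensor will retry.", current_minute)
            return False

        # Save the minute we found into Xcom for downstream tasks
        task_instance = context["task_instance"]
        task_instance.xcom_push("sensors_minute", current_minute)

        log.info("Current minute (%s) is divisible by 3, "
                 "sensor finishing.", current_minute)
        return True
```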
Now in our operator, which is downstream from the sensor in our DAG, we can use this value, by retrieving it from Xcom. Here we’re using the xcom_pull()
function providing it with two arguments – the task ID of the task instance which stored the value and the key
under which the value was stored.
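And a sketch of the operator's execute method reading that value back:

```python
    def execute(self, context):
        log.info("Hello World!")

        # Read the value stored by my_sensor_task under the key "sensors_minute"
        task_instance = context["task_instance"]
        sensors_minute = task_instance.xcom_pull("my_sensor_task",
                                                 key="sensors_minute")
        log.info("Valid minute as determined by sensor: %s", sensors_minute)
```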
Final version of the code is in this commit on GitHub.
If you trigger a DAG run now and look in the operator’s logs, you will see that it was able to display the value created by the upstream sensor.
In the docs, you can read more about Airflow XComs.
I hope you found this brief introduction to Airflow useful. Have fun developing your own workflows and data processing pipelines!
Flask is a web micro-framework written in Python. Since it's a micro-framework, Flask does very little by itself. In contrast to a framework like Django, which takes the “batteries included” approach, Flask does not come with an ORM, serializers, user management or built-in internationalization. All these features and many others are available as Flask extensions, which make up a rich, but loosely coupled ecosystem.
The challenge, then, for an aspiring Flask developer lies in picking the right extensions and combining them together to get just the right set of functions. In this article we will describe how to use the Flask-RESTPlus extension to create a Flask-based RESTful JSON API.
Flask-RESTPlus aims to make building REST APIs quick and easy. It provides just enough syntactic sugar to make your code readable and easy to maintain. Its killer feature is the ability to automatically generate interactive documentation for your API using Swagger UI.
Swagger UI is part of a suite of technologies for documenting RESTful web services. Swagger has evolved into the OpenAPI specification, currently curated by the Linux Foundation. Once you have an OpenAPI description of your web service, you can use software tools to generate documentation or even boilerplate code (client or server) in a variety of languages. Take a look at swagger.io for more information.
Swagger UI is a great tool for describing and visualizing RESTful web services. It generates a small webpage, which documents your API and allows you to make test queries using JavaScript. Click here to see a small demo.
In this article we’ll describe how to use Flask and Flask-RESTPlus to create a RESTful API which comes equipped with Swagger UI.
To show off the features of Flask-RESTPlus I prepared a small demo application. It’s a part of an API for a blogging platform, which allows you to manage blog posts and categories.
Let’s start by downloading and running this demo on your system, then we’ll walk through the code.
You will need to have Python with Virtualenv and Git installed on your machine.
I would recommend using Python 3, but Python 2 should work just fine.
To download and start the demo application issue the following commands. First clone the application code into any directory on your disk:
$ cd /path/to/my/workspace/
$ git clone https://github.com/postrational/rest_api_demo
$ cd rest_api_demo
Create a virtual Python environment in a directory named venv
, activate the virtualenv and install required dependencies using pip
:
$ virtualenv -p `which python3` venv
$ source venv/bin/activate
(venv) $ pip install -r requirements.txt
Now let’s set up the app for development and start it:
(venv) $ python setup.py develop
(venv) $ python rest_api_demo/app.py
OK, everything should be ready. In your browser, open the URL http://localhost:8888/api/
You should be greeted with a page similar to the following.
Flask and Flask-RESTPlus make it very easy to get started. Minimal code required to create a working API is just 10 lines long.
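A minimal, self-contained sketch of such an app:

```python
from flask import Flask
from flask_restplus import Api, Resource

app = Flask(__name__)
api = Api(app, version="1.0", title="Demo API")


@api.route("/hello")
class HelloWorld(Resource):
    def get(self):
        return {"hello": "world"}


if __name__ == "__main__":
    app.run(debug=True)
```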
To make the code more maintainable, in our demo application we separate the app definition, API methods and other types of code into separate files. The following directory tree shows where each part of the logic is located.
├── api                         #
│   ├── blog                    # Blog-related API directory
│   │   ├── business.py         #
│   │   ├── endpoints           # API namespaces and REST methods
│   │   │   ├── categories.py   #
│   │   │   └── posts.py        #
│   │   ├── parsers.py          # Argument parsers
│   │   └── serializers.py      # Output serializers
│   └── restplus.py             # API bootstrap file
├── app.py                      # Application bootstrap file
├── database                    #
│   └── models.py               # Definition of SQLAlchemy models
├── db.sqlite                   #
└── settings.py                 # Global app settings
The definition of the RESTPlus API is stored in the file rest_api_demo/api/restplus.py
, while the logic for configuring and starting the Flask app is stored in rest_api_demo/app.py
.
Take a look into the app.py
file and the initialize_app
function.
This function does a number of things, but in particular it sets up a Flask blueprint, which will host the API under the /api
URL prefix. This allows you to separate the API part of your application from other parts. Your app’s frontend could be hosted in the same Flask application but under a different blueprint
(perhaps with the /
URL prefix).
The RESTPlus API itself is also split into a number of separate namespaces. Each namespace has its own URL prefix and is stored in a separate file in the /api/blog/endpoints
directory. In order to add these namespaces to the API, we need to use the api.add_namespace()
function.
initialize_app
also sets configuration values loaded from settings.py
and configures the app to use a database through the magic of Flask-SQLAlchemy.
Your API will be organized using API namespaces, RESTful resources and HTTP methods. Namespaces, as described above, allow your API definitions to be split into multiple files, each defining a part of the API with a different URL prefix.
RESTful resources are used to organize the API into endpoints corresponding to different types of data used by your application. Each endpoint is called using a different HTTP method. Each method issues a different command to the API. For example, GET is used to fetch a resource from the API, PUT is used to update its information and DELETE to delete it.
GET /blog/categories/1 – Retrieve category with ID 1
PUT /blog/categories/1 – Update the category with ID 1
DELETE /blog/categories/1 – Delete the category with ID 1
Resources usually have an associated collection endpoint, which can be used to create new resources (POST) or fetch lists (GET).
GET /blog/categories – Retrieve a list of categories
POST /blog/categories – Create a new category
Using Flask-RESTPlus you can define an API for all of the endpoints listed above with the following block of code. We start by creating a namespace, then we create a collection, a resource and the associated HTTP methods.
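A condensed sketch of what such a namespace might look like; the import path and the omitted business logic are placeholders:

```python
from flask_restplus import Resource

from rest_api_demo.api.restplus import api

ns = api.namespace("blog/categories",
                   description="Operations related to blog categories")


@ns.route("/")
class CategoryCollection(Resource):

    def get(self):
        """Returns list of blog categories."""
        ...

    @api.response(201, "Category successfully created.")
    def post(self):
        """Creates a new blog category."""
        ...


@ns.route("/<int:id>")
@api.response(404, "Category not found.")
class CategoryItem(Resource):

    def get(self, id):
        """Returns a category with a list of posts."""
        ...

    @api.response(204, "Category successfully updated.")
    def put(self, id):
        """Updates a blog category."""
        ...

    @api.response(204, "Category successfully deleted.")
    def delete(self, id):
        """Deletes a blog category."""
        ...
```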
The api.namespace()
function creates a new namespace with a URL prefix. The description
field will be used in the Swagger UI to describe this set of methods.
The @ns.route()
decorator is used to specify which URLs will be associated with a given resource. You can specify path parameters using angle brackets, such as in @ns.route('/<int:id>')
.
You can optionally specify the type of a parameter using the name of a converter and a colon. Available converters are string: (default), path: (string with slashes), int:, float: and uuid:.
URL converters come from the Werkzeug library on which Flask is based. You can read more about them in Werkzeug docs. Unfortunately not all Werkzeug converter options are currently supported by Flask-RESTPlus. Additional types can be added using Flask’s url_map
option.
Each resource is a class which contains functions which will be mapped to HTTP methods. The following functions are mapped: get
, post
, put
, delete
, patch
, options
and head
.
If a docstring is present in any function, it will be displayed in the Swagger UI as “Implementation Notes”. You can use Markdown syntax to format these notes.
You can use the @api.response()
decorator to list what HTTP status codes each method is expected to return and what the status code means.
Once all this code is in place, your method will be nicely documented in the Swagger UI.
Swagger UI documentation also includes a form in which parameters can be set. If a request body is expected, its format will be specified on the right.
If you hit the Try it out!
button, your request will be sent to the API and the response will be displayed on screen.
We already mentioned path parameters above, but you can also document parameters in the request query (after the ?
in the URL), the headers or in the form submitted in the request body.
In order to define these parameters, we use an object called the RequestParser
. The parser has a function named add_argument()
, which allows us to specify what the parameter is named and what its allowed values are.
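A sketch of a parser for pagination arguments:

```python
from flask_restplus import reqparse

pagination_arguments = reqparse.RequestParser()
pagination_arguments.add_argument("page", type=int, required=False,
                                  default=1, help="Page number")
pagination_arguments.add_argument("per_page", type=int, required=False,
                                  choices=[10, 20, 30, 40, 50],
                                  default=10, help="Results per page")
```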
Once defined, we can use the @api.expect
decorator to attach the parser to a method.
Once the method is decorated with an argument parser, the method’s Swagger UI will display a form to specify argument values.
The argument parser serves another function, it can validate the argument values. If a value fails validation, the API will return an HTTP 400
error with an appropriate message.
You can enable or disable argument validation for each method using the validate
argument in the @api.expect
method. You can also enable validation globally by using the RESTPLUS_VALIDATE
configuration variable when bootstrapping your Flask application.
In the demo application we enable validation globally in the app.py
file.
To specify the argument’s type use the type
keyword. Allowed values are int
, str
and bool
.
You can specify arguments to be present in the query of your method, but also in the headers or request body using the location
keyword.
To create an argument which accepts multiple values, use the action
keyword and specify 'append'
as value:
To specify a list of valid argument values, use the choices
keyword and provide an iterator as value.
Read more about RequestParser in the Flask-RESTPlus docs.
If you want to update or create a new resource in a RESTful collection, you should send the item’s data serialized as JSON in the body of a request. Flask-RESTPlus allows you to automatically document and validate the format of incoming JSON objects by using API models.
A RESTPlus API model defines the format of an object by listing all expected fields. Each field has an associated type (e.g. String
, Integer
, DateTime
), which determines what values will be considered valid.
The demo app has a number of API models in the serializers.py
file. A simple example would look something like this:
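A sketch of a simple model describing a blog category:

```python
from flask_restplus import fields

from rest_api_demo.api.restplus import api

category = api.model("Blog category", {
    "id": fields.Integer(readOnly=True,
                         description="The unique identifier of a blog category"),
    "name": fields.String(required=True,
                          description="Category name"),
})
```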
Once the model is defined you can attach it to a method using the @api.expect()
decorator.
All fields share some common options which can change their behavior:
required – is the field required
default – default value for the field
description – field description (will appear in Swagger UI)
example – optional example value (will appear in Swagger UI)

Additional validation options can be added to fields to make them more specific:
String:
min_length and max_length – minimum and maximum length of a string
pattern – a regular expression, which the string must match
Numbers (Integer, Float, Fixed, Arbitrary):
min and max – minimum and maximum values
exclusiveMin and exclusiveMax – as above, but the boundary values are not valid
multiple – number must be a multiple of this value

You can learn more about RESTPlus model fields by looking at their source code.
A field of an API model may use another model as its expected value. You would then provide a JSON object as a valid value for this field.
A field may also require a list of values, or even a list of nested objects.
If you have two similar models, you may use model inheritance to extend the definition of a model with additional fields. In the example below we have one generic API model called pagination
and we create a more specific model page_of_blog_posts
by using the api.inherit()
method.
API models can also be used as serializers. If you decorate a method with @api.marshal_with(model)
, Flask-RESTPlus will generate a JSON object with the same fields as are specified in the model
.
The method just has to return an object which has attributes with the same names as the fields. Alternatively, the method could return a dictionary with values assigned to the same keys as the names of model fields.
For example, your method can return an SQLAlchemy ORM object which has the same fields as your API model.
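A sketch of such a method, assuming a category API model as defined above and a Category SQLAlchemy model with matching columns:

```python
@ns.route("/<int:id>")
class CategoryItem(Resource):

    @api.marshal_with(category)
    def get(self, id):
        """Returns a single blog category."""
        return Category.query.filter(Category.id == id).one()
```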
If you want to return a list of objects, use the @api.marshal_list_with(model)
decorator.
The attribute
keyword allows you to specify which object attribute the field value should be taken from:
Using the attribute
parameter you can pull out a value nested deeper in the object’s structure:
In more complex cases you can use a lambda
function to query for the value:
When writing your API endpoint functions you may find yourself handling a request that cannot be fulfilled. In such cases your only recourse is to return an error message to the user. You can use the api.abort()
function to do so.
In cases where you don’t explicitly handle the error yourself, Flask will catch the exception and turn it into an HTTP 500
error page.
You can override the default error handler using the @api.errorhandler
decorator.
You can specify custom error handling logic for different types of exceptions.
The default_error_handler
function as written above will not return any response if the Flask application is running in DEBUG
mode. Instead of returning an error message, this will activate the Werkzeug interactive debugger.
If you delete the db.sqlite
file or simply want to reset your database to an empty state, you can enter the following commands in your Python console.
There are a lot of resources on the net which can guide you to full Flask enlightenment. I would recommend getting to know the following:
So why is this Docker thing so popular these days? The basic answer is that it makes your applications portable. Django applications can make use of Python's virtualenv
to create isolated environments, but if some of your apps use Python 2 and some use Python 3, you may need to install a whole slew of additional libraries on your host server.
With Docker, on the other hand, each container bundles a specific base system and all other dependencies of your app. You install dependencies such as libjpeg
in the container, not on the host, so if your apps need different versions of this or any other system libraries, you won't run into problems.
You can find more information about Docker in its FAQ pages.
The following procedure was tested on systems running Debian 7 and Ubuntu 14.04 LTS. Everything should also work on other Debian-based distributions. If you're using an RPM-based distro (such as CentOS), you will need to replace the apt-get
commands by their yum
counterparts and if you’re using FreeBSD you can install the components from ports. If you don’t have a server to play with, I can recommend the inexpensive VPS servers offered by Digital Ocean.
A Docker package is available in the Debian 8 repositories, so installing is as simple as:
$ sudo apt-get install docker.io
On Debian 7 and Ubuntu, you will have to run an installation script provided by Docker:
$ sudo apt-get install wget
$ wget -qO- https://get.docker.com/ | sh
If you want to be able to run Docker containers as your user, not only as root
, you should add yourself to the group called docker
using the following command.
$ sudo usermod -aG docker `whoami`
Remember to log out and back in to pick up your new groups.
Once Docker is installed, you can test it by running the following command. This will take a few minutes, so be patient.
$ docker run -i -t ubuntu:14.04 /bin/bash
Unable to find image 'ubuntu:14.04' locally
14.04: Pulling from ubuntu
511136ea3c5a: Pull complete
f3c84ac3a053: Pull complete
a1a958a24818: Pull complete
9fec74352904: Pull complete
d0955f21bf24: Already exists
Digest: sha256:2a214fd5c1c2048ef34bb79b5411efe4aa1e082b53ac1de3191992fe3ec64395
Status: Downloaded newer image for ubuntu:14.04
root@a8fd0ab40b7e:/#
The above command actually downloaded (pulled) a docker container with Ubuntu 14.04 from the official Docker base images repository. After the image was downloaded, Docker fired up the container and started bash
inside.
Feel free to look around. It looks like a normal Ubuntu system, except that only your bash
process is running here:
root@a8fd0ab40b7e:/# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 18160 1984 ? Ss 13:07 0:00 /bin/bash
root 23 0.0 0.0 15560 1148 ? R+ 13:10 0:00 ps aux
Type exit
or hit Ctrl-D
to exit bash
and stop the container.
You can list all running Docker containers using the docker ps
command. If you add the --all
switch you will also list containers which ran previously but have since exited.
$ docker ps --all
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
a8fd0ab40b7e ubuntu:14.04 "/bin/bash" 6 minutes ago Exited (0) sad_galileo
Let’s proceed to create the Docker image for our Django application.
We will need a directory to work in and a copy of our application’s source. Let’s create a directory called dockyard
, where we will be making our containers and a subdirectory for the Docker image we are creating.
$ mkdir -p dockyard/hello_django_docker
$ cd dockyard/hello_django_docker
For the purposes of this article I uploaded a sample Django app to Github, so we can grab a copy of the source code using the following command:
$ git clone https://github.com/postrational/hello_django.git
We should now have a working directory containing the source code of our Django application. Please note that I assume that a PIP-compatible requirements.txt
file and the Django manage.py
script are in the main source code directory named hello_django
. The project’s settings.py
file is located in the subdirectory hello
.
~/dockyard/hello_django_docker # Our working directory
`-- hello_django # Main project source directory (from repo)
|-- hello
| |-- __init__.py
| |-- settings.py # Project settings
| |-- urls.py
| `-- wsgi.py # Project's WSGI start script
|-- manage.py # Django's management command
|-- project_application_1
| `-- (more files...)
|-- project_application_2
| `-- (more files...)
`-- requirements.txt # File generated using pip freeze
With the code in place, let’s proceed to create Docker-related files.
A Docker container can use a script as the default command which will run when the container starts. In our case we will use the following script as the entry point.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
|
The above entry point script does a few things:
- it creates log files in /srv/logs/ and runs the tail -f command, which will output the logs to the console,
- it starts gunicorn, which will serve our Django application,
- the "$@" notation in the last line will allow you to pass additional arguments to gunicorn when you start the container.
Save the script as docker-entrypoint.sh and change its permissions to make it executable:
$ chmod u+x docker-entrypoint.sh
We will now make the container definition file named Dockerfile
. Create the file and give it the following content.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 |
|
The Dockerfile specifies the steps needed to create our container image:
- the base image is ubuntu:14.04 and all subsequent commands will be executed inside of a running container with this base OS,
- environment variables are set using ENV commands. These variables can be used later in the Dockerfile, but will also be available in the environment of all programs executed in the container. For this reason we use a prefix (DOCKYARD), so that our variables don't accidentally override anything else,
- we use apt-get to install any system tools and libraries we may need,
- we create directories in the /srv/ directory of our container. Using the VOLUME command we make some of these directories available to other containers. This will come in handy later, see "Backing up user media files",
- we use the COPY command to copy the source code of our app into the container,
- we use the requirements.txt file from the source code to install Python dependencies,
- we use the EXPOSE command to make the Gunicorn port (8000) accessible outside of our container,
- finally, we copy docker-entrypoint.sh and define it as the script which should execute when the container is started.
You may want to refer to the Dockerfile documentation for more information about the available commands.
All the pieces are now in place. Let’s just review to make sure the files we created are in the right spot:
~/dockyard/hello_django_docker
|-- docker-entrypoint.sh # We added the executable entry script here
|-- Dockerfile # And the Dockerfile here
`-- hello_django
|-- hello
| |-- __init__.py
| |-- settings.py
| |-- urls.py
| `-- wsgi.py
|-- manage.py
|-- project_application_1
|-- project_application_2
`-- requirements.txt
We can now build the Docker container image. I will call the image michal/hello_django
.
Docker image names follow the convention of user-name/image-name
. When you upload your image to a repository it will be added to your user account based on the name.
$ docker build -t michal/hello_django ~/dockyard/hello_django_docker
Sending build context to Docker daemon 80.38 kB
Sending build context to Docker daemon
Step 0 : FROM ubuntu:14.04
(...)
Successfully built 03c7aeb70a09
You will see output of many commands as the container is put together. At the end you should be able to see your newly created image when running the docker images
command:
$ docker images
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
michal/hello_django latest 17a441b8bdbd 2 seconds ago 394.6 MB
Now that the container image is created, we can use it to start a container.
$ docker run --publish=8001:8000 michal/hello_django:latest
This command starts a new container from the michal/hello_django
image.
It also makes the container's port 8000 (the default Gunicorn port) available on port 8001 of the Docker host. Reassigning ports in this way allows you to have multiple Django applications running in different containers. You just need to assign port 8000 of each container to a different port on the Docker host.
Once the container is started in this way, you should be able to navigate to port 8001 of the Docker host and see the famous Django start page declaring that “It worked!”.
Visit your docker host in a browser (use the IP or domain of your machine): http://docker.host:8001
You can stop the container by hitting Ctrl-C in the terminal.
Starting and stopping a container as we did above is useful for debugging, but in most other cases you will want to start the container without attaching it to a terminal session.
Use the --detach=true
argument when starting the container.
$ docker run --name=hello_django \
--detach=true \
--restart=always \
--publish=8001:8000 \
michal/hello_django:latest
81512acac0e4875a218587737ea31ce09aae746e4a8248461e16ab601bb1b0aa
Note that we specified the name of the container (--name=hello_django
). We also specified that the container should always be restarted if the process inside stops or crashes (--restart=always
).
You can now check that the container is running by listing all running Docker containers using the docker ps
command.
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
2c7cbc6fd5b8 michal/hello_django:latest "/docker-entrypoint. 3 seconds ago Up 3 seconds 0.0.0.0:8001->8000/tcp hello_django
You can also follow the logs which are being output by the processes running in the container using the docker logs
command.
$ docker logs -f hello_django
You can stop and restart the container:
$ docker stop hello_django
$ docker start hello_django
$ docker restart hello_django
And you can delete the container when you’re done with it.
$ docker stop hello_django
$ docker rm hello_django
Docker will pass any arguments specified after the name of the image to the command which starts the container. In our case those arguments will be passed to the docker-entrypoint.sh
script, which in turn will pass them to the gunicorn
command which it starts.
If we want to change the number of Gunicorn worker processes running in the container, we just need to add the --workers
argument to the end of docker run
.
$ docker run \
michal/hello_django:latest \
--workers 5
As we noted earlier, the VOLUME
command in the Dockerfile made some directories accessible from other containers. If you would like to make a backup of files stored in the /srv/media/
directory, you can start another container using the --volumes-from
argument. The volume directories will be accessible in the newly started container.
$ docker run --rm -i -t --volumes-from=hello_django ubuntu:14.04 /bin/bash
root@ed95f0967489:/# cd /srv/media/
root@d0198a264b3a:/srv/media# apt-get install -y ssh-client
root@d0198a264b3a:/srv/media# scp -r * user@remote-host:~/path/to/backup
You can find more information about managing volumes in containers in the docs.
You can also mount a Docker host machine directory as a volume inside the container, using the --volume
argument.
$ sudo mkdir -p /var/log/webapps/hello
$ docker run --name=hello_django \
--detach=true \
--restart=always \
--publish=8001:8000 \
--volume=/var/log/webapps/hello:/srv/logs \
michal/hello_django:latest \
--workers 5
$ tail -f /var/log/webapps/hello/*.log
You can find more information about managing volumes in containers in the docs.
In many cases you will want to use different settings when running your Django application in development, during testing and in production.
In order to do this, create a new settings file on the Docker host. Let’s assume you save it in the directory /etc/webapps/hello_django/local_settings.py
. In the file import all of your project’s default settings and override only what’s needed.
1 2 3 4 |
|
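A minimal sketch of such a file (the override values are illustrative):

# /etc/webapps/hello_django/local_settings.py
from hello.settings import *  # start from the settings shipped with the app

DEBUG = False
ALLOWED_HOSTS = ['example.com']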
You can then use the --volume
argument to mount the single file local_settings.py
into your container and use the --env
argument to set the DJANGO_SETTINGS_MODULE
environment variable to the new settings module.
$ docker run --name=hello_django \
--detach=true \
--restart=always \
--publish=8001:8000 \
--env="DJANGO_SETTINGS_MODULE=hello.local_settings" \
--volume=/etc/webapps/hello_django/local_settings.py:/srv/hello_django/hello/local_settings.py \
michal/hello_django:latest \
--workers 5
You will probably want to run some of Django's management commands against your containerized application. In order to do this you can start another container from the same image and specify another command which will be started instead of the entrypoint script. If you use /bin/bash
, you will arrive at a shell in the container. From here you can execute Django’s manage.py
commands.
$ docker run --rm -i -t --entrypoint=/bin/bash michal/hello_django:latest
root@ac0073a6bb9c:/srv/hello_django# ./manage.py createsuperuser
Username (leave blank to use 'root'): michal
Email address: michal@docker.image
Password:
Password (again):
Superuser created successfully.
Thanks for reading. If you find any issues with this article or have any other ideas, leave a comment below.
]]>Celery uses a broker to pass messages between your application and Celery worker processes. In this article we will set up Redis as the message broker. You should note that persistence is not the main goal of this data store, so your queue could be erased in the event of a power failure or other crash. Keep this in mind and don't use the job queue to store application state. If you need your queue to have persistence, use another message broker such as RabbitMQ.
In this article we will add Celery to a Django application running in a Python virtualenv. I will assume that the virtual environment is located in the directory /webapps/hello_django/
and that the application is up and running. You can follow steps in my previous article to set up Django in virtualenv running on Nginx and Gunicorn.
This article was tested on a server running Debian 7, so everything should also work on an Ubuntu server or other Debian-based distribution. If you’re using an RPM-based distro (such as CentOS), you will need to replace the aptitude
commands by their yum
counterparts and if you’re using FreeBSD you can install the components from ports. If you don’t have a server to play with, I can recommend the inexpensive VPS servers offered by Digital Ocean.
Let’s get started by making sure your system is up to date.
$ sudo aptitude update
$ sudo aptitude upgrade
The first piece of software we’ll install is Redis.
$ sudo aptitude install redis-server
$ redis-server --version
Redis server version 2.4.14 (00000000:0)
Check if Redis is up and accepting connections:
$ redis-cli ping
PONG
Let’s add Celery to your application’s virtual Python environment.
First we’ll switch to the application user and activate the virtualenv
$ sudo su - hello
hello@django:~$ source bin/activate
Now we can use pip
to install Celery along with its Redis bindings and dependencies:
(hello_django)hello@django:~$ pip install celery[redis]
Downloading/unpacking celery[redis]
(...)
Successfully installed celery pytz billiard kombu redis anyjson amqp
Cleaning up...
In order to use Celery as part of your Django application you’ll need to create a few files and tweak some settings. Let’s start by adding Celery-related configuration variables to settings.py
1 2 3 4 5 |
|
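A sketch of those variables, assuming the default local Redis instance on port 6379, database 0 (this matches the transport line shown in the worker output later in this article):

# settings.py
BROKER_URL = 'redis://localhost:6379/0'
CELERY_ACCEPT_CONTENT = ['json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'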
Now we’ll create a file named celery.py
, which will instantiate Celery, creating a so called Celery application. You can find more information about available Celery application settings in the documentation.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
|
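A sketch of such a file, following the standard Celery 3.1 layout for Django projects:

# hello/celery.py
from __future__ import absolute_import

import os

from celery import Celery
from django.conf import settings

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'hello.settings')

app = Celery('hello_django')
app.config_from_object('django.conf:settings')
# look for tasks.py modules in all installed Django apps
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)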
In order to instantiate the Celery app every time our Django application is started, we can add the following lines to the __init__.py
file in our Django project module (hello/hello/__init__.py in our case). This will make sure that Celery tasks use this app.
1 2 |
|
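A sketch of those two lines:

# hello/__init__.py
from __future__ import absolute_import
from .celery import app as celery_app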
We will now add an app called testapp
to our Django project and add some tasks to this app. Let’s start by creating the app:
(hello_django)hello@django:~/hello$ python manage.py startapp testapp
Make sure that the app is added to INSTALLED_APPS
in settings.py
1 2 3 4 |
|
Create a file called tasks.py
in your app's directory and add the code of your first Celery task to the file.
1 2 3 4 5 6 7 |
|
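A sketch of such a task; its return value matches the worker log output shown further down:

# testapp/tasks.py
from __future__ import absolute_import

from celery import shared_task

@shared_task
def test(param):
    return 'The test task executed with argument "%s" ' % param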
Find more information about writing task functions in the docs.
If you created all files as outlined above, you should see the following directory structure:
/webapps/hello_django/hello
βββ hello
βΒ Β βββ celery.py # The Celery app file
βΒ Β βββ __init__.py # The project module file we modified
βΒ Β βββ settings.py # Settings go here, obviously :)
βΒ Β βββ urls.py
βΒ Β βββ wsgi.py
βββ manage.py
βββ testapp
βββ __init__.py
βββ models.py
βββ tasks.py # File containing tasks for this app
βββ tests.py
βββ views.py
You can find a complete sample Django project on Celery’s GitHub.
In production we will want Celery workers to be daemonized, but let’s just quickly start the workers to check that everything is configured correctly. Use the celery
command located in your virtualenv’s bin
directory to start the workers. Make sure that the module path hello.celery:app
is available on your PYTHONPATH
.
It’s important to understand how Celery names tasks which it discovers and how these names are related to Python module paths. If you run into NotRegistered
or ImportError
exceptions make sure that your apps and tasks are imported in a consistent manner and your PYTHONPATH
is set correctly.
$ export PYTHONPATH=/webapps/hello_django/hello:$PYTHONPATH
$ /webapps/hello_django/bin/celery --app=hello.celery:app worker --loglevel=INFO
-------------- celery@django v3.1.11 (Cipater)
---- **** -----
--- * *** * -- Linux-3.2.0-4-amd64-x86_64-with-debian-7.5
-- * - **** ---
- ** ---------- [config]
- ** ---------- .> app: hello_django:0x15ae410
- ** ---------- .> transport: redis://localhost:6379/0
- ** ---------- .> results: disabled
- *** --- * --- .> concurrency: 2 (prefork)
-- ******* ----
--- ***** ----- [queues]
-------------- .> celery exchange=celery(direct) key=celery
[tasks]
. testapp.tasks.test
[2014-05-20 13:53:59,740: INFO/MainProcess] Connected to redis://localhost:6379/0
[2014-05-20 13:53:59,748: INFO/MainProcess] mingle: searching for neighbors
[2014-05-20 13:54:00,756: INFO/MainProcess] mingle: all alone
[2014-05-20 13:54:00,769: WARNING/MainProcess] celery@django ready.
If everything worked, you should see a splash screen similar to the above and the [tasks]
section should list tasks discovered in all the apps of your project.
[tasks]
. testapp.tasks.test
In another terminal, activate the virtualenv and start a task from your project’s shell.
$ sudo su - hello
hello@django:~$ source bin/activate
(hello_django)hello@django:~$ cd hello/
(hello_django)hello@django:~/hello$ python manage.py shell
Python 2.7.3 (default, Mar 13 2014, 11:03:55)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> from testapp.tasks import test
>>> test.delay('This is just a test!')
<AsyncResult: 79e35cf7-0a3d-4786-b746-2d3dd45a5c16>
You should see messages appear in the terminal where Celery workers are started:
[2014-05-18 11:43:24,801: INFO/MainProcess] Received task: testapp.tasks.test[79e35cf7-0a3d-4786-b746-2d3dd45a5c16]
[2014-05-18 11:43:24,804: INFO/MainProcess] Task testapp.tasks.test[79e35cf7-0a3d-4786-b746-2d3dd45a5c16] succeeded in 0.00183034200018s: u'The test task executed with argument "This is just a test!" '
You can find more information about calling Celery tasks in the docs.
In production we can use supervisord to start Celery workers and make sure they are restarted in case of a system reboot or crash. Installation of Supervisor is simple:
$ sudo aptitude install supervisor
When Supervisor is installed you can give it programs to start and watch by creating configuration files in the /etc/supervisor/conf.d
directory. For our hello-celery
worker we’ll create a file named /etc/supervisor/conf.d/hello-celery.conf
with this content:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
|
This configuration is based on a sample config provided by the makers of Celery. You can set many other options.
Create a file to store your application’s log messages:
hello@django:~$ mkdir -p /webapps/hello_django/logs/
hello@django:~$ touch /webapps/hello_django/logs/celery-worker.log
After you save the configuration file for your program you can ask Supervisor to reread configuration files and update (which will start the newly registered app).
$ sudo supervisorctl reread
hello-celery: available
$ sudo supervisorctl update
hello-celery: added process group
You can now monitor output of Celery workers by following the celery-worker.log
file:
$ tail -f /webapps/hello_django/logs/celery-worker.log
You can also check the status of Celery or start, stop or restart it using supervisor.
$ sudo supervisorctl status hello-celery
hello-celery RUNNING pid 18020, uptime 0:00:50
$ sudo supervisorctl stop hello-celery
hello-celery: stopped
$ sudo supervisorctl start hello-celery
hello-celery: started
$ sudo supervisorctl restart hello-celery
hello-celery: stopped
hello-celery: started
Celery workers should now be automatically started after a system reboot and automatically restarted if they ever crashed for some reason.
You can check that Celery is running by issuing the celery status
command:
$ export PYTHONPATH=/webapps/hello_django/hello:$PYTHONPATH
$ /webapps/hello_django/bin/celery --app=hello.celery:app status
celery@django: OK
1 node online.
You can also inspect the queue using a friendly curses monitor:
$ export PYTHONPATH=/webapps/hello_django/hello:$PYTHONPATH
$ /webapps/hello_django/bin/celery --app=hello.celery:app control enable_events
$ /webapps/hello_django/bin/celery --app=hello.celery:app events
I hope that’s enough to get you started. You should probably read the Celery User Guide now. Happy coding!
]]>In this post I will assume that you want to set up Jenkins on a Debian server named test-server
. I will further assume that:
- your Django application is available at http://test-server
- the application runs in a Python virtualenv located in the directory /webapps/hello_django/
- the application code is deployed to /webapps/hello_django/trunk/
- the source code is kept in an SVN repository at svn://svn-server/hello_django/trunk
Take a look at my previous post for more information about setting up Django in a virtualenv.
Jenkins provides packages for most system distributions. Installation is very simple and consists of adding the Jenkins repository to your package system and installing the package. On Debian this can be performed using the following steps:
### Add the Jenkins repository to the list of repositories
$ sudo sh -c 'echo deb http://pkg.jenkins-ci.org/debian binary/ > /etc/apt/sources.list.d/jenkins.list'
### Add the repository's public key to your system's trusted keychain
$ wget -q -O - http://pkg.jenkins-ci.org/debian/jenkins-ci.org.key | sudo apt-key add -
### Download the repository index and install
$ sudo apt-get update
$ sudo apt-get install jenkins
Installation is very similar on other systems. For details take a look at the Jenkins installation docs.
By default Jenkins runs on port 8080
and listens on all network interfaces. After installing the package, you can visit Jenkins under the URL: http://test-server:8080
The default setup provides no security at all, so if your server is accessible outside of a trusted network you will need to secure it. Jenkins documentation describes a basic security setup, which you can extend by proxying Jenkins through your secured Apache or Nginx server, using HTTPs, etc.
Since we will want Jenkins to deploy our application to the directory from which it runs, ie. /webapps/hello_django/trunk/
, we will need to give the jenkins
user access to write to this directory. We can do this by changing the directory’s owner to jenkins
for example:
$ sudo chown jenkins /webapps/hello_django/trunk/
If you want the jenkins
user to restart your web or application server after a new version of your code is deployed, you should add an appropriate entry to your server’s /etc/sudoers
file, such as:
jenkins ALL=NOPASSWD: /usr/sbin/apachectl
On your SVN server create a user named jenkins
with access to the source code of your project.
- Create a new free-style project named hello-django-trunk.
- In the Advanced Project Options section check Use custom workspace and enter /webapps/hello_django/trunk/ as the Directory.
- In the Source Code Management section choose Subversion and set the Repository URL to svn://svn-server/hello_django/trunk
- Provide the credentials of the jenkins SVN user.
- Set the Local module directory to . to indicate that we will be checking out code directly into the workspace directory.
If you are using Git, you should install the Jenkins Git Plugin and use a Git URL for your repository instead.
We will want Jenkins to deploy our application to the test server and run our tests after every commit. We can accomplish this in at least two ways: a) we can ask Jenkins to periodically poll the SVN server for information about new commits, or b) we can add a post-commit hook to our repository to trigger a build remotely after every commit. The first option is easier to set up, but slightly wasteful, as we end up polling our source-code repository for information even during times when no one is working. Choose the option which suits your needs best.
To configure polling of your source code repository every 10 minutes, check Poll SCM in the job's Build Triggers section and enter the schedule:
H/10 * * * *
Alternatively, to enable builds to be actively triggered by your source code repository's post-commit hook, check Trigger builds remotely (e.g., from scripts) in the Build Triggers section and define an authentication token.
If you want builds to be triggered actively by your source code repository, you will need to create a script called post-commit
(in the hooks
directory of your SVN repo directory or the .git/hooks
directory when using Git). Your hook script should execute a command such as curl
to send an HTTP request to Jenkins which will trigger the build. The token is used here for security.
1 2 |
|
Once you set up Jenkins user authentication, the command above will not work. There is a plugin which can fix this. Note that when using this plugin the build trigger URL changes, so your command will have to be modified slightly.
1 2 |
|
More information about using hooks: in SVN and Git.
Here we finally get to the meat of the matter. In the Build section we can enter all commands which should be executed to deploy our application to the test server and run tests.
1 2 3 4 5 6 7 8 9 |
|
If a commit causes our tests to fail, we want to alert the guilty committer.
There is an important caveat here. Emails will be sent to addresses which combine the SVN username and a default domain suffix. This means that your users will need to have mailboxes (or aliases) in the same domain as their SVN usernames. For example, if my SVN username is michal, I would have to have the e-mail address michal@email-server.
In the e-mail notification settings, set the default user e-mail suffix to @email-server, where email-server is the fully qualified domain name of your organization's mail system.
OK, we're done. From now on, whenever code is submitted to your repository, Jenkins should pick it up, deploy your Django application, run tests and alert the committer if something they did broke a test.
If your test server is available outside of your trusted network, make sure you proceed to lock it down tight.
The book is written as a series of 120 step-by-step recipes, which should be easily accessible to both novice and professional system administrators. Check it out to see how Webmin can make your life easier. The book is currently available from Packt Publishing.
If you would like to write a review, you can get a free copy of the book from the publisher. Just send an email to trishalb@packtpub.com with the subject “Webmin Administrators Cookbook – Review Request”.
I assume you have a server available on which you have root privileges. I am using a server running Debian 7, so everything here should also work on an Ubuntu server or other Debian-based distribution. If you’re using an RPM-based distro (such as CentOS), you will need to replace the apt-get
commands by their yum
counterparts and if you’re using FreeBSD you can install the components from ports.
If you don’t have a server to play with, I would recommend the inexpensive VPS servers offered by [Digital Ocean][digital_ocean_referal]. If you click through [this link][digital_ocean_referal] when signing up, you’ll pay a bit of my server bill :)
I’m also assuming you configured your DNS to point a domain at the server’s IP. In this text, I pretend your domain is example.com
Let’s get started by making sure our system is up to date.
$ sudo apt-get update
$ sudo apt-get upgrade
Let’s install Nginx and set it to start automatically during system boot.
$ sudo apt-get install nginx
$ sudo service nginx start
You can navigate to your server (http://example.com) with your browser and Nginx should greet you with the words “Welcome to nginx!”.
Under Apache, PHP code is executed by the web server itself (via mod_php). The Nginx philosophy is somewhat different. It's a reverse proxy rather than an application server, so it's not running any code itself. Instead it can serve (proxy) data generated by CGI applications running on your system.
For PHP this is PHP-FPM (FastCGI Process Manager). This is a daemon process which waits for incoming requests to execute PHP code, runs the scripts and returns their output. More information can be found on the PHP-FPM site.
$ sudo apt-get install php5-fpm
Edit the /etc/php5/fpm/php.ini
and change cgi.fix_pathinfo to 0.
$ sudo vim /etc/php5/fpm/php.ini
1 2 3 4 5 6 7 8 |
|
Now check the php5-fpm configuration file /etc/php5/fpm/pool.d/www.conf
and make sure that php5-fpm communicates with the outside world through a socket file:
1
|
|
$ sudo service php5-fpm restart
/etc/nginx/fastcgi.conf
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
|
http://wiki.nginx.org/FcgiExample
Let’s start by installing the MySQL server package.
$ sudo apt-get install mysql-server
During the installation process you will be asked for a password for the root user. If you don’t set it during installation, you can set it later using the following command (substitute a password for NEWPASSWORD).
$ mysqladmin -u root password NEWPASSWORD
$ mysql -u root -p
mysql> CREATE DATABASE `wordpress` CHARACTER SET utf8 COLLATE utf8_unicode_ci;
mysql> CREATE USER 'wordpress'@'localhost' IDENTIFIED BY 'dKtbiHTPrkAzHUUrWRcuhMDqlpcszSQY0kd6vSoh5yotkdx7gCwRkAmGKFJVotu';
mysql> GRANT ALL PRIVILEGES ON wordpress.* TO 'wordpress'@'localhost';
mysql> FLUSH PRIVILEGES;
$ sudo apt-get install php5-mysqlnd
(We prefer the MySQL native driver over the standard one; install the standard php5-mysql package instead if the native driver is not available.)
$ sudo apt-get install php5-gd
$ sudo mkdir -p /var/www/
$ cd /var/www/
$ wget -nv -O - https://wordpress.org/latest.tar.gz | sudo tar -xzv
$ sudo chown -R `whoami` /var/www/wordpress
$ mkdir /var/www/wordpress/wp-content/uploads
$ sudo chown www-data /var/www/wordpress/wp-content/uploads
$ sudo vim /etc/nginx/sites-available/example-wordpress
$ sudo ln -s ../sites-available/example-wordpress /etc/nginx/sites-enabled/example-wordpress
$ sudo service nginx configtest
Testing nginx configuration: nginx.
$ sudo service nginx reload
Reloading nginx configuration: nginx.
Point your browser to example.com and click the Create a Configuration File button. Follow the on-screen instructions and save the generated wp-config.php file to /var/www/wordpress/wp-config.php. Navigate to http://example.com again and complete the WordPress setup process.
Your Nginx-powered WordPress site is ready to go.
More info: https://rtcamp.com/wordpress-nginx/tutorials/single-site/wp-super-cache/
]]>Let's dive right in. The code below is a sample test scenario for The Grinder. It represents various aspects of interaction between the test agent and your application, including logging in, handling Django's CSRF tokens and simulating AJAX requests made via XMLHttpRequest.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 |
|
Django as well as many other web applications use a simple authentication mechanism. First, the user provides a username and password combination. The application then verifies user data, opens a session and sends a session cookie back to the user. A session ID contained in the cookie uniquely identifies the user’s open session. As long as each request from the user comes in accompanied by the cookie, the user is considered logged in, at least until the session expires on the server.
The example test scenario file above depends on The Grinder’s ability to parse and resubmit cookies, so you don’t actually have to worry about the sessionid
cookie.
Another obstacle to overcome when running automated tests on Django are anti cross-site request forgery tokens. These tokens are generated dynamically for every form and Django requires that they be submitted with every POST request.
In the example test scenario we don’t parse HTML forms, but instead rely on the fact that Django also sets a cookie with the CSRF token value. We fetch the cookie value (using the get_csrf_token
function) and submit it as a field named csrfmiddlewaretoken
in POST data.
1 2 3 4 5 |
|
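Grinder test scripts are written in Jython, so a helper along these lines can read the token from the cookie jar. This is a sketch assuming Django's default csrftoken cookie name and The Grinder's bundled HTTPClient library:

from net.grinder.plugin.http import HTTPPluginControl
from HTTPClient import CookieModule

def get_csrf_token():
    # Look through the cookies collected for the current worker thread
    thread_context = HTTPPluginControl.getThreadHTTPClientContext()
    for cookie in CookieModule.listAllCookies(thread_context):
        if cookie.getName() == 'csrftoken':
            return cookie.getValue()
    return None

# The returned value can then be submitted as the 'csrfmiddlewaretoken' POST field.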
We can also simulate AJAX requests by sending appropriate headers, namely X-Requested-With
and X-CSRFToken
. The anti-CSRF token value is read from cookie and written in the latter header.
1 2 3 4 |
|
If you're still having trouble with CSRF and keep getting 403 errors instead of testing your application, you can disable CSRF completely using a small bit of middleware. Just make sure you don't leave this on in production.
1 2 3 |
|
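One possible implementation of such middleware is to mark every request as exempt from Django's CSRF checks (add the class to MIDDLEWARE_CLASSES; again, only ever in a test environment):

class DisableCSRFMiddleware(object):
    """Marks every request as exempt from Django's CSRF validation."""

    def process_request(self, request):
        setattr(request, '_dont_enforce_csrf_checks', True)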
When your tests are prepared and properly vetted you can leverage the power of The Grinder to run them from as many parallel agent machines (and/or threads) as you require. Your test output will include execution time statistics for every test step.
This article is a continuation of a previous article on setting up Django with Nginx and Gunicorn. You should start by following instructions in that article and prepare a server with the following components:
Our goal in this article will be to create two applications, one called Hello and one called Goodbye. The former will be served under the address http://hello.test and the latter http://goodbye.test
In order to keep your apps independent, each will run in its own virtual environment. Create an environment for each application using the virtualenv
command. In each environment install Django, Gunicorn, the application itself and its other dependencies. Follow steps described in my previous article for each app.
Let’s say that for our hello
and goodbye
applications we would create environments in /webapps/hello_django
and /webapps/goodbye_django
respectively. We would get a directory structure containing the following entries:
/webapps/
βββ hello_django <= virtualenv for the application Hello
βΒ Β βββ bin
βΒ Β βΒ Β βββ activate
βΒ Β βΒ Β βββ gunicorn <= Hello app's gunicorn
βΒ Β βΒ Β βββ gunicorn_start <= Hello app's gunicorn start script
βΒ Β βΒ Β βββ python
βΒ Β βββ hello <= Hello app's Django project directory
βΒ Β βΒ Β βββ hello
βΒ Β βΒ Β βββ settings.py <= hello.settings
βΒ Β βΒ Β βββ wsgi.py <= hello.wsgi
βΒ Β βββ logs <= Hello app's logs will be saved here
βΒ Β βββ media
βΒ Β βββ run <= Gunicorn's socket file will be placed here
βΒ Β βββ static
βββ goodbye_django <= analogous virtualenv for the application Goodbye
βββ bin
βΒ Β βββ activate
βΒ Β βββ gunicorn
βΒ Β βββ gunicorn_start
βΒ Β βββ python
βββ goodbye
βΒ Β βββ goodbye
βΒ Β βββ settings.py
βΒ Β βββ wsgi.py
βββ logs
βββ media
βββ run
βββ static
Even though Django has a pretty good security track record, web applications can become compromised. In order to make running multiple applications safer we’ll create a separate system user account for each application. The apps will run on our system with the privileges of those special users. Even if one application became compromised, an attacker would only be able to take over the part of your system available to the hacked application.
Create system users named hello
and goodbye
and assign them to a system group called webapps
.
$ sudo groupadd --system webapps
$ sudo useradd --system --gid webapps --home /webapps/hello_django hello
$ sudo useradd --system --gid webapps --home /webapps/goodbye_django goodbye
Now change the owner of files in each application’s folder. I like to assign the group users
as the owner, because that allows regular users of the server to access and modify parts of the application which are group-writable. This is optional.
$ sudo chown -R hello:users /webapps/hello_django
$ sudo chown -R goodbye:users /webapps/goodbye_django
For each application create a simple shell script based on my gunicorn_start template. The scripts differ only in the values of variables which they set.
For the Hello app, set the following values in /webapps/hello_django/bin/gunicorn_start
:
1 2 3 4 5 6 7 8 9 10 |
|
And for the Goodbye app by analogy:
1 2 3 4 5 6 7 8 9 10 |
|
Next, create a Supervisor configuration for each application. Add a file for each app to the /etc/supervisor/conf.d
directory.
One for Hello:
1 2 3 4 5 |
|
And one for Goodbye:
1 2 3 4 5 |
|
Reread the configuration files and update Supervisor to start the apps:
$ sudo supervisorctl reread
$ sudo supervisorctl update
You can also start them manually, if you prefer:
$ sudo supervisorctl start hello
$ sudo supervisorctl start goodbye
Finally we can create virtual server configurations for each app based on this template. These will be stored in /etc/nginx/sites-available
and then activated by links in /etc/nginx/sites-enabled
.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 |
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 |
|
Enable the virtual servers and restart Nginx:
$ sudo ln -s /etc/nginx/sites-available/hello /etc/nginx/sites-enabled/hello
$ sudo ln -s /etc/nginx/sites-available/goodbye /etc/nginx/sites-enabled/goodbye
$ sudo service nginx restart
Now let’s point each domain to our server for testing purposes. Making actual changes to the Domain Name System is usually among the final steps when working in production, performed after all tests are completed. For testing you can simulate the DNS changes by adding an entry to the /etc/hosts
file of a computer from which you will be connecting to your server (your laptop for example).
Say you want to serve Django applications under the domains hello.test
and goodbye.test
from a server with the IP address of 10.10.10.200
. You can simulate the appropriate DNS entries locally on your PC by putting the following line into your /etc/hosts
file. On Windows the file is conveniently hidden in %SystemRoot%\system32\drivers\etc\hosts
.
1 2 |
|
You can now navigate to each domain from your PC to test that each app on the server is working correctly:
Good luck!
]]>The HTTP Archive (HAR) format is able to store a history of HTTP transactions. This allows a web browser to export detailed performance data about web pages it loads. This format is currently a work in progress at the W3C.
Chrome’s DevTools allow you to save a history of your browsing including every HTTP request made by the browser during your session. We can convert this record to a script which The Grinder will run multiple times.
Fire up Chrome and open the DevTools.
Click the Settings icon in the bottom right corner and Disable the cache.
Open the Network tab of the DevTools.
Clear the Network history
Choose the option to Preserve Log upon navigation (circle icon turns red).
Navigate around your site.
After you navigate to the pages you want to test, right-click on the network history panel and choose Copy All as HAR. Save the clipboard to a .har
file.
Download the har2grinder script from Github.
To convert your recorded navigation to a Grinder test, simply run the har2grinder
script and redirect its output to a .py
file.
$ python har2grinder.py my_website_test.har > my_website_grinder_test.py
Information about using The Grinder can be found in its user guide. Essentially it boils down to:
Start a GUI console to control your tests using a command such as:
$ java -classpath lib/grinder.jar net.grinder.Console
Grinder agents are the programs which will connect to your website during tests. On the same machine, another machine or a set of machines create a file called grinder.properties
, which will contain the configuration for Grinder’s agents.
1 2 3 4 5 6 7 8 9 10 11 12 13 |
|
…and start the agent processes:
$ java -classpath lib/grinder.jar net.grinder.Grinder grinder.properties
You can now run your performance tests from the Grinder console.
]]>Let’s begin by setting up a generic mod_wsgi
application in your WebFaction control panel. Log into the control panel, choose the option to add a new application and specify the following settings:
- Name: test_app
- App category: mod_wsgi
- App type: mod_wsgi 3.4 / Python 2.7
The new application will be created in your home directory (~
) under: ~/webapps/test_app
.
The directory will contain two subdirectories:
- apache2 – contains the Apache configuration files (apache2/conf) and scripts which let you control the server (apache2/bin)
- htdocs – contains default page files.
Welcome to your mod_wsgi website! It uses: Python 2.7....
If you see the above, then the generic application is set up correctly and we can proceed to turn it into a Virtualenv Django application.
The htdocs
directory will not be needed, so feel free to remove it.
$ cd ~/webapps/test_app
$ rm -r htdocs
Check if Virtualenv is installed on your server:
$ virtualenv --version
-bash: virtualenv: command not found
If Virtualenv is installed, you will see a version number when running the above command. If it’s missing you’ll see a command not found
error message instead.
Steps to install Virtualenv on a WebFaction server are the following:
$ mkdir -p ~/lib/python2.7/
$ easy_install-2.7 pip
$ pip install virtualenv
If you get a permission denied error, try this command to install virtualenv inside your user folder:
$ pip install --user virtualenv
Verify that installation was successful:
$ virtualenv --version
1.10.1
Let’s proceed to turn our application directory into a virtual Python environment:
$ cd ~/webapps/test_app
$ virtualenv .
This adds the folders and scripts for a virtual environment inside of the directory which WebFaction created for our application.
You can now activate the created environment:
$ source bin/activate
(test_app) $
Once the initial Virtualenv setup is complete, you can install Django inside its lib/python2.7/site-packages
directory.
(test_app) $ pip install django
Verify that Django installed correctly:
(test_app) $ django-admin.py --version
1.5.2
Your project will probably depend on other packages. You can install those from a REQUIREMENTS.txt
file, which you can generate on your development server with the pip freeze
command.
(test_app) $ pip install -r REQUIREMENTS.txt
Let’s create a new Django project inside the virtual environment:
(test_app) $ django-admin.py startproject test_django
At this stage you should have created a directory structure resembling this:
~/webapps/test_app
|-- apache2
| |-- bin
| | |-- httpd
| | |-- httpd.worker
| | |-- restart <== Scripts which start, stop and restart Apache
| | |-- start
| | `-- stop
| |-- conf
| | |-- httpd.conf <== Apache configuration file
| | `-- mime.types
| |-- lib
| |-- logs <== Apache error log is here
| `-- modules
|-- bin <== Virtualenv scripts and binaries
| |-- activate <== Virtualenv activation script
| |-- django-admin.py
| |-- easy_install
| |-- easy_install-2.7
| |-- pip
| |-- pip-2.7
| |-- python -> python2.7
| |-- python2 -> python2.7
| `-- python2.7
|-- include
|-- lib
| `-- python2.7
| `-- site-packages <== Virtualenv's Python packages directory
`-- test_django <== Your Django project directory
|-- manage.py
`-- test_django
|-- __init__.py
|-- settings.py
|-- urls.py
`-- wsgi.py <== WSGI script file which Apache runs through mod_wsgi
We are now ready to configure Apache to serve our Django-powered webapp. In order to do this, we’ll need to modify the contents of the Apache configuration file located under apache2/conf/httpd.conf
. Copy the original file to a backup for reference and make a note of the following values:
- the port number from the Listen directive of the original httpd.conf. In the example below we set this to 12345,
- the name of your application (test_app),
- your domain name (example.com),
- the paths to your application directory and to your Django project: /home/my_username/webapps/test_app and /home/my_username/webapps/test_app/test_django,
- the path to your project's WSGI script: /home/my_username/webapps/test_app/test_django/test_django/wsgi.py.
Use these values to customize the configuration template below and save it as your new httpd.conf:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 |
|
Save the configuration to ~/webapps/test_app/apache2/conf/httpd.conf
and restart Apache.
$ ./apache2/bin/restart
Visit your website again and you should be presented with Django congratulating you for setting your server up correctly.
The recommended way to serve static and media files on WebFaction is to use Nginx directly.
Let’s begin by creating the directories for static and media files.
$ cd ~/webapps/test_app
$ mkdir media static
In order to tell Django where the files should be stored, we should place the appropriate lines in the project’s settings.py
file. I like to keep the location of media
and static
folders relative to the source code project, so I would set them in this way:
1 2 3 |
|
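A sketch of such settings, with the paths resolved relative to the settings.py file itself:

# settings.py
import os

PROJECT_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))

MEDIA_ROOT = os.path.join(PROJECT_DIR, '..', 'media')
STATIC_ROOT = os.path.join(PROJECT_DIR, '..', 'static')
MEDIA_URL = '/media/'
STATIC_URL = '/static/'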
Let’s collect the static files from all applications to the static
directory:
$ cd ~/webapps/test_app
$ source bin/activate
(test_app) $ cd test_app
(test_app) $ python manage.py collectstatic
We can now serve our static files. In the WebFaction control panel, add two new applications named test_app_media
and test_app_static
. Both will be defined using these settings:
- App category: Symbolic link
- App type: Symbolic link to static-only app
- Extra info (the directory to link to): /home/my_username/webapps/test_app/media or /home/my_username/webapps/test_app/static respectively
The final step is to add these Nginx-powered folders to our website definition. On the website settings screen for your domain, in the Contents section, choose to add an application. Choose the option to reuse an existing application and set the test_app_media
to serve everything under http://example.com/media
and test_app_static
for http://example.com/static
.
You will want to use slightly different settings for your development and production environments. In order to separate them you can create three separate settings files:
- settings.py – global settings, which apply to both environments
- settings_dev.py – your development environment specific settings
- settings_prod.py – production environment specific settings
The settings_prod.py
file should only contain the settings which are specific to this environment, but also import all the global settings. We can do this by importing global settings like this:
1 2 3 4 5 6 7 8 9 |
|
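A sketch of such a production settings module (adjust the module path and values to your project):

# settings_prod.py
from test_app.settings import *  # pull in the shared settings

DEBUG = False
TEMPLATE_DEBUG = False
ALLOWED_HOSTS = ['example.com']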
Django checks the environment variable named DJANGO_SETTINGS_MODULE
to determine which settings file to use. If this environment variable is undefined, it will fall back to test_app.settings
.
In order to use your new settings module in the shell, we can add a line to the end of the script which activates our virtual environment (bin/activate
).
export DJANGO_SETTINGS_MODULE=test_app.settings_prod
Apache and mod_wsgi don’t know about our new settings yet. We can set the DJANGO_SETTINGS_MODULE
dynamically inside the wsgi.py
script. Create a wsgi_prod.py
script which will contain the following:
1 2 3 4 |
|
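A sketch of such a wsgi_prod.py script:

# wsgi_prod.py
import os

os.environ['DJANGO_SETTINGS_MODULE'] = 'test_app.settings_prod'

from django.core.wsgi import get_wsgi_application
application = get_wsgi_application()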
Now instruct Apache to use this WSGI script by setting the WSGIScriptAlias
directive line to:
1
|
|
Restart Apache and your application should run with production settings applied.
]]>I’m going to assume you have a Django application running in a virtual environment, under supervisord on a Debian server. Instructions for creating such a setup can be found in a previous article.
Setting up Redis on Debian is simple using apt-get
. On an RPM-based system you can use the equivalent yum
command or similar.
$ sudo apt-get install redis-server
$ redis-server --version
Redis server version 2.4.14 (00000000:0)
You can connect to a local Redis instance over the network layer (TCP to the loopback interface) or through a unix socket file.
In order to avoid the small overhead of TCP, we can configure Redis to accept direct socket connections. To do this, edit your /etc/redis/redis.conf
file, comment out the bind
and port
directives and uncomment the two unixsocket
directives.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
|
After making changes to its configuration you will need to restart Redis:
$ sudo service redis-server restart
You can now check if Redis is up and accepting connections:
$ redis-cli ping
PONG
Default permissions on the Redis socket are very restrictive on Debian and allow only the user redis
to connect through it. You can relax these permissions and allow any local user to connect to Redis by changing the unixsocketperm
directive in redis.conf
to:
1
|
|
Remember to restart Redis after making configuration changes
$ sudo service redis-server restart
For security reasons, it may be better to restrict access to the socket to a chosen group of users. You can add the user which your application will run as to the group redis
:
$ sudo usermod -a -G redis django_username
Then change the permissions on the socket file by changing the unixsocketperm
directive in redis.conf
to:
1
|
|
Groups are assigned at login, so if your application is running, you’ll need to restart it. If you’re running an application called hello
via supervisor
, you can restart it like this:
$ sudo supervisorctl restart hello
Find more information about getting started with Redis in the documentation.
In order to use Redis with a Python application such as Django, you’ll need to install the Redis-Python interface bindings module. You can install it in your virtualenv with pip
:
$ source bin/activate
(hello_django) $ pip install redis
You can set up Redis to store your application’s cache data. Use the django-redis-cache module for this.
Install django-redis-cache
in your virtual environment:
(hello_django) $ pip install django-redis-cache
And add the following to your settings.py
file:
1 2 3 4 5 6 |
|
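A sketch of that configuration, assuming Redis listens on a unix socket (the LOCATION path must match the unixsocket directive in your redis.conf):

# settings.py
CACHES = {
    'default': {
        'BACKEND': 'redis_cache.RedisCache',
        'LOCATION': '/var/run/redis/redis.sock',
    },
}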
You can now restart your application and start using Django’s Redis-powered cache.
Django’s cache framework is very flexible and allows you to cache your entire site or individual views. You can control the behavior of the cache using the @cache_page
decorator. For instance to cache the results of my_view
for 15 minutes, you can use the following code:
1 2 3 4 5 |
|
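A sketch of a view cached in this way:

from django.http import HttpResponse
from django.views.decorators.cache import cache_page

@cache_page(60 * 15)  # keep this view's response in the cache for 15 minutes
def my_view(request):
    return HttpResponse('Hello from a cached view!')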
If you haven't set up Django's caching middleware yet, you should do so by adding lines 2 and 6 from the snippet below to MIDDLEWARE_CLASSES
in your settings.py
.
1 2 3 4 5 6 |
|
More information: using Django’s cache.
You can also use the cache in your functions to store arbitrary data for quick retrieval later on.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
|
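A sketch of the low-level cache API, where compute_expensive_result stands in for whatever slow work you want to avoid repeating:

from django.core.cache import cache

def get_result(key):
    result = cache.get(key)
    if result is None:
        result = compute_expensive_result(key)  # assumed expensive call
        cache.set(key, result, 60 * 5)          # keep the value for 5 minutes
    return result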
Complex values will be serialized and stored under a single key. Before you can look up a value in the structure, it has to be retrieved from cache and unserialized. This is not as fast as storing simple values directly in the cache under a more complex key namespace.
More information: Django’s cache API.
If you’re using django-redis-cache
as described above, you can use it to store Django's sessions by adding the following to your settings.py
:
1
|
|
You can also write session information to the database and only load it from the cache by using:
1
|
|
This ensures that session data is persistent and can survive a restart of Redis.
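A sketch of the two settings.py variants described above:

# sessions stored only in the cache (fast, but lost if Redis restarts)
SESSION_ENGINE = 'django.contrib.sessions.backends.cache'

# ...or written to the database and read through the cache
# SESSION_ENGINE = 'django.contrib.sessions.backends.cached_db'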
Alternatively, you can use Redis exclusively as a store for Django's session data. The django-redis-sessions module lets you do this.
Inside your virtual environment install django-redis-sessions
:
(hello_django) $ pip install django-redis-sessions
Now, configure redis_sessions.session
as the session engine in your settings.py
. Since we’re using a socket to connect, we also need to provide its path:
1 2 |
|
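A sketch of those two lines; the socket-path setting name may differ between versions of django-redis-sessions, so check the module's documentation:

SESSION_ENGINE = 'redis_sessions.session'
SESSION_REDIS_UNIX_DOMAIN_SOCKET_PATH = '/var/run/redis/redis.sock'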
That’s it. After you restart your application, session data should be stored in Redis instead of the database.
More information: working with sessions in Django.
Please note that the solution described above can only be used by a single Django application. If you want to use multiple Django applications with Redis, you could try to separate their namespaces, by using key prefixes, or using a different Redis numeric database for each (1
for one application, 2
for another, etc). These solutions are not advised however.
If you want to run multiple applications each with a Redis cache, the recommendation is to run a separate Redis instance for each application. On Chris Laskey’s blog you can find instructions for setting up multiple Redis instances on the same server.
]]>In this text I will explain how to combine all of these components into a Django server running on Linux.
I assume you have a server available on which you have root privileges. I am using a server running Debian 7, so everything here should also work on an Ubuntu server or other Debian-based distribution. If you’re using an RPM-based distro (such as CentOS), you will need to replace the aptitude
commands by their yum
counterparts and if you’re using FreeBSD you can install the components from ports.
If you don’t have a server to play with, I would recommend the inexpensive VPS servers offered by Digital Ocean. If you click through this link when signing up, you’ll pay a bit of my server bill :)
I’m also assuming you configured your DNS to point a domain at the server’s IP. In this text, I pretend your domain is example.com
Let’s get started by making sure our system is up to date.
$ sudo aptitude update
$ sudo aptitude upgrade
To install PostgreSQL on a Debian-based system run this command:
$ sudo aptitude install postgresql postgresql-contrib
Create a database user and a new database for the app. Grab a perfect password from GRC.
$ sudo su - postgres
postgres@django:~$ createuser --interactive -P
Enter name of role to add: hello_django
Enter password for new role:
Enter it again:
Shall the new role be a superuser? (y/n) n
Shall the new role be allowed to create databases? (y/n) n
Shall the new role be allowed to create more new roles? (y/n) n
postgres@django:~$
postgres@django:~$ createdb --owner hello_django hello
postgres@django:~$ logout
$
Even though Django has a pretty good security track record, web applications can become compromised. If the application has limited access to resources on your server, potential damage can also be limited. Your web applications should run as system users with limited privileges.
Create a user for your app, named hello
and assigned to a system group called webapps
.
$ sudo groupadd --system webapps
$ sudo useradd --system --gid webapps --shell /bin/bash --home /webapps/hello_django hello
Virtualenv is a tool which allows you to create separate Python environments on your system. This allows you to run applications with different sets of requirements concurrently (e.g. one based on Django 1.5, another based on 1.6). virtualenv is easy to install on Debian:
$ sudo aptitude install python-virtualenv
I like to keep all my web apps in the /webapps/
directory. If you prefer /var/www/
, /srv/
or something else, use that instead. Create a directory to store your application in /webapps/hello_django/
and change the owner of that directory to your application user hello
$ sudo mkdir -p /webapps/hello_django/
$ sudo chown hello /webapps/hello_django/
As the application user create a virtual Python environment in the application directory:
$ sudo su - hello
hello@django:~$ cd /webapps/hello_django/
hello@django:~$ virtualenv .
New python executable in hello_django/bin/python
Installing distribute..............done.
Installing pip.....................done.
hello@django:~$ source bin/activate
(hello_django)hello@django:~$
Your environment is now activated and you can proceed to install Django inside it.
(hello_django)hello@django:~$ pip install django
Downloading/unpacking django
(...)
Installing collected packages: django
(...)
Successfully installed django
Cleaning up...
Your environment with Django should be ready to use. Go ahead and create an empty Django project.
(hello_django)hello@django:~$ django-admin.py startproject hello
You can test it by running the development server:
(hello_django)hello@django:~$ cd hello
(hello_django)hello@django:~$ python manage.py runserver example.com:8000
Validating models...
0 errors found
June 09, 2013 - 06:12:00
Django version 1.5.1, using settings 'hello.settings'
Development server is running at http://example.com:8000/
Quit the server with CONTROL-C.
You should now be able to access your development server from http://example.com:8000
Your application will run as the user hello
, who owns the entire application directory. If you want regular user to be able to change application files, you can set the group owner of the directory to users
and give the group write permissions.
$ sudo chown -R hello:users /webapps/hello_django
$ sudo chmod -R g+w /webapps/hello_django
You can check what groups you’re a member of by issuing the groups
command or id
.
$ id
uid=1000(michal) gid=1000(michal) groups=1000(michal),27(sudo),100(users)
If you’re not a member of users
, you can add yourself to the group with this command:
$ sudo usermod -a -G users `whoami`
Group memberships are assigned during login, so you may need to log out and back in again for the system to recognize your new group.
In order to use Django with PostgreSQL you will need to install the psycopg2
database adapter in your virtual environment. This step requires the compilation of a native extension (written in C). The compilation will fail if it cannot find header files and static libraries required for linking C programs with libpq
(library for communication with Postgres) and building Python modules (python-dev
package). We have to install these two packages first, then we can install psycopg2
using PIP.
Install dependencies:
$ sudo aptitude install libpq-dev python-dev
Install psycopg2
database adapter:
(hello_django)hello@django:~$ pip install psycopg2
You can now configure the databases section in your settings.py
:
1 2 3 4 5 6 7 8 9 10 |
|
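A sketch of that section, matching the database and user created above (substitute the password you chose):

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'hello',
        'USER': 'hello_django',
        'PASSWORD': 'the-password-you-chose',
        'HOST': 'localhost',
        'PORT': '',  # empty string means the default port
    }
}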
And finally build the initial database for Django:
(hello_django)hello@django:~$ python manage.py migrate
In older versions of Django the equivalent command was: manage.py syncdb
In production we won’t be using Django’s single-threaded development server, but a dedicated application server called gunicorn.
Install gunicorn in your application’s virtual environment:
(hello_django)hello@django:~$ pip install gunicorn
Downloading/unpacking gunicorn
Downloading gunicorn-0.17.4.tar.gz (372Kb): 372Kb downloaded
Running setup.py egg_info for package gunicorn
Installing collected packages: gunicorn
Running setup.py install for gunicorn
Installing gunicorn_paster script to /webapps/hello_django/bin
Installing gunicorn script to /webapps/hello_django/bin
Installing gunicorn_django script to /webapps/hello_django/bin
Successfully installed gunicorn
Cleaning up...
Now that you have gunicorn, you can test whether it can serve your Django application by running the following command:
(hello_django)hello@django:~$ gunicorn hello.wsgi:application --bind example.com:8001
You should now be able to access the Gunicorn server at http://example.com:8001. I intentionally changed the port from 8000 to 8001 to force your browser to establish a new connection.
Gunicorn is installed and ready to serve your app. Let’s set some configuration options to make it more useful. I like to set a number of parameters, so let’s put them all into a small Bash script, which I save as bin/gunicorn_start.
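Here is one possible version of such a script, assuming the directory layout, user and settings module used in this article; adjust the paths and worker count to match your setup:

#!/bin/bash

NAME="hello_app"                                   # Name of the application
DJANGODIR=/webapps/hello_django/hello              # Django project directory
SOCKFILE=/webapps/hello_django/run/gunicorn.sock   # Unix socket Gunicorn will bind to
USER=hello                                         # User to run as
GROUP=users                                        # Group to run as
NUM_WORKERS=3                                      # 2 * CPUs + 1
DJANGO_SETTINGS_MODULE=hello.settings              # Settings module Gunicorn will use
DJANGO_WSGI_MODULE=hello.wsgi                      # WSGI module name

echo "Starting $NAME as `whoami`"

# Activate the virtual environment
cd $DJANGODIR
source ../bin/activate
export DJANGO_SETTINGS_MODULE=$DJANGO_SETTINGS_MODULE
export PYTHONPATH=$DJANGODIR:$PYTHONPATH

# Create the run directory for the socket if it doesn't exist
RUNDIR=$(dirname $SOCKFILE)
test -d $RUNDIR || mkdir -p $RUNDIR

# Start Gunicorn; do not daemonize, Supervisor will manage the process
exec ../bin/gunicorn ${DJANGO_WSGI_MODULE}:application \
  --name $NAME \
  --workers $NUM_WORKERS \
  --user=$USER --group=$GROUP \
  --bind=unix:$SOCKFILE \
  --log-level=debug \
  --log-file=-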
Set the executable bit on the gunicorn_start
script:
$ sudo chmod u+x bin/gunicorn_start
You can test your gunicorn_start
script by running it as the user hello
.
$ sudo su - hello
hello@django:~$ bin/gunicorn_start
Starting hello_app as hello
2013-06-09 14:21:45 [10724] [INFO] Starting gunicorn 18.0
2013-06-09 14:21:45 [10724] [DEBUG] Arbiter booted
2013-06-09 14:21:45 [10724] [INFO] Listening at: unix:/webapps/hello_django/run/gunicorn.sock (10724)
2013-06-09 14:21:45 [10724] [INFO] Using worker: sync
2013-06-09 14:21:45 [10735] [INFO] Booting worker with pid: 10735
2013-06-09 14:21:45 [10736] [INFO] Booting worker with pid: 10736
2013-06-09 14:21:45 [10737] [INFO] Booting worker with pid: 10737
^C (CONTROL-C to kill Gunicorn)
2013-06-09 14:21:48 [10736] [INFO] Worker exiting (pid: 10736)
2013-06-09 14:21:48 [10735] [INFO] Worker exiting (pid: 10735)
2013-06-09 14:21:48 [10724] [INFO] Handling signal: int
2013-06-09 14:21:48 [10737] [INFO] Worker exiting (pid: 10737)
2013-06-09 14:21:48 [10724] [INFO] Shutting down: Master
$ exit
Note the parameters set in gunicorn_start
. You’ll need to set the paths and filenames to match your setup.
As a rule of thumb, set the --workers (NUM_WORKERS) value according to the formula 2 * CPUs + 1. The idea is that at any given time half of your workers will be busy doing I/O. For a single-CPU machine this gives you 3 workers.
The --name
(NAME
) argument specifies how your application will identify itself in programs such as top
or ps
. It defaults to gunicorn
, which might make it harder to distinguish from other apps if you have multiple Gunicorn-powered applications running on the same server.
In order for the --name
argument to have an effect you need to install a Python module called setproctitle
. To build this native extension pip
needs to have access to C header files for Python. You can add them to your system with the python-dev
package and then install setproctitle
.
$ sudo aptitude install python-dev
(hello_django)hello@django:~$ pip install setproctitle
Now when you list processes, you can see which Gunicorn process belongs to which application.
$ ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
(...)
hello 11588 0.7 0.2 58400 11568 ? S 14:52 0:00 gunicorn: master [hello_app]
hello 11602 0.5 0.3 66584 16040 ? S 14:52 0:00 gunicorn: worker [hello_app]
hello 11603 0.5 0.3 66592 16044 ? S 14:52 0:00 gunicorn: worker [hello_app]
hello 11604 0.5 0.3 66604 16052 ? S 14:52 0:00 gunicorn: worker [hello_app]
Your gunicorn_start
script should now be ready and working. We need to make sure that it starts automatically with the system and that it can automatically restart if for some reason it exits unexpectedly. These tasks can easily be handled by a service called supervisord. Installation is simple:
$ sudo aptitude install supervisor
When Supervisor is installed you can give it programs to start and watch by creating configuration files in the /etc/supervisor/conf.d
directory. For our hello
application we’ll create a file named /etc/supervisor/conf.d/hello.conf
with this content:
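A minimal sketch might look like the following (the paths and user match the example setup from this article):

[program:hello]
; Start the app with the gunicorn_start script created above
command = /webapps/hello_django/bin/gunicorn_start
; Run the process as the application user
user = hello
; Write Gunicorn's output to the application's log directory
stdout_logfile = /webapps/hello_django/logs/gunicorn_supervisor.log
redirect_stderr = true
; Start on boot and restart automatically if the process dies
autostart = true
autorestart = true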
You can set many other options, but this basic configuration should suffice.
Create the file to store your application’s log messages:
hello@django:~$ mkdir -p /webapps/hello_django/logs/
hello@django:~$ touch /webapps/hello_django/logs/gunicorn_supervisor.log
After you save the configuration file for your program, you can ask Supervisor to reread configuration files and update (which will start the newly registered app).
$ sudo supervisorctl reread
hello: available
$ sudo supervisorctl update
hello: added process group
You can also check the status of your app or start, stop or restart it using supervisor.
$ sudo supervisorctl status hello
hello RUNNING pid 18020, uptime 0:00:50
$ sudo supervisorctl stop hello
hello: stopped
$ sudo supervisorctl start hello
hello: started
$ sudo supervisorctl restart hello
hello: stopped
hello: started
Your application should now be automatically started after a system reboot and automatically restarted if it ever crashes for some reason.
Time to set up Nginx as a server for our application and its static files. Install and start Nginx:
$ sudo aptitude install nginx
$ sudo service nginx start
You can navigate to your server (http://example.com) with your browser and Nginx should greet you with the words “Welcome to nginx!”.
Each Nginx virtual server should be described by a file in the /etc/nginx/sites-available
directory. You select which sites you want to enable by making symbolic links to those in the /etc/nginx/sites-enabled
directory.
Create a new nginx server configuration file for your Django application running on example.com in /etc/nginx/sites-available/hello
. The file should contain something along the following lines. A more detailed example is available from the folks who make Gunicorn.
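A minimal sketch, assuming the socket path and log directory used earlier in this article (adjust server_name and paths to your setup):

upstream hello_app_server {
    # fail_timeout=0 means we always retry a connection to Gunicorn
    server unix:/webapps/hello_django/run/gunicorn.sock fail_timeout=0;
}

server {
    listen 80;
    server_name example.com;

    access_log /webapps/hello_django/logs/nginx-access.log;
    error_log /webapps/hello_django/logs/nginx-error.log;

    # Serve static and user-uploaded files directly with Nginx
    location /static/ {
        alias /webapps/hello_django/static/;
    }
    location /media/ {
        alias /webapps/hello_django/media/;
    }

    # Everything else is proxied to Gunicorn
    location / {
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_redirect off;
        proxy_pass http://hello_app_server;
    }
}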
Create a symbolic link in the sites-enabled
folder:
$ sudo ln -s /etc/nginx/sites-available/hello /etc/nginx/sites-enabled/hello
Restart Nginx:
$ sudo service nginx restart
If you navigate to your site, you should now see your Django welcome-page powered by Nginx and Gunicorn. Go ahead and develop to your heart’s content.
At this stage you may find that instead of the Django welcome-page, you encounter the default “Welcome to nginx!” page. This may be caused by the default
configuration file, which is installed with Nginx and masks your new site’s configuration. If you don’t plan to use it, delete the symbolic link to this file from /etc/nginx/sites-enabled
.
If you run into any problems with the above setup, please drop me a line.
If you followed this tutorial, you should have created a directory structure resembling this:
/webapps/hello_django/
├── bin                     <= Directory created by virtualenv
│   ├── activate            <= Environment activation script
│   ├── django-admin.py
│   ├── gunicorn
│   ├── gunicorn_django
│   ├── gunicorn_start      <= Script to start application with Gunicorn
│   └── python
├── hello                   <= Django project directory, add this to PYTHONPATH
│   ├── manage.py
│   ├── project_application_1
│   ├── project_application_2
│   └── hello               <= Project settings directory
│       ├── __init__.py
│       ├── settings.py     <= hello.settings - settings module Gunicorn will use
│       ├── urls.py
│       └── wsgi.py         <= hello.wsgi - WSGI module Gunicorn will use
├── include
│   └── python2.7 -> /usr/include/python2.7
├── lib
│   └── python2.7
├── lib64 -> /webapps/hello_django/lib
├── logs                    <= Application logs directory
│   ├── gunicorn_supervisor.log
│   ├── nginx-access.log
│   └── nginx-error.log
├── media                   <= User uploaded files folder
├── run
│   └── gunicorn.sock
└── static                  <= Collect and serve static files from here
If the time comes to remove the application, follow these steps.
Remove the virtual server from Nginx sites-enabled
folder:
$ sudo rm /etc/nginx/sites-enabled/hello
Restart Nginx:
$ sudo service nginx restart
If you never plan to use this application again, you can also remove its config file from the sites-available
directory:
$ sudo rm /etc/nginx/sites-available/hello
Stop the application with Supervisor:
$ sudo supervisorctl stop hello
Remove the application from Supervisor’s control scripts directory:
$ sudo rm /etc/supervisor/conf.d/hello.conf
If you never plan to use this application again, you can now remove its entire directory from webapps
:
$ sudo rm -r /webapps/hello_django
You will probably want to use different settings on your production server and different settings during development.
To achieve this I usually create a second settings file called settings_local.py
, which contains overrides of some values, but imports every default like so:
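A two-line sketch (the DEBUG override is just an illustration; put whatever differs locally here):

# settings_local.py: import every default, then override selected values
from hello.settings import *

DEBUG = True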
You can then tell Django to use this local settings file by specifying the environment variable DJANGO_SETTINGS_MODULE=hello.settings_local
.
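For example, during development you could export the variable in your shell before starting the development server (a sketch):

$ export DJANGO_SETTINGS_MODULE=hello.settings_local
$ python manage.py runserver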
If you would like some help with setting up a Nginx server to run multiple Django applications, check out my next article.
This article was actually translated into a number of languages. If you would like, you can read it in Spanish or Chinese. If you know of other translations, let me know.
]]>I assume you have a server available on which you have root privileges. I am using a server running Debian 7, so everything here should work the same on an Ubuntu server or other Debian-based distribution. If you’re using an RPM-based distro (such as CentOS), you will need to replace the apt-get
commands with their yum
counterparts.
If you don’t have a server to play with, I would recommend the inexpensive VPS servers offered by Digital Ocean. If you click through this link when signing up, you’ll pay a bit of my server bill :)
I’m also assuming you configured your DNS to point a domain at the server’s IP. In this text, I assume it’s example.com
It’s always a good idea to keep every part of your system up to date, and since APT makes it so simple, let’s start with an update of the APT cache and a system upgrade (installing up-to-date versions of all packages).
$ sudo apt-get update
$ sudo apt-get upgrade
$ sudo apt-get install mysql-server
During the installation process you will be asked for a password for the root user. If you don’t set it during installation, you can set it later using the following command (substitute a password for NEWPASSWORD).
$ mysqladmin -u root password NEWPASSWORD
$ sudo apt-get install nginx
$ sudo service nginx start
You can navigate to your server (http://example.com) with your browser and Nginx should greet you with the words “Welcome to nginx!”.
Under Apache, PHP code is executed by the web server itself (via mod_php). The Nginx philosophy is somewhat different: it acts as a reverse proxy rather than an application server, so it doesn’t run any code itself. Instead it serves (proxies) data generated by CGI applications running on your system.
For PHP this is PHP-FPM (FastCGI Process Manager): a daemon which waits for incoming requests, runs the PHP scripts and returns their output. More information can be found on the PHP-FPM site.
$ sudo apt-get install php5-fpm
Edit the /etc/php5/fpm/php.ini
and change cgi.fix_pathinfo to 0.
$ sudo vim /etc/php5/fpm/php.ini
; cgi.fix_pathinfo provides *real* PATH_INFO/PATH_TRANSLATED support for CGI. PHP's
; previous behavior was to set PATH_TRANSLATED to SCRIPT_FILENAME, and to not grok
; what PATH_INFO is. For more information on PATH_INFO, see the cgi specs. Setting
; this to 1 will cause PHP CGI to fix its paths to conform to the spec. A setting
; of zero causes PHP to behave as before. Default is 1. You should fix your scripts
; to use SCRIPT_FILENAME rather than PATH_TRANSLATED.
; http://php.net/cgi.fix-pathinfo
cgi.fix_pathinfo=0
Now check the php5-fpm configuration file /etc/php5/fpm/pool.d/www.conf
and make sure that php5-fpm communicates with the outside world through a socket file:
listen = /var/run/php5-fpm.sock
$ sudo service php5-fpm restart
Install Drupal dependencies
$ sudo apt-get install php5-mysql
$ sudo apt-get install php5-gd
Create a MySQL database and user for Drupal.
$ mysql -u root -p
mysql> CREATE DATABASE example;
mysql> GRANT ALL PRIVILEGES ON example.* TO example@localhost IDENTIFIED BY 'hskd7e345kfi';
Install Drush
$ sudo apt-get install drush
Install Drupal in any way you prefer, for instance using Drush like this:
$ cd /webapps/ # Or wherever you want to keep your sites
$ drush dl drupal-7.22
$ mv drupal-7.22/ example-drupal
$ cd example-drupal
$ drush site-install standard --account-name=admin --account-pass=admin --db-url=mysql://example:hskd7e345kfi@localhost/example
Each Nginx virtual server should be described in a file in the /etc/nginx/sites-available
directory. You select which sites you want to enable by making symbolic links to those in the /etc/nginx/sites-enabled
directory.
Create a new nginx server configuration file for your Drupal site running on example.com in /etc/nginx/sites-available/example-drupal
. The file should contain something along the following lines. A good explanation can be found on the Nginx wiki.
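A minimal sketch, assuming Drupal was installed in /webapps/example-drupal and PHP-FPM listens on the socket configured above; a real deployment will usually borrow more rules from the Nginx wiki:

server {
    listen 80;
    server_name example.com;

    root /webapps/example-drupal;
    index index.php;

    # Try to serve the requested file directly, otherwise hand the request to Drupal
    location / {
        try_files $uri /index.php?$query_string;
    }

    # Pass PHP scripts to the PHP-FPM socket configured earlier
    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass unix:/var/run/php5-fpm.sock;
    }
}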
Restart Nginx:
$ sudo service nginx restart
You can now visit your Drupal site at http://example.com and log in as admin with the password admin. Remember to change your password.
Install memcached and the memcache PHP extension.
$ sudo apt-get install memcached
$ sudo apt-get install php5-memcached
Edit the php.ini file to enable the extension and configure memcache. At minimum add the following lines:
extension=memcached.so
memcache.hash_strategy="consistent"
In order to enable memcached-based caching on your Drupal site, you will need to download and enable the memcache module:
$ cd /webapps/example-drupal
$ drush dl memcache
$ drush en memcache
You will also need to add cache_backends
configuration to your settings.php file.
$ vim sites/default/settings.php
Add the following lines:
$conf['cache_backends'][] = 'sites/all/modules/memcache/memcache.inc';
$conf['cache_default_class'] = 'MemCacheDrupal';
$conf['cache_class_cache_form'] = 'DrupalDatabaseCache';
Your site is now cached using memcache. You can also enable a module that was installed alongside memcache, which displays in the admin a summary of how many memcache hits and misses occurred while generating each page.
$ drush en memcache_admin
That’s it, I hope it helps. If you have any problems or see areas which could be improved, feel free to leave me a comment or send me a message.
]]>