Database migrations and Pum

By Friday February 23rd, 2018Database

At Oslandia we often need to deal with database migrations. So it’s an important topic to us. And
it should be an important topic to anyone maintaining databases in production.

First of all I want to mention a good series of articles by K. Scott Allen on the philosophy and
practice of database migrations and version control:

These articles are not about specific databases and migration tools. Instead they cover the basics of having your databases under version control and managing migrations.

Many database migration tools exist, both in the opensource and proprietary worlds.

Some tools relate to specific programming languages and ORMs (Object-Relational Mapping), such as Rails Migrations and Alembic. I’ve personally used Alembic in a number of Python projects in the past. Alembic is a fantastic tool for Python applications based on the SQLAlchemy toolkit. I would hightly recommend it. Other popular database migration tools include Flyway and Liquidbase. With Flyway migrations are written in either Java or SQL. With Liquidbase migrations are described using declarative formats such as XML, YAML and JSON; “plain SQL” being also supported.

Pum, which stands for “Postgres Upgrades Manager”, is a new migration tool. It was created in the context of the QWAT and QGEP projects, but it is completely independent and agnostic to those projects. Pum was initially created by OPENGIS.ch, with help from Oslandia and the whole QWAT community for the design and specifications of the tool.

Pum was greatly inspired by Flyway and Liquidbase. Flyway and Liquidbase are written in Java, and supports various database systems. In contrast Pum is written in Python, and focuses on the PostgreSQL database system. Pum is simple, and easy to use and extend. Also, with Pum, migrations, called “deltas” in the Pum jargon, are written in “plain SQL”. Pum fully embraces SQL and doesn’t attempt to hide it behind a declarative or programming language.

Pum is a CLI (Command Line Interface) tool. The main commands provided by Pum are

  • check
  • baseline
  • upgrade
  • test-and-upgrade

The check command compares two databases and shows the differences. Let’s create a simple example to illustrate it:

$ createdb prod  # create the "production" database
$ createdb ref   # create the "reference" database
$
$ cat > pg_service.conf << EOF # create a PostgreSQL service file for "prod" and "ref" > [prod]
> dbname=prod
> [ref]
> dbname=ref
> EOF
$ export PGSERVICEFILEFILE=pg_service.conf
$
$ pum check -p1 prod -p2 ref
Check...OK
columns: []
constraints: []
functions: []
indexes: []
rules: []
sequences: []
tables: []
triggers: []
views: []

The check command reports that the databases “prod” and “ref” are identical. They’re actually both empty. Now let’s add a table to the “ref” database, and run the check command again:

$ # add the "test" table to the "ref" database
$ psql -d ref -c "create table test (name text)"
$
$ pum check -p1 prod -p2 ref
Check...DIFFERENCES FOUND
columns:
- + ('public', 'test', 'name', None, 'YES', 'text', None, None, None, None)
constraints: []
functions: []
indexes: []
rules: []
sequences: []
tables:
- + ('public', 'test')
triggers: []
views: []

This time the check command reports that the “ref” database has a table and a column that the “prod” database does not have. If we created the same “test” table in “prod” the `check` command would report that “prod” and “ref” are the same again.

The baseline command assigns a version to a database. For example let’s assign the version 0.0.1 to the “prod” database, and the version 0.0.2 to the “ref” database:

$ pum baseline -p prod -t public.pum_upgrades -d deltas -b 0.0.1
$ pum baseline -p ref -t public.pum_upgrades -d deltas -b 0.0.2
$
$ # check that version 0.0.1 was assigned to the "prod" database
$ psql service=prod -c "table pum_upgrades"
 id | version | description | type | script | checksum | installed_by |        installed_on        | execution_time | success 
----+---------+-------------+------+--------+----------+--------------+----------------------------+----------------+---------
  1 | 0.0.1   | baseline    |    0 |        |          | postgres     | 2018-02-22 16:33:40.948143 |              1 | t
(1 row)

$
$ # check that version 0.0.2 was assigned to the "ref" database
$ psql service=ref -c "table pum_upgrades"
 id | version | description | type | script | checksum | installed_by |       installed_on        | execution_time | success
----+---------+-------------+------+--------+----------+--------------+---------------------------+----------------+---------
  1 | 0.0.2   | baseline    |    0 |        |          | postgres     | 2018-02-22 16:56:25.19542 |              1 | t
(1 row)

Finally, we’re going to write a migration script to migrate the “prod” database, and use Pum to do the migration.

We create the migration script delta_0.0.2_000.sql with the following content:

create table test (name text);

Let’s now migrate “prod”, checking that it now matches the “ref” database:

$ # a "test" database is required by the test-and-upgrade command
$ createdb test
$ cat >> pg_service.conf << EOF > [test]
> dbname=test
> EOF
$
$ # now perform the migration
$ pum test-and-upgrade -pp prod -pt test -pc ref -t public.pum_upgrades -d delta -f output.backup
Test and upgrade...Dump...OK
Restore...OK
Upgrade...     Applying delta 0.0.2... OK
OK
Check...OK
columns: []
constraints: []
functions: []
indexes: []
rules: []
sequences: []
tables: []
triggers: []
views: []

Apply deltas to prod? [n]|y: y
Upgrade...     Applying delta 0.0.2... OK
OK
OK

The “prod” database has been migrated, and during the process PUM has checked that the “prod” and “ref” are now in the same state.

That’s all folks! Please contact us at infos@oslandia.com if you have any questions or want to discuss this further.