Full Spatial database power in 2 lines

By mardi 20 mai 2014Developpement, NewsFR

This post explains how to setup a powerful spatial data store with a wide range of features on Ubuntu 14.04 (Trusty) in 2 command lines. And how it works.

TL;DR

Run this on Ubuntu 14.04 Trusty Tahr, and enjoy :

sudo apt install docker.io
sudo docker.io run --rm -P --name pggis_test oslandia/pggis /sbin/my_init

Database credentials : pggis/pggis/pggis for database/user/password

(DO NOT USE THIS IMAGE FOR PRODUCTION).

More info

You want to store and manipulate spatial data. Lots of spatial data. Various spatial data. This Docker image lets you install the latest components based on the most powerful opensource database (PostgreSQL, in case you wonder) and benefit from a wide range of features.
Docker is a Linux container based technology. See containers like something between the good old chroot and a Virtual Machine. Fast, efficient and painless. See Docker as a mix between a meta-package manager, git and a whale. Docker is available natively on Ubuntu 14.04 (and named docker.io), and can also run on other Linux flavors, or in a minimalist Virtual Machine with Boot2docker for other environments (Windows, MacOSX).

This image focus on installing and running the following components :

  • PostgreSQL 9.3
  • PostGIS 2.1.3 with SFCGAL support
  • PgRouting
  • PostgreSQL PointCloud
  • PDAL

PostgreSQL is the advanced database providing a lot of features for data store and management. Version 9.3 is the latest stable version (9.4 is just around the corner), providing support for streaming replication, JSON, CTE, window queries and much, much more.

PostGIS is a PostgreSQL extension, enabling spatial support. It features new data types, such as 2D (and 2.5D) vector data, be it projected or in lat/lon. It also offers raster data support, to store gridded image data, and topological vector data storage.

SFCGAL support for PostGIS adds 3D features to the database. You can store 3D meshes, TIN, and operate on those objects.

PgRouting is a PostgreSQL/PostGIS extension allowing you to perform routing in the database on topological spatial data. It can be used to dynamically compute shortest paths, driving distances and more.

PostgreSQL PointCloud is a PostgreSQL/PostGIS extension which lets you deal with huge amounts of points, e.g. LIDAR data. PDAL is a library and set of tools allowing you to load and extract data from and to PointCloud (among other things).

All of this makes the best platform for spatial data crushing. Loads of spatial data.

Using it

Below is a longer full explanation to install this image and use it.

Install Docker

You first need to install docker tools on your Ubuntu system. You can install Docker on a lot of recent Linux flavors or in a Linux Virtual Machine ( with Boot2docker for example).

sudo apt install docker.io

Get the image and run the container

The following commands use docker to download the image `from the docker registry <https://index.docker.io/u/oslandia/pggis/>`_ (~1GB) and then run it in the foreground. If you omit the **pull** command and try to run it directly, Docker will look for the image on the Docker registry if not already present locally.

sudo docker.io pull oslandia/pggis
sudo docker.io run --rm -P --name pggis_test oslandia/pggis /sbin/my_init

Your container is initiated from the oslandia/pggis image. It is named pggis_test, runs in the foreground (default), will redirect all exposed ports to a host port ( -P ), and will delete all created filesystem and container image when the container exits ( -rm ). It will run the /sbin/my_init startup script of baseimage to launch all necessary processes inside the container.

Now PostgreSQL runs in your container with all extensions activated, and a new database created just for you.

Connect to the database

Assuming you have the postgresql-client installed on your host, you can use the host-mapped port to connect to the database. You need to use **docker.io ps** to find out what local host port the container is mapped to first:

$ sudo docker.io ps
CONTAINER ID        IMAGE                   COMMAND                CREATED             STATUS              PORTS                     NAMES
75fec271dc5e        oslandia/pggis:latest   /sbin/my_init       51 seconds ago      Up 50 seconds       0.0.0.0:49154->5432/tcp   pggis_test

You can see that te container’s port 5432 has been mapped to the host’s port 49154 in this case (yours may differ). We can now connect to the database server through this host port. A pggis user has been created (with pggis as password) and a corresponding database with extensions activated.

$ psql -h localhost -p 49154 -d pggis -U pggis --password

Enjoy GIS features

You are now ready to use the database. Inside the psql console, you can check that everything is in order with queries like below.

pggis=# SELECT postgis_version();
postgis_version
---------------------------------------
2.1 USE_GEOS=1 USE_PROJ=1 USE_STATS=1
(1 row)

pggis=# SELECT st_astext(st_makepoint(random() * 100, random() * 100)) from generate_series(1, 10);
st_astext
-------------------------------------------
POINT(90.1295741088688 97.7623191196471)
POINT(79.2798819020391 0.146342813968658)
POINT(56.9643571972847 50.8249804843217)
POINT(14.1903728246689 26.436754828319)
POINT(46.5733857825398 39.7641783114523)
POINT(89.1805129591376 36.6665146779269)
POINT(17.3397121019661 20.6000158563256)
POINT(11.5902675781399 93.6640297528356)
POINT(96.0820932406932 53.891480108723)
POINT(88.9889035373926 15.435611223802)
(10 rows)

Describing all features of installed components would require a lot more posts, coming later. If you are interested in learning more, Oslandia can provide training, assistance and support.

Warning

Do not use this in production !

The PostgreSQL configuration in this image opens the database connection without any IP restriction for the pggis super-user. The default pggis password is very weak too. This should NOT BE USED IN PRODUCTION, and you should adapt the configuration to your specific setup and security level. See below to rebuild the image with different configuration. In a production environment, you would also want to run the container in the background ( -d option) and keep the container filesystem and images after running ( no –rm option ).

How it works

This image is a Docker image, which uses Linux Containers (LXC). The base image is `Phusion baseimage <http://phusion.github.io/baseimage-docker>`_ , which it itself a Ubuntu 14.04 (Trusty). It is is built using a Dockerfile (see the source on Github). This Dockerfile is used by Docker to build the pggis image, by processing all installation, download, compilation and setup of the various components. The result is a Docker image, which is then published on docker.io.

You can either directly download and use the image from docker.io as explained above, or rebuild it from the Dockerfile in the git repository like below.

sudo apt-get install docker.io

Then clone the GitHub repository to get the Dockerfile :

git clone https://github.com/vpicavet/docker-pggis.git

cd docker-pggis

Then you can build the image, using the Dockerfile in the current repository.

sudo docker.io build -t oslandia/pggis .

Note that building the image requires quite a lot of download from internet, as well as enough RAM and CPU. Compilations are achieved along the way, with make -j3, thus enabling parallel compilation. You should adapt this value according to the number of CPUs you have.

Once you have built the image, you can run a container based on it just like above.

sudo docker.io run --rm -P --name pggis_test oslandia/pggis /sbin/my_init

If you want to change how the container works, or the default setup, you can edit the Dockerfile and rebuild the image. Docker caches every step of the build process (after each RUN command), so rebuilding can be very fast.

If you have a docker.io account, you can upload the image to the image registry like this

    sudo docker.io login
    sudo docker.io push oslandia/pggis

Conclusion

This is a first dive into Docker, with a spatial GIS database container. There is a lot more you can do with Docker, and plenty of use cases for this spatial database setup. This Docker image, the setup and the way it is built can be improved a lot as well, and will certainly change along with package availability, ppa repositories, next versions of components, better Docker practices and versions, but it is already a good and efficient way to get the stack up quickly. Do not hesitate to fork the GitHub repository  and send Pull Requests.

You can also report any issue on GitHub https://github.com/vpicavet/docker-pggis/issues

If you want to go further, or if you have any remark concerning the tools covered in this blog post, do not hesitate to contact us : infos+pggis@oslandia.com