The annual OSGeo Code Sprint took place from the 23rd to the 26th of February in Paris, at the Mozilla Foundation's offices. Here is a summary of what the PostGIS team achieved during the event.
A big team of PostGIS developers worked together during the sprint on numerous issues.
New functions and dev guide
Remi Cura from IGN had prepared a full list of improvements to existing PostGIS functions and of new features to develop. Together with Marc Ducobu (Champs Libres), he went all the way from simple PostGIS user to PostGIS developer. They dived into the code and managed to write clean patches for functions and documentation.
They documented their work and created a step-by-step guide for new PostGIS contributors.
New clustering features
New features have been introduced for PostGIS 2.3, among them two clustering methods by Paul Ramsey and Dan Baston: ST_ClusterKMeans() and ST_ClusterDBSCAN(). These are really nice window functions for PostGIS!
On a side note, the documentation for PostGIS window functions has been improved too (R. Obe).
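To give an idea of what density-based clustering does, here is a minimal sketch of the DBSCAN idea in plain Python. It is not the PostGIS implementation: the function name `dbscan` and the parameters `eps` (neighbourhood radius) and `min_points` (neighbours needed to form a dense core) are chosen here for illustration, and the brute-force neighbour search is only suitable for tiny inputs.

```python
from math import dist


def dbscan(points, eps, min_points):
    """Return one label per point: a cluster id (0, 1, ...) or None for noise."""
    n = len(points)
    # Neighbourhood of each point, the point itself included (brute force).
    neighbours = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster_id = 0
    for i in range(n):
        if labels[i] is not None or len(neighbours[i]) < min_points:
            continue  # already clustered, or not a core point
        # Grow a new cluster outwards from this core point.
        labels[i] = cluster_id
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            if len(neighbours[j]) >= min_points:  # j is a core point too
                queue.extend(neighbours[j])
        cluster_id += 1
    return labels
```

Points that are not within `eps` of any dense core keep the label `None`, which is how this sketch marks noise.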
OGR ODBC FDW
Regina Obe managed to travel to Paris, and she spent time analyzing why the ODBC performance of ogr_fdw, the OGR foreign data wrapper, was not as good as expected. This should lead to performance improvements later on. She also worked on problems encountered when restoring PostGIS data (especially rasters), related to how the search_path is handled. There is still some experimenting to do with EVENT TRIGGERS, but the issue is well on its way to being solved.
3D in PostGIS: SFCGAL
Our team at Oslandia has been working on SFCGAL, the 3D library behind PostGIS 3D features. Vincent Mora and Hugo Mercier teamed with Mickael Borne (IGN) and Sébastien Loriot (GeometryFactory) to break things and rebuild them, better. They focused mainly on performance:
- using the different computation and precision kernels in CGAL more efficiently, to be much faster
- setting a flag on valid geometries, so as not to re-test for validity in every operation
- lowering the serialization overhead when passing data between PostGIS, CGAL and SFCGAL
This effort led to a significant improvement in speed. Preliminary tests on 3D intersection showed an improvement from 15 seconds down to 100 milliseconds, which is an impressive result.
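The validity flag from the list above can be pictured with a small sketch. This is not SFCGAL code: the `Ring` class, its simplistic validity test and the `signed_area` operation are hypothetical stand-ins, used only to show how caching the result of a costly check avoids repeating it in every operation.

```python
class Ring:
    """A closed ring of (x, y) vertices carrying a cached validity flag."""

    def __init__(self, coords):
        self.coords = list(coords)
        self._valid = None         # None means "not tested yet"
        self.validity_checks = 0   # how many times the real test ran

    def is_valid(self):
        if self._valid is None:
            self.validity_checks += 1
            # Stand-in for a costly test: at least 4 vertices, first == last.
            self._valid = (
                len(self.coords) >= 4 and self.coords[0] == self.coords[-1]
            )
        return self._valid


def signed_area(ring):
    """Shoelace formula; checks validity first, as an operation would."""
    if not ring.is_valid():
        raise ValueError("invalid ring")
    pts = ring.coords
    return sum(
        x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(pts, pts[1:])
    ) / 2
```

Calling `signed_area` several times on the same ring runs the validity test only once; every later call reads the flag.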
The code still needs a lot of refactoring. That work has started too, with the aim of simplifying SFCGAL and making it easier to use.
New indexes for big data
Another significant contribution is the work on BRIN indexes for PostGIS. At PGConf Europe, we had already spoken with the team at Dalibo about BRIN, the new index type in PostgreSQL 9.5, its use cases and its implementation in a GIS context. Some time before the OSGeo code sprint, we realized that Giuseppe Broccolo from 2ndQuadrant had started a prototype implementation, so we invited him to join. Giuseppe teamed with Ronan Dunklau and Julien Rouhaud from Dalibo, and together they managed to produce a working implementation of this new type of index for PostGIS.
Giving PostgreSQL developers the opportunity to meet in this geospatial environment was the ideal way to get things done efficiently.
PostGIS BRIN will be detailed after some code consolidation and benchmarks, but here is the idea. BRIN indexes are partial in the sense that they index blocks of data (a given number of pages) rather than individual rows. You can set the number of pages per block. This makes them much faster to build and a lot smaller than classic indexes (GiST for PostGIS). Of course, the tradeoff is that they are slower to query. And since they group data in blocks, they lose their efficiency if the data is not clustered on disk.
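The block idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the PostGIS BRIN code: a "block" here is a fixed number of rows rather than pages, the summary kept for each block is a 2D bounding box, and the names `build_block_index`, `rows_per_block` and `query` are made up for the example.

```python
def build_block_index(points, rows_per_block):
    """Return one bounding box (xmin, ymin, xmax, ymax) per block of rows."""
    summaries = []
    for start in range(0, len(points), rows_per_block):
        block = points[start:start + rows_per_block]
        xs = [x for x, _ in block]
        ys = [y for _, y in block]
        summaries.append((min(xs), min(ys), max(xs), max(ys)))
    return summaries


def boxes_overlap(a, b):
    """True if two (xmin, ymin, xmax, ymax) boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def query(points, summaries, rows_per_block, box):
    """Return the points inside box and the number of blocks actually read."""
    hits, blocks_read = [], 0
    for block_no, summary in enumerate(summaries):
        if not boxes_overlap(summary, box):
            continue  # the whole block is skipped without reading its rows
        blocks_read += 1
        start = block_no * rows_per_block
        # The summary is lossy, so every row of the block is rechecked.
        for x, y in points[start:start + rows_per_block]:
            if box[0] <= x <= box[2] and box[1] <= y <= box[3]:
                hits.append((x, y))
    return hits, blocks_read
```

The index holds one small summary per block, which is why it stays tiny. When the rows are stored in spatial order, a query reads only a few blocks. When they are stored in random order, each block's bounding box covers most of the extent and almost every block has to be read.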
Some of our use cases, with tables holding billions of points (using PgPointCloud), are a good match for BRIN indexes: the data is naturally clustered spatially (patches of points are loaded sequentially), and sometimes the data is too big for a table’s index to fit entirely in memory. For example, we have a 3TB table, resulting in a 20GB GiST index.
A BRIN index can be a really good compromise here: preliminary tests on a medium-size dataset show queries about 5x slower, with an index 1000x smaller. And this tradeoff is adjustable, according to your dataset size and hardware configuration.
Other topics which have been worked on (mainly by Paul, see his blog post):
- Expanded object header, to avoid serialization / deserialization steps
- A nasty upgrade bug (2.2 blocker)
You will find more details in the following reports:
- Paul Ramsey – http://blog.cleverelephant.ca/2016/03/paris-code-sprint-postgis-recap.html
- Giuseppe Broccolo – http://blog.2ndquadrant.com/brin-postgis-codesprint2016-paris/
- Regina Obe (PostgresOnLine) – http://www.postgresonline.com/journal/archives/363-Paris-OSGeo-Code-Sprint-2016-Highlights.html
- Regina Obe (BostonGIS) – http://www.bostongis.com/blog/index.php?/archives/252-Paris-OSGeo-Code-Sprint-2016-PostGIS-Highlights.html
Thanks to the PostGIS team for such hard work and for their reports!
Figure credits: Openstreetmap’s Soup, OpenScienceMap, Wikipedia