Introduction
This short tip will tell you how you can create your own postal-code database(MS SQL) for the Netherlands. This database will use the official national addresses and buildings(BAG) files as a source. This tip is just a walk-through and contains no source code (yet). Be warned that this process will take up to 6 hours on a 'regular' desktop computer with slow hard disks. Excuse all the Dutch in the pictures and names but this is due to the nature of the tip.
Background
Due to past legal restrictions, the Dutch postal-code collection with addresses
was only available under licence for quite some money (€2730). But since 2012, it is now legal to use the Dutch governmental registration of addresses and buildings as a source. Although some nice websites have come up with postal-code lookups like www.postcodeapi.nu, it seems that no one ever explains how they convert the raw BAG data to something useful...and usable for Windows users. ;-)
Steps to Import the Data
1. The Source
To be able to do anything, we must first get the raw data. This can be achieved by going to the government website nationaalgeoregister.nl and searching for the keyword 'adressen
'. Then you will find a link to the atom feed with a big zipfile (1.5 gig) located at: http://geodata.nationaalgeoregister.nl/inspireadressen/extract/inspireadressen.zip of course you should download this file on to a local hard disk.
2. Indexing and Generating shapefile
Next, we need to index and rearrange the data. This is done with a 3rd party tool called: Bag-extract from the company Geon that can be found here. This program can be executed and used in the 'freeware' mode for our purposes. The program is actually built to handle this entire importing process into a database, but due to restrictions or problems, it does not generate the required data and or format.
Import the zip file on step 1 of the UI, and select the checkbox to extract everything from the file.
Press the step 2 button to extract and index the data. Be sure to choose a location with enough free space, the process will generate a lot of 20MB XML files that will take around 50GB of data! This will take about an hour and a half or so to be ready.
As I mentioned before, the data cannot be directly imported usefully into a SQL server instance with the 3rd option from this tool... but it can generate shapefiles with all the data. The shapefile will then be imported with another tool to SQL Server.
To export the data to a shapefile, choose the 3rd UI button 'Convert selection' (and be sure you have still selected all the regions). Choose the shapefile export and choose a location with about 15Gb of free disc-space.
Please be sure to first change the options before you export:
- Shapemaximum: UPDATE! Set this to a maximum value of 4000000 (due to import problems of the latest version) you will end up with 3 shapefile .dbf database files to import (this not a problem but it requires you to import the 3 shapefile *.dbf separately in the next step).
- Projection: Select the projection you want to use: WD is default but is useless for map purposes. You rather want to choose the WGS84 projection if you want to do anything online with the coordinates.
3. Import the shapefile into MS SQL
First be sure that you have an instance of SQL server available form your computer and expect about 4Gb of data to be transferred (9million records). I have a local instance running of MS SQL Server Express 2012 (this is the free edition) which is more than suitable for this purpose. Create a new database that can be filled with the data, I called mine shapeimport. Use this name if you want to use the next script without modifying.
Okay, now you have a folder with a lot of shapefiles of enormous size and we need to import the file called nummer.shp (and thus nummer.dbf) into our database. Therefore, we use the 3rd party tool called SQL Server spatial tools from Morten Nielsen. You can read more on the program here.
Once you have downloaded the zip extract and run Shape2Sql.exe, choose the nummer.shp file, choose the connection to your local server and database, and set the geometry properties as shown here below.
After the first import that will take up to 2 hours, select the 2nd file mummer001.shp. Be sure you uncheck the "Replace existing table" and change the 'Table Name' value from nummer001 back to nummer.
Repeat this process until you have imported all the 3 nummerxxx.dbf files.
You are now the proud owner of a huge table with all the unstructured data from BAG.
4. Rearrange the Data
Right now, we're almost done.
Create a second database on your SQL server instance called postalcode and run the following SQL query on it. (It will take about half an hour - Check out the download link at the top of this page.)
This script generates useful tables and some views with all the data from the first database.
It will also generate a table with possible postal-code problems (postal-codes that are not correct identified by their erroneous location).
You will end up with a useful database at about 2Gb with all the postalcodes, streets, house numbers and city's from all building objects in the Netherlands with their corresponding geolocation. If you have any improvements and/or comments, you are welcome to post them!
This is my second article on CodeProject!
Some Dutch keywords: postcode, straten, huisnummers, BAG, importeren, Nederlands, postcode database
10-05-2014: Updated the Shapemaximum
value to 4000000
due to import problems, you now need to import 3 .dbf files