Introduction
In this tip we will learn how to find the node in an OpenStreetMap CSV file whose latitude and longitude are closest to a given position. This is useful because OpenStreetMap databases are not complete, so a GPS position often has to be matched to the closest node that is actually in the data.
Background
Before starting, we need a copy of an OSM database. Note that the larger the database, the longer it takes to process; the world database takes the longest.
- Download an OpenStreetMap database. OpenStreetMap databases are located at Download Databases.
Make sure to download a database with the .PBF extension.
- Next, a conversion tool is needed to take that .PBF database and convert it to a .CSV file.
Navigate to OSMConvert and download OSMConvert for Windows or Linux.
- Now that both files are downloaded, you have a database file and a converter application. The next step is to create a batch script that converts your database into a .CSV file.
Create a new batch file with the following code:
@echo off
cls
osmconvert.exe connecticut.pbf --all-to-nodes --csv="@id @lat @lon highway amenity shop name" -o=convert.csv
As you can see, I used the Connecticut database. Replace "connecticut.pbf" with the database you downloaded.
- After running this batch script, you will get an output file named convert.csv. Here is a small snippet of what your .CSV file should look like:
32658277 41.6970505 -72.7321095
60642404 41.3847854 -73.5581884
60642405 41.3848481 -73.5576026
60642406 41.3850997 -73.5558833
60642407 41.3852108 -73.5553076
60642408 41.3854561 -73.5541931
60642409 41.3855941 -73.5536402
60642410 41.3859120 -73.5525427
60642411 41.3860840 -73.5520105
60642412 41.3862672 -73.5514768
60642413 41.3864555 -73.5509534
60642415 41.3875006 -73.5480470
60642416 41.3876687 -73.5475638
60642418 41.3881994 -73.5459707
60642419 41.3885572 -73.5447182 motorway_junction
Toward the end of the file, the rows carry more data, such as the road type and name:
1000000017180069 41.9893800 -72.6480980 primary North Main Street
1000000017180070 42.0450340 -73.4087070 service
1000000017180071 41.9395970 -72.6294340 primary North Main Street
1000000017180072 41.6089105 -72.8778610 primary North Main Street
1000000017180073 41.6758988 -72.9467286 secondary North Main Street
1000000017180074 42.0418960 -73.4147590 service
1000000017180075 41.7792640 -73.2191550 service
1000000017180076 41.9591465 -72.7224673 secondary North Main Street
1000000017180077 42.0349050 -73.4026730 service
1000000017180078 41.9325970 -72.6177340 secondary North Main Street
1000000017180079 41.6110220 -73.4773270 residential
1000000017180080 41.6106810 -73.4769270 residential
1000000017180081 41.6406441 -72.4741366 primary North Main Street
1000000017180082 42.0442280 -73.3870410 service
1000000017180083 41.5672993 -73.2173759 residential
1000000017180084 41.7951592 -72.5235614 primary North Main Street
1000000017180085 41.6649840 -73.2698040 residential
1000000017180086 41.7793959 -72.5473544 residential Ferndale Drive
If your output looks like this, the database has been converted successfully into a file you can work with.
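Each row holds the node ID, latitude and longitude, followed by the optional highway, amenity, shop and name columns. Here is a minimal sketch of turning one row into a typed value, assuming the columns are tab-separated and appear in the order given in the batch script. The OsmRow class and its parse method are hypothetical names used for illustration only.

```java
// Hypothetical helper: one parsed row of the converted CSV file.
class OsmRow {
    final long id;
    final double lat;
    final double lon;
    final String highway; // empty string when the tag is absent
    final String name;    // empty string when the tag is absent

    OsmRow(long id, double lat, double lon, String highway, String name) {
        this.id = id;
        this.lat = lat;
        this.lon = lon;
        this.highway = highway;
        this.name = name;
    }

    // Assumed column order, taken from the batch script:
    // @id @lat @lon highway amenity shop name
    static OsmRow parse(String line) {
        String[] f = line.split("\t", -1); // -1 keeps trailing empty columns
        if (f.length < 3) {
            throw new IllegalArgumentException("Expected at least id, lat, lon: " + line);
        }
        long id = Long.parseLong(f[0]);
        double lat = Double.parseDouble(f[1]);
        double lon = Double.parseDouble(f[2]);
        String highway = f.length > 3 ? f[3] : "";
        String name = f.length > 6 ? f[6] : "";
        return new OsmRow(id, lat, lon, highway, name);
    }

    public static void main(String[] args) {
        OsmRow row = parse("60642419\t41.3885572\t-73.5447182\tmotorway_junction\t\t\t");
        System.out.println(row.id + " " + row.lat + " " + row.lon + " " + row.highway);
    }
}
```

The ID is read as a long because the generated IDs near the end of the file (such as 1000000017180069) do not fit in an int.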
Using the code
Before you read on: if you don't like math, skip down to "Implementing the Algorithm and Putting it all Together".
The main algorithm
The main algorithm is very simple to understand. You have a reference latitude and longitude, and you have a set of other latitudes and longitudes to compare against.
Compute the distance from the reference point to each compared point; the point with the smallest distance is the nearest location to the reference.
Here is the algorithm in broad, general pseudocode:
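var rLat, rLon;              // reference point
var distance = infinity;     // shortest distance found so far
for each line in file
    var sLat, sLon;          // point read from the current line
    var _distance = GetDistance(rLat, rLon, sLat, sLon);
    if (_distance < distance)
        distance = _distance // this point is closer; remember it
    end if
next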
Calculating the distance
Calculating the distance between the reference latitude and longitude and the compared latitude and longitude requires some higher-level math.
To calculate the distance, the haversine formula is used.
a = sin²(Δφ/2) + cos(φ₁) · cos(φ₂) · sin²(Δλ/2)
c = 2 · atan2(√a, √(1−a))
d = R · c
where φ is latitude and λ is longitude (both in radians), Δφ = φ₂ − φ₁, Δλ = λ₂ − λ₁, and R is the earth's radius.
The Java implementation of this formula is as follows:
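// Mean Earth radius in kilometres. The original listing does not show this
// constant, so the value is an assumption; the result uses the same unit.
private static final double EarthRadius = 6371.0;

// Both arguments are "lat,lon" strings in decimal degrees.
private static double CalculateDistance(String LatLon1, String LatLon2) {
    String[] p1 = LatLon1.split(",");
    String[] p2 = LatLon2.split(",");
    double lat1 = Double.parseDouble(p1[0]);
    double lon1 = Double.parseDouble(p1[1]);
    double lat2 = Double.parseDouble(p2[0]);
    double lon2 = Double.parseDouble(p2[1]);
    // The formula works in radians, so convert the differences first.
    double latDistance = Math.toRadians(lat2 - lat1);
    double lonDistance = Math.toRadians(lon2 - lon1);
    double a = Math.sin(latDistance / 2) * Math.sin(latDistance / 2)
            + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
            * Math.sin(lonDistance / 2) * Math.sin(lonDistance / 2);
    double c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
    return EarthRadius * c;
}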
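A quick way to sanity-check a haversine implementation is to feed it a case with a known answer: one degree of longitude along the equator spans R · π/180, which is about 111.19 km for R = 6371 km. The sketch below is a self-contained variant that takes numeric arguments; the class name HaversineCheck, the method distanceKm and the radius constant are assumptions made for this example, not part of the original code.

```java
class HaversineCheck {
    // Mean Earth radius in kilometres (assumed; the result is in the same unit).
    static final double EARTH_RADIUS_KM = 6371.0;

    // Great-circle distance between two points given in decimal degrees.
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double sinLat = Math.sin(dLat / 2);
        double sinLon = Math.sin(dLon / 2);
        double a = sinLat * sinLat
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * sinLon * sinLon;
        double c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
        return EARTH_RADIUS_KM * c;
    }

    public static void main(String[] args) {
        // One degree of longitude on the equator: expect roughly 111.19 km.
        System.out.println(distanceKm(0.0, 0.0, 0.0, 1.0));
        // Two identical points: expect a distance of zero.
        System.out.println(distanceKm(41.3847854, -73.5581884, 41.3847854, -73.5581884));
    }
}
```

If the first value is far from 111.19, the usual culprits are a missing degrees-to-radians conversion or swapped latitude and longitude arguments.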
Reading the file
Reading the .CSV file is the easiest part! Reading the file line by line is the safest option because a large database, such as the world database, may not fit into memory all at once.
import java.io.BufferedReader;
import java.io.FileReader;

BufferedReader br = new BufferedReader(new FileReader("convert.csv"));
String line;
while ((line = br.readLine()) != null) {
    // process one row of the database here
}
br.close();
Replace convert.csv with the path to your .CSV database file.
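As a hedged alternative to calling close() by hand, try-with-resources closes the reader even when an exception is thrown partway through the file. The sketch below only counts non-empty lines to show the pattern; the class name LineCounter and the method countLines are hypothetical, and convert.csv is assumed to be the output file of the batch script.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

class LineCounter {
    // Reads the source line by line, so only one line is held in memory at a time.
    static long countLines(Reader source) throws IOException {
        long count = 0;
        // The reader is closed automatically when this block exits.
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                if (!line.isEmpty()) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Pass the path as the first argument, or fall back to convert.csv.
        String path = args.length > 0 ? args[0] : "convert.csv";
        System.out.println(countLines(new FileReader(path)));
    }
}
```

Taking a Reader rather than a file name makes it easy to try the loop on a short in-memory string before running it on a large file.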
Implementing the Algorithm and Putting it all Together
The implementation of the algorithm is very simple.
double srefDistance = Double.MAX_VALUE; // shortest distance found so far
String idsShortestLatLon = "";          // ID of the closest row so far
String StartLatitudeLongitude = "41.47295453,-73.34532628";
BufferedReader br = new BufferedReader(
        new FileReader("convert.csv"));
String line;
while ((line = br.readLine()) != null) {
    String[] fields = line.split("\t"); // columns are separated by tabs
    String id = fields[0];
    double lat = Double.parseDouble(fields[1]);
    double lon = Double.parseDouble(fields[2]);
    String sLatLon2 = lat + "," + lon;
    double distance = CalculateDistance(StartLatitudeLongitude,
            sLatLon2);
    // Keep the row if it is closer than anything seen before. Starting from
    // Double.MAX_VALUE means the first row is recorded too, ID included.
    if (distance < srefDistance) {
        System.out.println(distance);
        srefDistance = distance;
        idsShortestLatLon = id;
    }
}
br.close();
System.out.println(idsShortestLatLon); // ID of the closest row
And that's it!
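For reference, the pieces can be combined into one self-contained class. This is a sketch under stated assumptions rather than the original program: the class name NearestNode, the method names, the numeric arguments in place of "lat,lon" strings, and the 6371 km radius are all choices made for this example. It reads from any Reader, so it can be tried on a few in-memory rows before being pointed at a real file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class NearestNode {
    static final double EARTH_RADIUS_KM = 6371.0; // assumed mean radius

    // Haversine distance in kilometres between two points in decimal degrees.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double sinLat = Math.sin(Math.toRadians(lat2 - lat1) / 2);
        double sinLon = Math.sin(Math.toRadians(lon2 - lon1) / 2);
        double a = sinLat * sinLat
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * sinLon * sinLon;
        return EARTH_RADIUS_KM * 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
    }

    // Returns the ID of the row closest to (refLat, refLon),
    // or null if no row could be parsed.
    static String findNearestId(Reader source, double refLat, double refLon)
            throws IOException {
        String bestId = null;
        double best = Double.POSITIVE_INFINITY;
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                String[] f = line.split("\t");
                if (f.length < 3) {
                    continue; // skip rows without id, lat and lon
                }
                double d;
                try {
                    d = haversineKm(refLat, refLon,
                            Double.parseDouble(f[1]), Double.parseDouble(f[2]));
                } catch (NumberFormatException e) {
                    continue; // skip rows with non-numeric coordinates
                }
                if (d < best) {
                    best = d;
                    bestId = f[0];
                }
            }
        }
        return bestId;
    }

    public static void main(String[] args) throws IOException {
        // Three rows copied from the sample output shown earlier.
        String sample = "32658277\t41.6970505\t-72.7321095\n"
                + "60642404\t41.3847854\t-73.5581884\n"
                + "60642419\t41.3885572\t-73.5447182\tmotorway_junction\n";
        // Of these three rows, 60642419 is the closest to the reference point.
        System.out.println(findNearestId(new StringReader(sample),
                41.47295453, -73.34532628));
    }
}
```

To run it against the converted database, pass a FileReader for your .CSV file instead of the StringReader.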
Points of interest
- The database file is separated by tabs. In Java, a tab is written as "\t".
- The code above returns the ID of the closest latitude and longitude found in the database, for example:
ID LAT LON
32658277 41.6970505 -72.7321095