Introduction
When working with mathematical simulations or engineering problems, it is not unusual to handle curves that contains thousands of points. Usually, displaying all the points is not useful, a number of them will be rendered on the same pixel since the screen precision is finite. Hence, you use a lot of resource for nothing!
This article presents a fast 2D-line approximation algorithm based on the Douglas-Peucker algorithm (see [1]), well-known in the cartography community. It computes a hull, scaled by a tolerance factor, around the curve by choosing a minimum of key points. This algorithm has several advantages:
- It is fast!: For a n point curve, computation time of the approximation is proportional to nlog_2(n),
- You don't need a priori knowledge on the curve,
- The compression ratio can be easily tuned using the tolerance parameter.
The class has been integrated to a plotting library: Plot Graphic Library.
Douglas-Peucker (DP) Line Simplification Algorithm
The DP line simplification algorithm was originally proposed in [1]. John Hershberger and Jack Snoeyink have implemented it in C in [2] as a package named DPSimp
:
DPsimp
is a Douglas-Peucker line simplification algorithm implementation by John Hershberger and Jack Snoeyink. They analyze the line simplification algorithm reported by Douglas and Peucker and show that its worst case is quadratic in n, the number of input points. The algorithm is based on path hulls, that uses the geometric structure of the problem to attain a worst-case running time proportional to nlog_2(n), which is the best case of the Douglas algorithm.
The algorithm is based on a recursive construction of path hull, as depicted in the picture below. They did all the hard work (writing the base code in C), I ported it to C++ templates.
Modifications to DPSimp
DPSimp
was using a recursive call in the DP
method. This could lead to a stack overflow when the algorithm would go deep in recursion. To avoid this problem an internal stack has been added to the class to mimic the recursive function call without stack overflow.
Concepts and class design
Let the points
denote all the points of the original curve and the keys
the points from the original curve that are kept for the approximation.
The idea is that the user provides a container for the points
, denoted PointContainer
, and for the keys
, denoted KeyContainer
, and the link between those containers will be the line simplification class, denoted LineApproximator
.
How do we build a class hierarchy without restricting ourselves to particular container? In fact, one user might store it's points
in vector< pair<T,T>>
and another one in 2 separate vector<T>
. Of course, the same argument applies to the KeyContainer
.
A possible answer is templating. Passing the PointContainer
and KeyContainer
as template arguments for LineApproximator
allows to build the approximation class without specifying the containers, since the class is built at compilation time (We could write interface for those containers but in fact, I'm too lazy for that :) ).
With this idea in mind, here are the specifications of the container:
PointContainer
Let
Point
a structure, template or class that has x
,y
of type T
as member,
PointRef
a structure, template or class that has x
,y
of type T&
as member,
PointContainer
behaves like a vector of Point
:
- has
clear()
, size()
and resize()
methods,
- has random access iterator,
const_iterator
points to a structure similar to Point
,
iterator
points to a structure similar to PointRef
operator[] const
returns a Point
,
operator[]
returns a PointRef
A simple example of valid PointContainer
is
vector<Point>
However, a hybrid container has been developed to handle the case where x
and y
are in separate containers (See below).
KeyContainer
KeyContainer
behaves like a list of PointContainer::const_iterator
:
- has
size()
, clear()
methods,
- support
push_back
method
A simple example of valid KeyContainer
is
vector<PointContainer::const_iterator>
Again, a hydrid container to handle the case where the keys
must be outputted in separate containers is provided.
Templates
All the classes are templated to support float
and double
version of the algorithm.
The template TDPHull
is the user interface to the DP algorithm. However, it relies on a series of subclasses detailed below:
Name |
Description |
Use |
TLine<T, TPointContainer, TKeyContainer> |
2D Line template |
Points and keys |
TLineApproximator<T, TPointContainer, TKeyContainer> |
2D Line approximator base class |
Default interface to approximation algorithms |
TDPHull<T, TPointContainer, TKeyContainer> |
Implementing Douglas-Peukler algorithm |
User front end |
TPathHull<T, TPointContainer, TKeyContainer> |
Path hull |
Internal in TDPHull<T> |
TPoint<T> |
A pair of T : x, y |
Template for 2D point TLineApproximator<T> |
TPointRef<T> |
A pair of T& : x, y |
Template for 2D point TLineApproximator<T> |
SHomog |
2D Homogenous point, T x,y,w |
Internal structure to TLineApproximator<T> |
How to use TDPHull?
In the following examples, we adopt the following notations
using namespace hull;
using namespace std;
typedef vector<TPoint<float>> VectorPointContainer;
typedef vector<MyPointContainer::const_iterator> ListKeyContainer;
typedef TDPHull<float, VectorPointContainer, ListKeyContainer> CDPHullF;
CDPHullF dp;
The approximation methods throw exception, so you should always enclose them in a try
-catch
statement.
Normalization
The data points are, by default, normalized before approximation. This is in order to reduce numerical errors in the gradients computations. This happens when the data is badly scaled: using big numbers close together will lead to disastrous loss of significant numbers.
However, if you feel confident about your data, you can disable it by using SetNormalization
.
Handling points and keys
Get a reference to the point container and modify it:
TDPHull<float>::PointContainer& pc = dp.GetPoints();
for (UINT i=0;i<pc.size();i++)
{
pc[i].x=...;
pc[i].y=...;
}
If you are using normalization (default behavior), do not forget to re-compute the data bounding box after your changes:
dp.ComputeBoundingBox();
Approximation tuning
You can control the compression ratio by different ways:
- Setting the tolerance
dp.SetTol(0.5);
- Setting a compression ratio, an acceptable compression threshold:
try
{
dp.ShrinkNorm(0.1,0.05);
}
catch(TCHAR* str)
{
cerr<<str<<endl;
}
The method uses dichotomy to accelerate convergence.
- Setting the desired number of points, an acceptable number of points threshold:
try
{
dp.Shrink(100,5);
}
catch(TCHAR* str)
{
...
}
Simplifaction
The easiest part of the job:
try
{
dp.Simplify();
}
catch(TCHAR* str)
{
...
}
or by using ShrinkNorm, Shrink
methods.
Accessing the approximated curve
The keys are stored as PointContainer::const_iterator
. You can access the key container by using GetKeys
:
const TDPHull<float>::KeyContainer& kc = dp.GetKeys();
TDPHull<float>::KeyContainer::const_iterator it;
for (it = kc.begin(); it != kc.end(); it++)
{
xi=(*it)->x;
yi=(*it)->y;
}
Implementing your own algorithm
All you have to do is inherit a class from TLineApproximator
and override the function ComputeKeys
.
Hydrid containers
You can implement your own containers for points
and keys
as long as they meet the requirements.
Separate containers for x,y
It is not unusual to have x
,y
stored in separate containers and moreover these containers are not of the same type. To tackle this problem, two wrapper templates have been written: TPointDoubleContainer
and TKeyDoubleContainer
which serve as an interface between the approximation algorithms and the containers:
CVectorX vX;
CVectorY vY;
typedef TPointDoubleContainer<float, CVectorX,
CVectorY> HybridPointContainer;
CListX lKeyX;
CVectorY vKeyY;
typedef TKeyDoubleContainer<float, CListX, CVectorY> HybridKeyContainer;
TDPHull< float, HybridPointContainer, HydridKeyContainer> dp;
dp.GetPoints().SetContainers( &vX, &vY);
dp.GetKeys().SetContainers( &lKeyX, &vKeyY);
Using the demo
The demo shows a real time approximation of a curve by different algorithms.
- You can stop/start the animation using the toolbar buttons,
- You can modify the shrinking ration with the scroll bar,
- You can load your own data with the menu, File->Load Data Set. The file must be formatted with a pair
x
,y
per line.
Using it in your project
Insert the following files in your project and you're done.
LineApproximator.h,DPHull.h, PathHull.h
Known issues
- The original curve must not self-intersect. This means also that the start and end points must be different, no closed curve !
- Sick dataset and stack overflow: solved. The problem was due to recursion leading to stack overflow. It is solved now.
Update history
- 04-03-2003
- Got rid of DP recursion by adding an internal function call stack. Hence, the stack overflow problem is solved!!!
- Applied stuff I learned in Effective STL to the classes: using algorithms, functors, etc...
- Changed
class T
to typename T
- Better floating point comparison using
boost::close_at_tolerance
- 15-11-2002
- More and more templating,
- Detecting when curve is closed
- Hybrid containers
- Fixed bug in compute limits
- Added LOTS of
ASSERT
, so Debug version is significantly slower than release build
- 7-11-2002
- Fixed a bug in the
SLimits
structure (GetCenterY
)
TPathHull
working with iterators rather that SPoint*
- Added exception to handle errors
- Fixed a bug in
TDPHull::ComputeKeys
. Was using pc.end()
rather that pc.end()--
- 6-11-2002
- Added base class
TLineApproximator
- Added proposed algorithm by S.Rog: see
TKeyFramer
, TGlobalKeyFramer
, TDispKeyFramer
, TVectKeyFramer
- Updated demo
- 3-11-2002
- Added data normalization for better numerical behavior. Avoids the algorithm to crash when using badly conditioned data. Problem submitted by Corey W.
- Templated version
- Got rid of macro and rewrote in more OOP style
References
- D. H. Douglas and T. K. Peucker. Algorithms for the reduction of the number of points required to represent a line or its caricature. The Canadian Cartographer, 10(2):112--122, 1973.
- J. Hershberger and J. Snoeyink. Speeding up the Douglas-Peucker line simplification algorithm. In Proc. 5th Intl. Symp. Spatial Data Handling. IGU Commission on GIS, pages 134--143, 1992. (home page).