This article was written by Rob Martin and originally published in the August 2013 issue of the Software Developer's Journal. You can find more articles at the SDJ website.
The problem
Like many people, I confused the Heisenberg Uncertainty Principle with the Observer Effect. The Heisenberg Uncertainty Principle asserts that certain pairs of physical properties of a particle, such as position and momentum, cannot both be measured with arbitrary precision: the more precisely we know one value, the less precisely we can know the other. This is best illustrated by the story of Heisenberg being pulled over by a police officer. The officer asks Heisenberg if he knows how fast he was driving. "No, but I know where I am," says Heisenberg. The officer says, "Sir, you were driving 76 miles per hour." Heisenberg replies, "Great. Now I'm lost."
On the other hand, the observer effect describes how measuring something within a system affects that system. A common example is measuring the pressure in a tire. It's hard to do without letting a bit of air leak out, which changes the pressure of the tire. The more we measure, the flatter the tire gets.
Instruments that measure code performance are subject to the observer effect too. The mere process of inserting code (or reflecting on existing code) to measure our application's performance also affects the performance of the application. If we naively code against a cloud-based metrics tool such as Librato or New Relic, we risk a significant performance hit, because the instrumented code has to wait on blocking I/O while each measurement is recorded to a remote location.
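Here is a small, self-contained sketch of that naive approach. The function `record_remotely` is a hypothetical stand-in for a blocking call to a hosted metrics service; it only sleeps for an assumed 50 ms, so the numbers illustrate the shape of the problem, not the real latency of Librato or New Relic.

```python
import time


def record_remotely(name, value, latency=0.05):
    # Hypothetical stand-in for a blocking HTTP call to a metrics service.
    # The 50 ms delay is an assumed round trip, not a measured one.
    time.sleep(latency)


def work():
    # The code we actually care about: a trivially cheap operation.
    return sum(range(1000))


def instrumented_work():
    start = time.perf_counter()
    result = work()
    elapsed = time.perf_counter() - start
    # Recording synchronously means the caller waits for the "network".
    record_remotely("work_timer", elapsed)
    return result


start = time.perf_counter()
instrumented_work()
total = time.perf_counter() - start
print("instrumented call took %.3f s" % total)
```

However cheap `work()` is, every instrumented call now costs at least the simulated round trip.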
I needed a library for measuring code performance, and I wanted to minimize the observer effect. The problem suggested an asynchronous programming pattern: record each measurement in real time, but hand it off to another thread or process to be stored.
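As a rough illustration of that pattern, here is a single-process sketch built only from Python's standard library: a `queue.Queue` stands in for the message queue, a background thread stands in for the storage process, and a list stands in for the remote store. The names and the 50 ms delay are assumptions for the example; Zibrato itself uses ZeroMQ and separate processes rather than an in-memory queue.

```python
import queue
import threading
import time

metrics = queue.Queue()
recorded = []  # stands in for the remote metrics store in this sketch


def storage_worker():
    # Runs in its own thread: this is where slow, blocking I/O would go.
    while True:
        item = metrics.get()
        if item is None:  # sentinel value: shut down cleanly
            break
        time.sleep(0.05)  # hypothetical network round trip
        recorded.append(item)


worker = threading.Thread(target=storage_worker, daemon=True)
worker.start()


def timed(name, func, *args):
    start = time.perf_counter()
    result = func(*args)
    # Record in real time, but only enqueue; never wait on storage.
    metrics.put((name, time.perf_counter() - start))
    return result


start = time.perf_counter()
timed("sum_timer", sum, range(1000))
caller_cost = time.perf_counter() - start

metrics.put(None)  # ask the worker to finish
worker.join()
```

The caller pays only for timing and enqueueing; the simulated network delay is absorbed by the worker thread.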
ZeroMQ
Enter ZeroMQ.
On the surface, ZeroMQ would seem to be a very fast ("zero" time) message queue, but I find that a bit of a misnomer. It's not a message broker like RabbitMQ. It doesn't support the Advanced Message Queuing Protocol (AMQP). There's no management interface. It doesn't persist messages to disk, and if a publisher has no subscribers, its messages are simply dropped by default. You can't inspect messages or get statistics on the queue, at least not without writing your own management layer.
ZeroMQ is more like a brilliant and fast socket library with built-in support for a wide variety of asynchronous patterns. This makes ZeroMQ an ideal message dispatcher when you don't need complex broker features. For my metrics library, I didn't need those features, but I did need speed.
Building a code instrumentation library
The features I wanted were simple:
Easy instrumentation for timing and counting.
Highly efficient operation within the instrumented code.
The ability to instrument multiple programs, processes, and threads in one common system. These processes are the publishers of metrics.
Support for multiple back-end systems to consume the metrics. These processes are the subscribers to the metrics.
Because ZeroMQ was my message dispatcher and Librato.com was my primary target for recording and aggregating these metrics, I chose the portmanteau Zibrato as my project name.
The Zibrato library for code instrumentation
Before we get into the architecture of the library, let's take a quick look at the API. The Zibrato library provides three ways to instrument your Python code:
Timers:
Zibrato provides a decorator that can time any defined function or
method, and a context manager that works with any code block.
from zibrato import Zibrato

z = Zibrato()

# decorated function
@z.time_me(level='debug', name='myfunct_timer', source='myprog')
def myfunct():
    time_consuming_operations()

# context manager
with z.Time_me(level='debug', name='timer_name'):
    slow_function_to_time()
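Zibrato exposes the two forms under separate names (`time_me` and `Time_me`). For readers curious how one object can act as both a decorator and a context manager, here is a sketch using the standard library's `contextlib.ContextDecorator`. It is not Zibrato's implementation: the `time_me` class and the `measurements` list are hypothetical, and results are appended to a list instead of being published to a broker.

```python
import time
from contextlib import ContextDecorator

# Hypothetical sink for measurements; a real library would publish them instead.
measurements = []


class time_me(ContextDecorator):
    """Time a block (as a context manager) or every call (as a decorator)."""

    def __init__(self, name, level="debug"):
        self.name = name
        self.level = level
        # ContextDecorator reuses this same instance for every decorated call,
        # so start times are kept on a stack to survive recursive calls.
        self._starts = []

    def __enter__(self):
        self._starts.append(time.perf_counter())
        return self

    def __exit__(self, exc_type, exc, tb):
        elapsed = time.perf_counter() - self._starts.pop()
        measurements.append((self.level, self.name, elapsed))
        return False  # never swallow exceptions raised by the timed code


@time_me("myfunct_timer")
def myfunct():
    time.sleep(0.01)


myfunct()

with time_me("timer_name"):
    time.sleep(0.01)
```

The stack makes the sketch safe for recursion within one thread; it is not meant to be shared across threads.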
Counters:
Zibrato provides a counter method as a decorator or as a context
manager.
from zibrato import Zibrato

z = Zibrato()

# decorated function
@z.count_me(level='info', name='myfunct_counter', value=5)  # increment by 5
def myfunctc():
    pass

# context manager
with z.Count_me(level='info', name='counter_name', source='deathstar'):
    pass
Gauges:
Finally, Zibrato can be used to insert an arbitrary value into the
backend at any point in the code.
from zibrato import Zibrato

z = Zibrato()

# Zibrato gauge
z.gauge(level='crit', name='gauge_name', value=123)
This is just a quick overview of how Zibrato is used to instrument code. For more information, check out the library on PyPI (https://pypi.python.org/pypi/Zibrato) or look at my GitHub repository (https://github.com/version2beta/zibrato).
The architecture
In order to accomplish the goals behind the API, Zibrato is divided into three parts:
The Zibrato library, which implements the API described above and publishes metrics to the message queue. As a user, I can have zero or more instrumented processes all communicating with my message queue.
A message broker, which subscribes to zero or more publishers of metrics and in turn republishes the metrics to zero or more backend subscribers.
Zero or more backend providers, which subscribe to the message broker to receive metrics, then in turn do whatever is appropriate with them. In my application, the backend provider sends the messages to my Librato account.
This is referred to as an "extended pubsub pattern." The message broker (sometimes called a message bus) is core to this topology. It provides support for multiple publishers, and filters which messages the backend providers will receive.
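The article doesn't spell out Zibrato's wire format, so the following is purely a hypothetical sketch of one way to make filtering work with ZeroMQ's prefix-based subscriptions: put the field you want to filter on (here, the level) at the start of each message. The space-separated layout and the `encode`/`decode` helpers are assumptions, and the sketch assumes names contain no spaces.

```python
# Hypothetical wire format: "level kind name value source", space separated.
# Putting the level first lets a ZeroMQ subscriber filter by level with a
# plain prefix subscription such as b"crit".


def encode(level, kind, name, value, source="unknown"):
    return " ".join([level, kind, name, str(value), source]).encode("utf-8")


def decode(message):
    level, kind, name, value, source = message.decode("utf-8").split(" ")
    return {"level": level, "kind": kind, "name": name,
            "value": float(value), "source": source}


msg = encode("crit", "gauge", "gauge_name", 123)
print(msg)                   # b'crit gauge gauge_name 123 unknown'
print(decode(msg)["value"])  # 123.0
```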
To implement Zibrato on a server, I generally use supervisord to start the broker and my Librato backend. Then any instrumented code can connect to the broker, and the backend will receive whatever messages meet its filter and forward them to Librato. The backend aggregates messages and performs the blocking I/O portion of the work, communicating across the network with Librato.com, in a completely separate process from the instrumented code.
Using ZeroMQ
Clearly, ZeroMQ is the special sauce in the Zibrato architecture, doing the heavy lifting of moving messages from the instrumented code to the backend providers. It provides the asynchronicity.
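If you don't already have ZeroMQ's Python bindings installed, it's easy to do with pip:
pip install --upgrade pyzmq
If pip has to build pyzmq from source, you may also need your platform's Python development headers (for example, the python-dev package on Debian or Ubuntu, which is installed with apt-get rather than pip).
If you're running Anaconda Python from Continuum Analytics, pyzmq is already installed.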
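To confirm the installation, you can ask pyzmq which versions it sees. `zmq.zmq_version()` reports the underlying libzmq library and `zmq.pyzmq_version()` the Python bindings; the exact values depend on your system.

```python
import zmq

# Version of the libzmq C library that pyzmq is linked against
print("libzmq", zmq.zmq_version())
# Version of the pyzmq bindings themselves
print("pyzmq ", zmq.pyzmq_version())
```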
Creating a message broker
Our broker subscribes to publishers and then publishes to subscribers. ZeroMQ refers to this type of device as a "forwarder." It's fairly trivial to do this in Python:
First, we create a ZeroMQ context, which is basically a thread-safe container for our sockets that allows us to cleanly shut everything down when we're done with it.
Then we create a TCP socket to serve as a subscriber to the instrumented code. This could have been done using a Unix domain socket (similar to a named pipe) or a PGM multicast IP socket, but I wanted the ability to connect to the broker on a specific IP address and port, even if the broker is running on a different server from the instrumented code. Potential gotcha: by default, a subscriber filters out all messages, so we have to tell it what messages to receive. An empty string (an empty byte string, b'', in pyzmq) tells it to receive all messages.
Next, we create a TCP socket to serve as a publisher for the backends. Again, I made a design decision allowing backends to run on separate servers.
Finally, we combine our subscriber socket and our publisher socket into a ZeroMQ forwarder device.
Here's the code.
import zmq

# Get the ZeroMQ context
context = zmq.Context()

# Create a subscriber for the instrumented code's publishers to connect to
subscriber = context.socket(zmq.SUB)
subscriber.bind('tcp://127.0.0.1:5550')
# Subscribe to all messages (the empty prefix matches everything)
subscriber.setsockopt(zmq.SUBSCRIBE, b'')

# Create a publisher for the backends to connect to
publisher = context.socket(zmq.PUB)
publisher.bind('tcp://127.0.0.1:5551')

# Combine the subscriber and the publisher into a forwarder.
# This call blocks, forwarding messages until the process is stopped.
# (Newer pyzmq code usually writes this as zmq.proxy(subscriber, publisher).)
zmq.device(zmq.FORWARDER, subscriber, publisher)
This code gets us started. Now any ZeroMQ publisher running on localhost, written in any programming language, can send messages to our broker. Likewise, any subscriber on localhost can receive those messages. If we bound our sockets to an IP address reachable over the network instead of 127.0.0.1, publishers and subscribers on other machines could connect to the broker, too.
Building a subscriber
A broker without any listeners has little purpose in life. Let's create a simple subscriber that receives messages and prints them to standard output. We'll write it as a separate Python script, since it runs separately from the broker.
Here's how to create a subscriber:
First, we create a ZeroMQ context, as described above.
Then we create our listener's socket and connect it to the broker's publisher port.
Next, we set a filter to which our socket subscribes. Our call to receive messages from the message broker will only return messages that start with this filter. An empty string subscribes to all messages, but the default is to subscribe to no messages, so we have to set it to something, even if it's an empty string.
Finally, we set up a loop that waits for each message to arrive and prints it.
Our code might look like this:
import zmq

# Get the ZeroMQ context
context = zmq.Context()

# Create a socket and connect it to the broker's publisher
socket = context.socket(zmq.SUB)
socket.connect('tcp://127.0.0.1:5551')

# Subscribe to all messages
socket.setsockopt(zmq.SUBSCRIBE, b'')

# Keep on keeping on: recv() blocks until a message arrives
while True:
    print(socket.recv())
This code gives us a listener that will receive anything the broker forwards and print it to STDOUT. Now all we need is someone to give the broker some messages to forward.
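The subscription rule the subscriber relies on is simple enough to model in plain Python. This is not ZeroMQ code, only an illustration of the matching behavior: a message gets through if it starts with at least one subscribed prefix, so a socket with no subscriptions receives nothing and the empty prefix receives everything. The sample messages are made up.

```python
def is_delivered(message, subscriptions):
    """Model of a SUB socket's filter. Each setsockopt(zmq.SUBSCRIBE, prefix)
    call adds one entry to `subscriptions`."""
    return any(message.startswith(prefix) for prefix in subscriptions)


print(is_delivered(b"debug timer myfunct_timer 0.2", []))              # False
print(is_delivered(b"debug timer myfunct_timer 0.2", [b""]))           # True
print(is_delivered(b"debug timer myfunct_timer 0.2", [b"crit"]))       # False
print(is_delivered(b"crit gauge gauge_name 123", [b"crit", b"info"]))  # True
```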
Building a publisher
If a tree falls in the forest and there's no one to hear it, does it make any noise? I don't know the answer to that question, but I do know that a message broker without any publishers lives a pretty quiet life.
Connecting our broker to a publisher is easy to do. In a third Python script, we'll create a publisher that sends messages to the broker. If we do it right, the subscriber we created in the last section will receive these messages and print them to STDOUT.
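Here's how to build a publisher:
First, we create a ZeroMQ context, just like we did above.
Next, we create our publisher socket and connect it to the port where the broker's subscriber is listening.
Finally, after giving the connection a moment to be established, we send a message from our publisher to the message broker.
Here's the code: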
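import time
import zmq

# Get the ZeroMQ context
context = zmq.Context()

# Create a publisher socket and connect it to the broker's subscriber
socket = context.socket(zmq.PUB)
socket.connect('tcp://127.0.0.1:5550')

# A PUB socket silently drops messages sent before the connection (and the
# broker's subscription) is in place, so give it a moment before sending
time.sleep(0.5)

# Send a message to the broker
socket.send(b'Hello from the publisher')

# Close cleanly so the queued message is flushed before the script exits
socket.close(linger=1000)
context.term()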
With these three components, we have a publisher that sends messages, a subscriber that receives messages, and a broker that forwards messages from any connected publisher to any connected subscriber.
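To try all three components without opening TCP ports or juggling three terminals, here is a sketch that runs a publisher, a forwarder, and a subscriber in a single Python process, using ZeroMQ's inproc transport and a thread for the broker. It uses `zmq.proxy`, the current replacement for the deprecated `zmq.device(zmq.FORWARDER, ...)` call; the endpoint names, the retry count, and the message text are assumptions made for the example.

```python
import threading

import zmq


def broker(frontend, backend):
    # Forward everything from the SUB side to the PUB side until the
    # context is terminated, then close the broker's own sockets.
    try:
        zmq.proxy(frontend, backend)
    except zmq.ContextTerminated:
        pass
    finally:
        frontend.close(linger=0)
        backend.close(linger=0)


def run_demo(payload=b"debug timer demo_timer 0.5", attempts=50):
    context = zmq.Context()

    # Broker: bind both sides, as in the standalone broker script.
    frontend = context.socket(zmq.SUB)
    frontend.bind("inproc://metrics-in")
    frontend.setsockopt(zmq.SUBSCRIBE, b"")
    backend = context.socket(zmq.PUB)
    backend.bind("inproc://metrics-out")
    thread = threading.Thread(target=broker, args=(frontend, backend))
    thread.start()

    # Subscriber (a backend provider) that only wants "debug" messages.
    subscriber = context.socket(zmq.SUB)
    subscriber.connect("inproc://metrics-out")
    subscriber.setsockopt(zmq.SUBSCRIBE, b"debug")

    # Publisher (the instrumented code).
    publisher = context.socket(zmq.PUB)
    publisher.connect("inproc://metrics-in")

    # Subscriptions take a moment to propagate and PUB sockets drop what
    # nobody has subscribed to yet, so resend until something arrives.
    received = None
    for _ in range(attempts):
        publisher.send(payload)
        if subscriber.poll(timeout=100):  # milliseconds
            received = subscriber.recv()
            break

    publisher.close(linger=0)
    subscriber.close(linger=0)
    context.term()  # interrupts zmq.proxy in the broker thread
    thread.join()
    return received


print(run_demo())
```

Calling `run_demo(b"info counter myfunct_counter 5", attempts=5)` should return `None` after about half a second, because the subscriber only asked for messages starting with `debug`.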
Putting it all together
Zibrato is based on the extended pubsub pattern described above, and uses the same three basic components.
On the front end, we have the Zibrato class that provides the instrumentation methods. This is our ZeroMQ publisher. The code itself is very light, and all it needs to do is get the message to our broker, so the net impact on performance is very low.
In the middle, we have a broker coded much like the example above. It is a simple forwarder that accepts connections from multiple publishers and forwards messages to multiple subscribers.
On the back end, we have a library that implements the Librato HTTP API, using the Python Requests library to store the metrics we've recorded. It flushes the data to Librato on a set schedule, including rolling up the counters. The Librato backend inherits from a standard backend class, so it's easy to implement other kinds of backends too, such as one that simply writes to a log file or one that sends to a central Statsd server.
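The article doesn't show Zibrato's backend base class, so here is a hypothetical sketch of the idea: a base class that aggregates metrics between flushes (summing counters, keeping the latest value of everything else) and leaves the actual delivery to subclasses. `Backend`, `LogBackend`, and the method names are illustrative, and a real backend would call `flush()` on a schedule rather than by hand.

```python
from collections import defaultdict


class Backend(object):
    """Hypothetical base class: collects metrics between flushes.
    Subclasses decide what delivering a batch means (HTTP post, log line, ...)."""

    def __init__(self):
        self.counters = defaultdict(float)
        self.gauges = {}

    def receive(self, kind, name, value):
        if kind == "counter":
            self.counters[name] += value  # roll up counters until the next flush
        else:
            self.gauges[name] = value     # keep the latest timer or gauge value

    def flush(self):
        batch = {"counters": dict(self.counters), "gauges": dict(self.gauges)}
        self.counters.clear()
        self.gauges.clear()
        self.send(batch)
        return batch

    def send(self, batch):
        raise NotImplementedError


class LogBackend(Backend):
    """Writes each flushed batch to standard output instead of a remote API."""

    def send(self, batch):
        print(batch)


backend = LogBackend()
backend.receive("counter", "myfunct_counter", 5)
backend.receive("counter", "myfunct_counter", 5)
backend.receive("gauge", "gauge_name", 123)
backend.flush()  # prints {'counters': {'myfunct_counter': 10.0}, 'gauges': {'gauge_name': 123}}
```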
Credits
Isaac Newton once said, "If I have seen further it is by standing on the shoulders of giants." At best I squat and risk falling off the giants' shoulders, so I prefer the earlier quote from Isaiah di Trani, who said, "Who sees further, a dwarf or a giant? Surely a giant, for his eyes are situated at a higher level than those of the dwarf. But if the dwarf is placed on the shoulders of the giant, who sees further? ... So too we are dwarfs astride the shoulders of giants. We master their wisdom and move beyond it."
ZeroMQ is the brilliant work of Pieter Hintjens and iMatix Corporation. It is a powerful and flexible messaging platform, and I highly recommend it for asynchronous applications. I also recommend reading Pieter's ZeroMQ Guide. It's lengthy and comprehensive, but it's quite accessible and even an enjoyable read.
I first became aware of Librato on the Ruby Rogues podcast #62 featuring Joe Ruscio, Librato's CTO and cofounder. They've done an excellent job of making metrics easy. Librato offers free development accounts and a free month of production with very reasonable pricing thereafter.
Zibrato was initially inspired by Etsy's Statsd package, a Node.js service that, coupled with Graphite (written in Python and Django), provides a full asynchronous instrumentation stack. On a related note, check out the work of Steve Ivy. He has not only written a Python library for interfacing with Statsd, but has also reimplemented Statsd itself in Python.
I use Kenneth Reitz's Requests library ("HTTP for Humans"). This library makes web interactions painless.
My testing setup has benefited greatly from Gary Bernhardt's Expecter package, and in the future I'll probably refactor my tests to also use his Dingus library. Because I first learned BDD in Ruby, Gary was very helpful in bridging my knowledge gap from Ruby to Python.
I've become increasingly impressed by the base version of Continuum Analytics' Anaconda Python. It now goes on each of my development machines and virtual development environments.
Special thanks to Tracy Harms (https://twitter.com/kaleidic), who spent several days pair programming with me on the Zibrato project. His feedback and insight were invaluable.
~~~
Bio:
Rob Martin is a developer at i.TV in Provo, Utah, USA. One of the cooler parts of his job is that he's expected to learn every language used in their stack. Before i.TV, he did Ruby at a Python shop, Python at a PHP house, and Perl on the factory floor.
Rob Martin is version2beta most places online. Follow him on Twitter (@version2beta), on GitHub (https://github.com/Version2beta), and on his blog (http://version2beta.com/). i.TV is hiring. Email rob@version2beta.com for more information.