Introduction
ZeroMQ defines a protocol called ZMTP (ZeroMQ Message Transport Protocol), a transport-layer protocol for exchanging messages between two peers over a connected transport layer such as TCP (see RFCs 23/ZMTP and 37/ZMTP).
A protocol is a set of rules (from Greek protokollon, the first leaf glued to a manuscript, describing its contents) that governs data exchange between two endpoints. Each protocol has its own rules for how data is formatted, when it is sent, how it is handled once received, and so on.
In this article, we will explain the ZMTP protocol using a simple ZeroMQ Push-Pull example. In order to capture the data exchanged between the two sockets, we will use Wireshark.
Wireshark
Wireshark is a free and open source network protocol analyzer. It captures and decodes network packets. One of its intended purposes is to learn network protocol internals.
Wireshark's decoding process uses dissectors that can identify and display protocol's fields and values into human-readable format frames. It supports thousands of dissectors that parse and decode common protocols.
The following Wireshark screenshot shows captured packets exchanged between ZeroMQ Push and Pull sockets.
The main window is composed of three panes:
- Packet List Pane: displays a summary of the captured packets. Clicking on a packet in this pane changes the contents of the other two panes.
- Packet Details Pane: displays the protocols and protocol fields of the packet selected in the packet list pane.
- Packet Bytes Pane: displays the content of the selected packet in both hexadecimal and ASCII format and highlights the field selected in the details pane.
The protocol hierarchy in the above Wireshark screenshot is Ethernet, Internet Protocol, Transmission Control Protocol. At each level, the protocol's dissector decodes its part of the packet and passes the remaining data on to the dissector of the next protocol up the stack. Data that no dissector recognizes is handed to the generic data dissector.
One of the key strengths of Wireshark is the ability to add new custom dissectors to it, either as plugins or built into the source.
Notice that the message data shown at the TCP level is not decoded, because the ZMTP dissector has not been installed yet. The ZMTP dissector (zmtp-wireshark) is a Wireshark plugin written in Lua that supports ZMTP 3.0 and later.
After installing this dissector, here is the same conversation between the Push and Pull sockets as shown above, but with the ZMTP dissector this time.
Notice that the packets in the “Packet List” pane have been decoded and that ZMTP information is displayed in the columns. Notice also how the dissector decodes the ZMTP data and displays its fields in a readable format in the “Packet Details” pane.
We will explain each ZMTP element in the following sections.
PUSH-PULL Pattern Example
In this example, we have one push socket connected to one pull socket.
Here is the code:
#include "zhelpers.hpp"

int main() {
    zmq::context_t context(1);

    // The Push socket binds and acts as the sender.
    zmq::socket_t server(context, ZMQ_PUSH);
    server.bind("tcp://*:5557");

    // The Pull socket connects and acts as the receiver.
    zmq::socket_t client(context, ZMQ_PULL);
    client.connect("tcp://localhost:5557");

    // Send a single-frame message and read it back.
    s_send(server, "My Message");
    std::string msg = s_recv(client);
    std::cout << "Received: " << msg << std::endl;

    server.close();
    client.close();
    context.close();
}
The Push socket sends a single message to the Pull socket.
Capturing Packets
Start Wireshark and apply the 'zmtp' display filter to show only the ZMTP packets, then run the Push-Pull example. The captured packets are shown in the following screenshot:
The ZMTP dissector has recognized its packets, decoded and displayed them in a readable format in the 'Packet List' and 'Packet Details' panes.
The ZMTP packets exchanged between two sockets form a 'connection'. A connection is composed of three groups of packets: greeting, handshake and traffic, as shown in the above screenshot.
The ABNF grammar that defines ZMTP is:
zmtp = *connection
connection = greeting handshake traffic
Now, let us examine each packet:
Greeting
A greeting is composed of 64 octets containing data sent by peers in order to agree on version and security mechanism of the connection.
A greeting consists of signature, version, mechanism, as-server and filler. The ABNF grammar that defines the greeting is:
greeting = signature version mechanism as-server filler
In this example, each greeting displayed by the dissector is composed of three packets exchanged between the two sockets. To see these packets:
- Right-click any packet in the 'Packet List' pane and select 'Follow TCP Stream' from the context menu.
- Select the 'Hex Dump' radio button.
In the 'Follow TCP Stream' dialog box, the packet data is colored blue and red. Blue indicates packets from the Push socket to the Pull socket, while packets from the Pull socket to the Push socket are marked in red.
Signature & Version
The ZMTP signature is 10 bytes followed by the ZMTP version of two bytes. The ABNF grammar that defines the signature & version is:
signature = %xFF padding %x7F
padding = 8OCTET
version = version-major version-minor
version-major = %x03
version-minor = %x00
The ZMTP signature and version are the parts of the greeting that enable a peer to detect and work with older versions of the protocol. This means a peer may downgrade its protocol to talk to a peer using a lower version. If a peer cannot downgrade its protocol to match the other peer, it closes the connection.
In our example, the ZMTP version used by the two peers is 3.0 (major = 3, minor = 0).
Note that the padding field in the signature may be used for older protocol detection.
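The signature and version rules above can be checked with a few byte comparisons. The following is a minimal sketch, not code from libzmq: the names `ZmtpVersion` and `parse_signature_and_version` are hypothetical, and the sketch deliberately ignores the content of the eight padding octets.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical holder for the two version octets of a greeting.
struct ZmtpVersion {
    std::uint8_t major_version;
    std::uint8_t minor_version;
};

// Returns true and fills `out` if the first 12 octets look like a ZMTP
// signature (%xFF, 8 padding octets, %x7F) followed by a version.
bool parse_signature_and_version(const std::vector<std::uint8_t>& bytes,
                                 ZmtpVersion& out) {
    if (bytes.size() < 12) return false;   // signature (10) + version (2)
    if (bytes[0] != 0xFF) return false;    // first signature octet
    if (bytes[9] != 0x7F) return false;    // last signature octet
    // Octets 1..8 are padding; this sketch does not inspect them.
    out.major_version = bytes[10];
    out.minor_version = bytes[11];
    return true;
}
```

A caller would pass the first bytes received on the connection and compare the returned version with the versions it supports before deciding whether to continue, downgrade or close.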
Mechanism
The security mechanism is an ASCII string, null-padded as needed to fit 20 octets. The ABNF grammar that defines the security mechanism is:
mechanism = 20mechanism-char
mechanism-char = "A"-"Z" | DIGIT | "-" | "_" | "." | "+" | %x0
A security mechanism is meant to assure a peer of the identity of the peer it talks to, and to ensure that messages cannot be tampered with, nor inspected, by third parties. The security mechanism also defines the handshake phase, which is composed of packets exchanged between the peers after the greetings. If a peer receives a security mechanism that does not exactly match the one it sent, it closes the connection.
In our example, the sockets use the “NULL” security mechanism, which means there is no authentication and no confidentiality.
The “NULL” security mechanism defines one command, exchanged between the peers, that forms the handshake phase. We will see it later in this article.
AS-Server
The 'as-server' field is composed of one byte. The ABNF grammar that defines the as-server field is:
as-server = %x00 | %x01
The “as-server” field indicates whether the peer is acting as a server (value 1) or as a client (value 0). These roles are defined by the security mechanism and are not related to the socket bind/connect direction. For example, in the 'PLAIN' security mechanism, the peer defined as the client authenticates itself to the peer defined as the server by sending a HELLO command, and the server accepts or rejects this authentication.
The NULL security mechanism does not specify a client and server topology, so the “as-server” field should always be zero for all peers.
Filler
The “filler” extends the greeting to 64 octets with zeros. Its grammar is:
filler = 31%x00
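Putting the five greeting fields together shows how the 64 octets add up: 10 (signature) + 2 (version) + 20 (mechanism) + 1 (as-server) + 31 (filler). The sketch below assembles such a greeting by hand. The function name `build_greeting` is hypothetical, and the padding octets are simply left at zero here, which is an assumption of this sketch and may differ from what a real implementation sends.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Builds a 64-octet ZMTP 3.0 greeting for the given mechanism name.
// Illustrative only; not part of any library.
std::array<std::uint8_t, 64> build_greeting(const std::string& mechanism,
                                            bool as_server) {
    if (mechanism.size() > 20)
        throw std::invalid_argument("mechanism longer than 20 octets");

    std::array<std::uint8_t, 64> g{};  // value-initialised: all zeros
    g[0] = 0xFF;                       // signature: first octet
    // g[1..8]: padding, left as zeros in this sketch
    g[9] = 0x7F;                       // signature: last octet
    g[10] = 0x03;                      // version-major
    g[11] = 0x00;                      // version-minor
    // Mechanism occupies octets 12..31; unused octets stay null.
    for (std::size_t i = 0; i < mechanism.size(); ++i)
        g[12 + i] = static_cast<std::uint8_t>(mechanism[i]);
    g[32] = as_server ? 0x01 : 0x00;   // as-server
    // g[33..63]: 31 filler octets, already zero
    return g;
}
```

For the example in this article, both peers would use `build_greeting("NULL", false)`.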
Handshake
Framing Data
After greetings, all data is sent as frames. A frame consists of a flags field (1 octet), followed by a size field (one octet or eight octets) and a frame body of size octets. The size does not include the flags field, nor itself, so an empty frame has a size of zero.
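The flags field consists of a single octet containing various control flags:
- Bits 7-3: reserved, and must be zero.
- Bit 2 (COMMAND): 1 indicates a command frame, 0 indicates a message frame.
- Bit 1 (LONG): 1 indicates that the size field is eight octets (a 64-bit unsigned integer in network byte order), 0 indicates that it is a single octet.
- Bit 0 (MORE): 1 indicates that more frames of the same message follow, 0 indicates the last frame. This bit is always zero on command frames.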
Examples of frames are discussed in following sections.
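As a small illustration, the flags octet can be decoded with three bit masks, assuming the bit layout used throughout this article (bit 0 = more, bit 1 = long, bit 2 = command). The struct and function names are hypothetical.

```cpp
#include <cstdint>

// Decoded view of the flags octet of a ZMTP frame (illustrative).
struct FrameFlags {
    bool more;     // bit 0: more frames of this message follow
    bool is_long;  // bit 1: size field is 8 octets instead of 1
    bool command;  // bit 2: frame is a command, not a message
};

FrameFlags decode_flags(std::uint8_t flags) {
    FrameFlags f;
    f.more    = (flags & 0x01) != 0;
    f.is_long = (flags & 0x02) != 0;
    f.command = (flags & 0x04) != 0;
    return f;
}
```

With this helper, 0x04 decodes as a short command frame and 0x03 as a long message frame that is followed by more frames.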
The handshake is composed of zero or more commands defined by the security mechanism named in the greetings. A command is a single long or short frame. The ABNF grammar that defines any command is:
command = command-size command-body
command-size = %x04 short-size | %x06 long-size
short-size = OCTET ; Body is 0 to 255 octets
long-size = 8OCTET ; Body is 0 to 2^63-1 octets
command-body = command-name command-data
command-name = OCTET 1*255command-name-char
command-name-char = ALPHA
command-data = *OCTET
The handshake is an extension protocol that allows peers to create a secure connection. If the security handshake is successful, the peers continue the discussion. Otherwise, one or both peers close the connection.
We can see that the rule “command-size” in the above grammar starts with either 0x04 or 0x06, which is the flags field of a frame. The flags value 0x04 has only bit 2 set to 1, which means that this frame is a short command frame. The flags value 0x06 has bit 1 and bit 2 set to 1, which means that this frame is a long command frame.
The NULL security mechanism defines a READY command exchanged between peers. A READY command consists of a list of properties. Each property consists of a name-value pair.
The ABNF grammar that defines the NULL security mechanism is:
null = ready *message | error
ready = command-size %d5 "READY" metadata
command-size = %x04 short-size | %x06 long-size
short-size = OCTET ; Body is 0 to 255 octets
long-size = 8OCTET ; Body is 0 to 2^63-1 octets
metadata = *property
property = name value
name = OCTET 1*255name-char
name-char = ALPHA | DIGIT | "-" | "_" | "." | "+"
value = 4OCTET *OCTET ; Size in network byte order
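To make the `metadata`, `property`, `name` and `value` rules concrete, here is a minimal parser sketch. It assumes the caller has already stripped the frame header and the `READY` command name, so `data` holds only the metadata octets. The function name `parse_metadata` is hypothetical, and the sketch does not validate the characters allowed in a name.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Parses `metadata = *property` into a name -> value map (illustrative).
std::map<std::string, std::string> parse_metadata(
        const std::vector<std::uint8_t>& data) {
    std::map<std::string, std::string> props;
    std::size_t pos = 0;
    while (pos < data.size()) {
        // name = OCTET 1*255name-char (one length octet, then the name)
        const std::size_t name_len = data[pos++];
        if (name_len == 0 || name_len + 4 > data.size() - pos)
            throw std::runtime_error("truncated property name");
        std::string name(data.begin() + pos, data.begin() + pos + name_len);
        pos += name_len;

        // value = 4OCTET *OCTET (size in network byte order, then the value)
        std::uint32_t value_len = 0;
        for (int i = 0; i < 4; ++i)
            value_len = (value_len << 8) | data[pos++];
        if (value_len > data.size() - pos)
            throw std::runtime_error("truncated property value");
        std::string value(data.begin() + pos, data.begin() + pos + value_len);
        pos += value_len;

        props[name] = value;
    }
    return props;
}
```

Returning a map keeps the sketch short. A stricter parser would also reject duplicate names and characters outside `name-char`.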
The READY command in our example contains a property named “Socket-Type”, which defines the sender's socket type. The value of this property is “PUSH” when the Push socket sends this command and “PULL” when the Pull socket sends it.
A peer validates that the other peer is using a valid socket type (a valid combination of sockets). In our example, the Push peer validates that the other peer has a Pull socket type, and vice versa. If the validation fails, the connection is closed.
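The READY command sent by each peer can be reproduced by hand from the grammar. The sketch below builds a short command frame carrying a single “Socket-Type” property. The function name `build_ready` is hypothetical, and a real implementation may add further properties that this sketch leaves out.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Builds a READY command frame with one Socket-Type property (illustrative).
std::vector<std::uint8_t> build_ready(const std::string& socket_type) {
    const std::string command_name = "READY";
    const std::string property_name = "Socket-Type";

    std::vector<std::uint8_t> body;
    body.push_back(static_cast<std::uint8_t>(command_name.size()));   // %d5
    body.insert(body.end(), command_name.begin(), command_name.end());
    body.push_back(static_cast<std::uint8_t>(property_name.size()));  // name length
    body.insert(body.end(), property_name.begin(), property_name.end());
    // Value size: 4 octets in network byte order (most significant first).
    const std::uint32_t value_len = static_cast<std::uint32_t>(socket_type.size());
    for (int shift = 24; shift >= 0; shift -= 8)
        body.push_back(static_cast<std::uint8_t>((value_len >> shift) & 0xFF));
    body.insert(body.end(), socket_type.begin(), socket_type.end());

    if (body.size() > 255)
        throw std::length_error("body does not fit a short frame");

    std::vector<std::uint8_t> frame;
    frame.push_back(0x04);                                    // flags: short command
    frame.push_back(static_cast<std::uint8_t>(body.size()));  // short-size
    frame.insert(frame.end(), body.begin(), body.end());
    return frame;
}
```

For `build_ready("PUSH")` the body is 1 + 5 + 1 + 11 + 4 + 4 = 26 octets, so the frame starts with `04 1A` and is 28 octets long in total.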
Traffic
Traffic consists of commands and messages intermixed.
The ABNF grammar that defines the traffic is:
traffic = *(command | message)
The grammar of “command” is already defined above. Here is the ABNF grammar of a message:
message = *message-more message-last
message-more = ( %x01 short-size | %x03 long-size ) message-body
message-last = ( %x00 short-size | %x02 long-size ) message-body
short-size = OCTET ; Body is 0 to 255 octets
long-size = 8OCTET ; Body is 0 to 2^63-1 octets
message-body= *OCTET
The flags byte of a message frame is defined in the rules “message-more” and “message-last”. This field can take four values:
0x00: indicates that this frame is a short last frame of a message
0x02: indicates that this frame is a long last frame of a message
0x01: indicates that this frame is a short frame with more frames to follow
0x03: indicates that this frame is a long frame with more frames to follow
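Reading traffic then comes down to reading one frame after another: a flags octet, a size of one or eight octets depending on the long bit, and the body. The following sketch does this over an in-memory buffer. The names `Frame` and `read_frame` are hypothetical, and the sketch assumes the whole frame is already available in the buffer, which a real network reader cannot assume.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// One decoded frame (illustrative).
struct Frame {
    bool command;                    // bit 2 of the flags octet
    bool more;                       // bit 0 of the flags octet
    std::vector<std::uint8_t> body;
};

// Reads one frame starting at `pos` and advances `pos` past it.
Frame read_frame(const std::vector<std::uint8_t>& buf, std::size_t& pos) {
    if (pos >= buf.size() || buf.size() - pos < 2)
        throw std::runtime_error("truncated frame header");
    const std::uint8_t flags = buf[pos++];

    std::uint64_t size = 0;
    if (flags & 0x02) {              // long frame: 8-octet size, big-endian
        if (buf.size() - pos < 8)
            throw std::runtime_error("truncated long size");
        for (int i = 0; i < 8; ++i)
            size = (size << 8) | buf[pos++];
    } else {                         // short frame: 1-octet size
        size = buf[pos++];
    }
    if (size > buf.size() - pos)
        throw std::runtime_error("truncated frame body");

    const std::size_t n = static_cast<std::size_t>(size);
    Frame f;
    f.command = (flags & 0x04) != 0;
    f.more = (flags & 0x01) != 0;
    f.body.assign(buf.begin() + pos, buf.begin() + pos + n);
    pos += n;
    return f;
}
```

Calling `read_frame` repeatedly until a frame with `more == false` is returned yields all the frames of one message.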
In our example, the traffic consists of one message:
Now, we will send a multi-frame message. The first frame is a long frame (its length is greater than 255 bytes) and the second frame is a short frame (the same message that we sent before).
s_sendmore (server, std::string(256, 'a'));
s_send (server, "My Message");
In the above screenshot, I didn't display all the bytes of the long frame in the 'Packet Bytes' pane. Notice that the first frame's flags indicate that this frame is followed by another one (bit 0, More, is set to one) and that it is a long frame (bit 1 is set to one). The payload length is encoded as a 64-bit unsigned integer (8 bytes) because it is greater than 255 bytes.
The second frame follows directly after the first one. Its flags indicate that it is the last frame (bit 0 is set to zero) and that it is a short frame (bit 1 is set to zero), since the payload length is less than 256 bytes.
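The bytes of these two frames can be predicted with a small encoder. This is a sketch based only on the message grammar shown earlier, with a hypothetical function name `encode_message_frame`. It picks the short or long form from the body length and sets the More bit on request.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Encodes one message frame: flags, size (1 or 8 octets), body (illustrative).
std::vector<std::uint8_t> encode_message_frame(const std::string& body, bool more) {
    std::vector<std::uint8_t> out;
    const bool is_long = body.size() > 255;

    std::uint8_t flags = 0;
    if (more) flags |= 0x01;       // bit 0: more frames follow
    if (is_long) flags |= 0x02;    // bit 1: 8-octet size
    out.push_back(flags);

    if (is_long) {
        // 64-bit size in network byte order (most significant octet first).
        const std::uint64_t n = body.size();
        for (int shift = 56; shift >= 0; shift -= 8)
            out.push_back(static_cast<std::uint8_t>((n >> shift) & 0xFF));
    } else {
        out.push_back(static_cast<std::uint8_t>(body.size()));
    }
    out.insert(out.end(), body.begin(), body.end());
    return out;
}
```

For the two frames sent above, `encode_message_frame(std::string(256, 'a'), true)` starts with `03 00 00 00 00 00 00 01 00`, and `encode_message_frame("My Message", false)` starts with `00 0A`.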
Conclusion
ZMTP is the protocol that governs data exchange between ZeroMQ sockets. It defines a set of rules covering the protocol version, the security mechanism, the framing of discrete messages, and per-frame metadata (single or multiple frames, short or long frames).