Introduction
A new Arduino library has seen the light of day: SingleWireSerial. It supports single-wire, half-duplex serial communication. By using the input capture feature of the AVR MCUs, it is extremely accurate and robustly supports bitrates up to 250 kbps. And contrary to its name, one can even use it in a two-wire setting.
Background
Earlier this year, I worked on a hardware debugger making use of the debugWIRE protocol, but I couldn’t get it to work reliably. One problem seemed to be the serial communication using only one line (the RESET pin of the MCU). People had come up with different solutions, but none of them worked reliably for me. So, recently I set out to program my own solution, which led to the SingleWireSerial library. It satisfies the following three requirements:
- single-wire serial communication,
- extremely accurate and robust up to 125 kbps,
- communication speed can be set at runtime.
First, asynchronous serial communication usually uses two wires. Restricting it to a single wire is possible, but then a strict protocol must determine which party is allowed to transmit at any time. For instance, if a master sends requests or queries to which the slave responds, it is always clear who is in the role of the sender. This part can be solved entirely at the software level. At the hardware level, one needs a circuit that permits two parties to send and receive on one wire only, or one addresses this problem in software as well.
Second, if you transmit strings about the room temperature, nobody gets annoyed when a wrong temperature is displayed once in a while. When you are debugging a system and a wrong value is displayed because of a communication error, you are, of course, upset. Unfortunately, debugWIRE does not have any form of error detection, so one has to rely on there being no communication errors at all. Since the communication speed is set to the system clock divided by 128, a 16 MHz system clock results in a communication speed of 125 kbps. So, we need error-free serial communication at 125 kbps!
Third, if the communication speed is known at compile time, then picoUART is probably the best alternative. However, with debugWIRE, we only learn the communication speed at runtime. Furthermore, we want to get as close as possible to the communication speed of the MCU we want to debug, which might diverge from the standard speeds because its clock is controlled by an internal RC oscillator.
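The relation between system clock and debugWIRE speed can be written down in a few lines. This is only a sketch for illustration; the function names are made up here and are not part of the library.

```cpp
#include <cstdint>

// debugWIRE bitrate as described above: system clock divided by 128.
// (Hypothetical helper name, for illustration only.)
constexpr std::uint32_t debugwire_bitrate(std::uint32_t f_cpu_hz) {
    return f_cpu_hz / 128u;
}

// Duration of one bit in nanoseconds, rounded to the nearest nanosecond.
constexpr std::uint32_t bit_time_ns(std::uint32_t bitrate_bps) {
    return (1000000000u + bitrate_bps / 2u) / bitrate_bps;
}
```

For a 16 MHz target this gives 125000 bps and a bit time of 8000 ns. A target running from an RC oscillator that is a bit off will end up at a correspondingly non-standard bitrate.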
Single-wire: Hardware or Software Solution?
You can create a single-wire serial solution basically in two ways. First, you can join the TX and RX lines somehow. Second, you have only one line from the beginning.
Joining TX and RX involves some external hardware. The Microchip application note AN2658 describes this in detail, employing two transistors, one as an inverter and one as an open-collector driver. The same effect can be achieved more simply with a pull-up resistor attached to the RX line and a diode between the RX and TX lines with the anode at the RX line, as sketched in the next picture.
Joining TX and RX
A high level on the TX line will have no effect, i.e., the common line will be pulled up by the resistor. A low level on the TX line will pull down the common line. However, depending on the type of diode one uses, it will only be pulled down to 0.3 to 0.7 V. This should be enough for all practical purposes because CMOS chips detect a low level up to 0.3 Vcc.
This little bit of extra hardware allows one to employ a hardware UART. However, one annoying side effect of this hardware solution is that all sent bytes are also received and need to be ignored.
The second kind of solution uses bit-banging, i.e., one controls one pin through software. By switching the direction of a pin between input and output (with a low level), one creates an open-drain output, similar to the one above. The usual software UART libraries do not support this out of the box. However, it is not very difficult to adapt them. OnePinSerial is such an adaptation of SoftwareSerial for use in a debugWIRE debugger. However, it is quite brittle. When the millis interrupt is enabled, which it is in the described debugger, OnePinSerial does not receive reliably at 125 kbps.
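The direction-switching trick can be sketched as follows. This is a minimal illustration, not the code of OnePinSerial or SingleWireSerial: the function names are hypothetical, and it assumes a pull-up on the line that brings it high whenever nobody drives it low. The registers are passed as references, so on an AVR one could hand in something like DDRB and PORTB, while on a PC plain variables are enough to try it out.

```cpp
#include <cstdint>

// Prepare a pin for open-drain use: make it an input and clear its
// PORT bit, so that it drives low as soon as it is switched to output.
inline void open_drain_init(volatile std::uint8_t& ddr,
                            volatile std::uint8_t& port,
                            std::uint8_t mask) {
    ddr = static_cast<std::uint8_t>(ddr & ~mask);   // input (line released)
    port = static_cast<std::uint8_t>(port & ~mask); // output latch low
}

// Write a logical level by changing only the pin direction.
// high: pin is an input, the pull-up raises the line.
// low:  pin is an output and actively pulls the line down.
inline void open_drain_write(volatile std::uint8_t& ddr,
                             std::uint8_t mask, bool high) {
    if (high) {
        ddr = static_cast<std::uint8_t>(ddr & ~mask);
    } else {
        ddr = static_cast<std::uint8_t>(ddr | mask);
    }
}
```

Since the pin never drives the line high, two parties can share the wire without ever fighting over it.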
Input Capture and Output Compare Match
One problem with bit-banging UART solutions is that one has to rely on knowledge about what kind of code the compiler produces in order to generate the right timing. And code generation might be different for different compiler versions. For instance, in the SoftwareSerial class, you find code that is compiled conditionally on specific compiler versions. Of course, one could use inline assembly code, which gives you full control over which machine instructions are executed. However, a solution that implements run-time configurable communication speed using this technique sounds like a serious challenge to me.
A second problem is that interrupts, e.g., the millis interrupt, can confuse the timing when receiving a byte. The SoftwareSerial class uses the pin change interrupt to detect the falling edge of the start bit. If the millis interrupt is raised just before the start bit comes in, then the receiving interrupt routine could in the worst case start 6.7 µs too late. For slower bit rates, this might be tolerable. However, for 125 kbps, the bit time is 8 µs and so one might easily miss a bit.
The AltSoftSerial library, which I had a look at in a previous blog post, uses a feature called input capture. This supports timestamping certain events, such as a falling edge. The value of a timer at this point of time is stored in the input capture register (ICR) and optionally an interrupt is raised. With that feature, one always gets the precise time when the start bit started, even if another interrupt was serviced at that time. If one also records the times of the edges following the start bit, one can easily reconstruct a transmitted byte.
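To make the reconstruction step concrete, here is a small host-side sketch. It is not the code of AltSoftSerial or SingleWireSerial; the function name and interface are invented for illustration. It assumes 8 data bits sent LSB first, a clean signal in which every captured edge toggles the line level, and a byte that fits into one period of a 16-bit timer.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Reconstruct one byte from input-capture timestamps.
// start_edge:  timer value at the falling edge of the start bit
// later_edges: timer values of all following edges, in order
// bit_ticks:   timer ticks per bit
std::uint8_t decode_byte(std::uint16_t start_edge,
                         const std::vector<std::uint16_t>& later_edges,
                         std::uint16_t bit_ticks) {
    std::uint8_t value = 0;
    bool level = false;    // the line is low during the start bit
    std::size_t next = 0;  // index of the next edge not yet consumed
    for (int bit = 0; bit < 8; ++bit) {
        // Middle of data bit 'bit', measured from the start edge.
        std::uint32_t sample = bit_ticks + bit_ticks / 2u +
                               static_cast<std::uint32_t>(bit) * bit_ticks;
        // Apply every edge that happened before this sample point.
        // The 16-bit subtraction stays correct across a timer overflow.
        while (next < later_edges.size() &&
               static_cast<std::uint16_t>(later_edges[next] - start_edge) < sample) {
            level = !level;
            ++next;
        }
        if (level) {
            value |= static_cast<std::uint8_t>(1u << bit);
        }
    }
    return value;
}
```

With 128 ticks per bit, a start edge at 0 followed by edges at 128, 256, 384, 512, 768, 896, and 1024 decodes to 0xA5.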
Actually, one also needs a timer that tells you when the byte ends. This can be accomplished by using the output compare match feature, which is the dual of the input capture feature. Here, one writes a value into the output compare register (OCR), and when the timer matches this value, some preconfigured action such as changing the level of an output pin or raising an interrupt is triggered. By setting the value such that an interrupt is raised 8.5 bit times after the start edge, one can take all the information gathered by recording the times of rising and falling edges and return the received byte.
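The compare value itself is simple arithmetic on the captured start time. The following helper is a hypothetical sketch (the name is not taken from any library) that assumes a free-running 16-bit timer, so the addition is allowed to wrap around.

```cpp
#include <cstdint>

// Timer value at which the end-of-byte compare match should fire:
// 8.5 bit times after the captured start edge, i.e., in the middle of
// the last data bit. The result wraps like the 16-bit timer does.
constexpr std::uint16_t end_of_byte_compare(std::uint16_t icr_start,
                                            std::uint16_t bit_ticks) {
    return static_cast<std::uint16_t>(icr_start + 8u * bit_ticks + bit_ticks / 2u);
}
```

For a start edge captured at 1000 and 128 ticks per bit, the compare value is 2088.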
Speeding It Up
The problem with the AltSoftSerial library is that for each edge in a transmitted byte, an interrupt is raised and a lot of work has to be done at the end of a transmitted byte. This is not sustainable at high bit rates, i.e., 115200 bps and higher.
In order to address this problem, I stuffed all of the things described above into one interrupt routine. This worked so-so, but not really reliably. In order to get to the bottom of it, I used (once again) my Saleae logic analyzer. Without it, I probably would have had a hard time understanding the problems.
Here is the setup, showing the logic analyzer, the FTDI board which sends bytes at varying communication speeds, and the Arduino UNO running the new library. By the way: I used the same Arduino sketches and Python scripts to test my library as when I evaluated different serial libraries.
Experimental setup
I inserted some code to generate short blips at critical points in the ISR. With that, I was able to notice two problems: the interrupt service routine needed too much time to start up, and it needed too much time to finish.
First, the time between the falling edge of the start bit and the time when the ICR was saved was quite long, as can be seen in the next picture, which shows what is happening at 125 kbps.
Initial timing of ISR
In the second row, one sees two blips. The first one signals the point in time when the ICR was saved, the second one when everything is set up to record the incoming byte. It takes 2.7 µs to reach the point where the ISR has stored the ICR and reconfigured the input capture to record rising edges. A large chunk of this time is spent pushing registers onto the stack. All in all, 15 registers are saved, which takes roughly 2 µs. In addition, it takes (worst case) 10 cycles to process the interrupt, and the noise cancellation takes another 4 cycles, which together is almost 1 µs.
If a millis interrupt is raised just before the falling edge of the start bit, this may then result in missing an edge and therefore a bit, i.e., one gets a read error. I was able to reduce the startup time by declaring the ISR to be “naked” and then saving and restoring the registers using inline assembly code. Fortunately, it is known which registers one has to save. This allowed me to save the ICR early before all the registers were pushed on the stack. The result of all that is shown in the next picture (again at 125 kbps).
Timing with a naked ISR
As one can see, the startup time has been reduced to 1.75 µs (marker P0). The second blip is still relatively late. But this event is constrained to happen before the midpoint of the first data bit, which is easily achievable, even if a millis interrupt slows the ISR down.
In the third row, the finishing period of the ISR is timed. The first blip signals the point when the byte has been read and the second blip marks the point just before the ISR executes a “return from interrupt”. As one can see (marker P1), this goes into the stop bit quite a lot, leaving not much time for the user program to process the byte.
The way to deal with this problem is to reduce the amount of post-processing and the number of registers to be restored at the end. I addressed the problem by rewriting everything in inline assembly code and employing the output compare match feature for locating the middle of the bit times as the sample points. This allowed me to reduce the number of registers to 5 (down from 15). In addition, I reduced the post-processing of the received byte to storing it into the buffer and then restoring the registers.
Final timing with inline assembly ISR
As can be seen (marker P0), it now takes just under 1 µs to finalize the ISR. So at 125 kbps, the ISR returns 2.5 µs before the last data bit is finished, which gives the user program 5 µs more time to process the received byte than the previous version did.
The ICR is now saved after 1.3 µs, and the question is whether this is short enough to guarantee that the next edge (the one of the first data bit) does not clobber the ICR. As mentioned, the millis interrupt can take 6.7 µs in the worst case. These two times add up to 8.0 µs, which is exactly one bit time. For this reason, a more detailed analysis is necessary, taking into account all worst-case assumptions, but eliminating any artifacts introduced by measuring.
In the worst case, the millis interrupt uses 106 cycles. We have to add 4 cycles of an instruction that might be executed between the millis ISR and our ISR, then 7 cycles for processing the interrupt and jumping to the start address, and finally 4 cycles for pushing one register and reading the low byte of the ICR. Once we have read the low byte, the high byte is saved in a temp register. The 4 cycles delay for recognizing an edge due to noise cancellation can be ignored because this delay applies to all edges. Also the 2 cycles that are necessary to produce the blip can be ignored. This adds up to 106+4+7+4 = 121 cycles, which is 7 less than 128. In other words, we can easily cope with a serial bit stream that is 5% faster.
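The cycle budget can be checked mechanically. The sketch below simply encodes the numbers from the paragraph above; the constant and function names are made up, and the cycle counts are the ones measured for this particular setup, so they may differ with other compiler or core versions.

```cpp
#include <cstdint>

// Worst-case cycle counts taken from the analysis above.
constexpr std::uint32_t kMillisIsrCycles    = 106; // millis interrupt
constexpr std::uint32_t kPendingInstrCycles = 4;   // instruction between the two ISRs
constexpr std::uint32_t kIsrEntryCycles     = 7;   // interrupt processing and jump
constexpr std::uint32_t kSaveIcrCycles      = 4;   // push one register, read ICR low byte

constexpr std::uint32_t worst_case_latency_cycles() {
    return kMillisIsrCycles + kPendingInstrCycles + kIsrEntryCycles + kSaveIcrCycles;
}

// Number of CPU cycles in one bit time.
constexpr std::uint32_t cycles_per_bit(std::uint32_t f_cpu_hz, std::uint32_t bitrate_bps) {
    return f_cpu_hz / bitrate_bps;
}

// True if the ICR is read before the next edge can overwrite it.
constexpr bool icr_saved_in_time(std::uint32_t f_cpu_hz, std::uint32_t bitrate_bps) {
    return worst_case_latency_cycles() < cycles_per_bit(f_cpu_hz, bitrate_bps);
}

// Remaining slack in tenths of a percent of the bit time (0 if there is none).
constexpr std::uint32_t margin_permille(std::uint32_t f_cpu_hz, std::uint32_t bitrate_bps) {
    return icr_saved_in_time(f_cpu_hz, bitrate_bps)
        ? (cycles_per_bit(f_cpu_hz, bitrate_bps) - worst_case_latency_cycles()) * 1000u
              / cycles_per_bit(f_cpu_hz, bitrate_bps)
        : 0u;
}
```

At 16 MHz and 125 kbps, this yields 121 of 128 cycles and a slack of 54 per mille, i.e., a bit more than 5% of the bit time. At 250 kbps, a bit time is only 64 cycles, so the same worst case no longer fits.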
So, what about the other red blips? The second blip signals the point in time when the output compare match register has been set up, which needs to happen before the middle of the first data bit. As can be seen from the measurement P2, there is enough time left. The third red blip is the point in time when we are ready to sample the first bit. This should happen before the first bit has finished. Assuming that 75% of the bit time is usable, one should do it 12.5% before the end. As is obvious (markers P3), even when it is 30% before the end, timing is not critical.
Having said all this, with 121 cycles we are very close to the limit and if the implementation of the millis interrupt or the code generation of the compiler changes, then one might be in trouble. So for guaranteed 100% error-free communication (even in the future), I would disable the millis interrupt when using the library at 125 kbps.
Let us finally have a look at what happens at 250 kbps when a timer interrupt is raised.
Timing is off because of millis interrupt
Marker P0 shows that the communication ISR is delayed (most probably by the millis interrupt). Instead of saving the ICR 1.4 µs after the falling edge of the start bit, it takes 3.7 µs before this happens. No harm is done, though, because the critical point in time is 4 µs after the falling edge. However, setting up the OCR happens 400 ns too late (markers P1). There are no yellow blips, which mark the sample points, in the fourth row. Consequently, no bytes are recognized or received.
Empirical Results
I stress-tested the new library in a similar way as I tested the other software and hardware UARTs. The results are shown in the next table. In summary, it looks as if one can use the library up to 125 kbps even with the millis interrupt enabled. The library can even cope with 250 kbps, although the millis interrupt has to be disabled in this case.
Bitrate | TX speed deviation | RX speed deviation (lower bound) | RX speed deviation (upper bound) | Note |
1200 | -0.1% | -5.9% | +5.4% | |
2400 | -0.1% | -5.8% | +5.4% | |
4800 | -0.1% | -6.0% | +5.2% | |
9600 | -0.1% | -5.8% | +5.3% | |
19200 | -0.1% | -5.7% | +5.3% | |
38400 | -0.1% | -5.7% | +4.9% | |
57600 | -0.2% | -5.3% | +5.1% | |
115200 | -0.2% | -5.3% | +4.8% | |
230400 | +0.6% | -4.2% | +5.3% | # |
7812 | -0.1% | -5.8% | +5.2% | |
15625 | -0.1% | -5.7% | +5.2% | |
31250 | -0.1% | -5.6% | +5.1% | |
62500 | -0.1% | -5.3% | +4.9% | |
125000 | -0.1% | -5.2% | +4.9% | |
250000 | -0.1% | -4.8% | +3.5% | # |
500000 | -0.1% | ____ | ____ | |
1M | -15.9% | ____ | ____ | |
Transmit speed deviation and possible speed deviations when receiving data
('#' = no millis interrupt, '*' = 2 stop bits)
Summary
The new SingleWireSerial library works very reliably and robustly up to 250 kbps. It implements a serial interface over a single wire, but it can also be used in a two-wire setting. So, I will probably use it in the future in all contexts where I previously used SoftwareSerial, provided the respective pins are available. And I definitely plan to employ it in my next attempt to implement a debugWIRE debugger.