
A Simple Watchdog for the Netduino

13 May 2016 · CPOL · 5 min read
The following is a discussion of a watchdog for your Netduino, along with a quick program to test it.

Introduction

What if you create your own Netduino application, and an error halts it unexpectedly? First answer: no problem, I’m in front of the debugger, and I’ll have the complete stack trace of the exception.
Second answer: it’s a nightmare! I’m on holiday for the weekend, and my Netduino-based sprinkler system looks frozen… I should ask my neighbor to reset the board or, even better, ask him to water the grass.

The Nightmare


Obviously, we are interested in the second answer, or better yet, in how to avoid such a situation. When the hardware is damaged or the software is buggy, there are not many ways to regain control of the board. However, there are many, many situations where our board works perfectly (maybe for weeks) without any problems at all, yet as soon as it is moved “on-site”, the first problem appears.

Since none of us wants to wake up at 3 AM because of a system halt, or to be forced to go back home because the sprinkler did not water the grass properly, we can improve reliability by adding a so-called “watchdog”.

It’s a very simple way to protect the system from unwanted halts, but it does not solve every potential headache. Rather, it is a kind of “extrema ratio”: a last resort for something we really can’t foresee.

Good programming practice, along with good hardware, remains the must-have foundation for most of the reliability of any system.

Think seriously about it.

A Simple Watchdog

I don’t know who invented the name “watchdog”, but it clearly suggests (at least to me) two different subjects:

  • the dog, which is the controller
  • the house, being watched over by the dog

Now, the potentially failing system is the house, and the dog lives “externally” with respect to the house. That’s obvious: if the system hangs, how could it rescue itself? Yet, obvious as it is, many people ask for a pure-software solution, perhaps using a separate thread as the controller.

There are soooo many situations that can make an MCU hang that anyone would immediately turn to an external solution. Some examples:

  • under- or over-temperature
  • voltage spikes (both above the supply and below ground)
  • strong noise in general (especially when long wires are connected directly to the MCU pins)
  • software bugs
  • hardware instability (e.g. the crystal stops oscillating)
  • many others

I could have used a dedicated watchdog chip. The Atmel MCU on the Netduino embeds a watchdog, but the firmware does not drive it. Instead, I’d like to show a very simple circuit, mainly as a way to learn how to solve this particular problem.


We’ll use a simple counter: the 74HC4060. It’s a 14-stage binary counter that also embeds a basic R-C oscillator, and together they form a re-triggerable, long-period timer. The word “timer” immediately brings to mind the amazing “555” chip: a masterpiece of ’70s hardware design.

However, we need a relatively long reaction time: at least several seconds. That’s because the Netduino takes a couple of seconds to complete the full reset process, and then we must also allow for the slowness of the program itself. A normal watchdog reacts within milliseconds, while here we’re talking about ten seconds or more. For such long timings the 555 is not reliable, because it relies on the charge of a capacitor, and we would also need a pretty large capacitor. The 74HC4060 makes long timings much simpler. I tuned its oscillator to a frequency of about 60 Hz, using:

  • Rt = 68 kΩ
  • Ct = 100 nF

Note: refer to the 74HC4060 datasheet for the exact oscillator formula.
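To get a feeling for the numbers, here is a small C# sketch. It assumes the approximation f ≈ 1/(2.5 · Rt · Ct) for the 74HC4060 RC oscillator (the constant differs slightly between vendors' datasheets, so check yours) and, for comparison, the classic 555 monostable formula t ≈ 1.1 · R · C. The class and method names are hypothetical, and the code targets desktop .NET, not the Netduino.

```csharp
using System;

// Hypothetical helper: back-of-the-envelope timing estimates (desktop .NET).
public static class TimingEstimates
{
    // Assumed 74HC4060 RC-oscillator approximation: f ≈ 1 / (2.5 · Rt · Ct).
    // The constant differs slightly between datasheets; treat the result as rough.
    public static double Hc4060FrequencyHz(double rtOhms, double ctFarads)
    {
        if (rtOhms <= 0 || ctFarads <= 0)
            throw new ArgumentOutOfRangeException("rtOhms/ctFarads", "values must be positive");
        return 1.0 / (2.5 * rtOhms * ctFarads);
    }

    // Classic 555 monostable: pulse width t ≈ 1.1 · R · C, solved for C.
    // Shows how large the capacitor becomes for long delays.
    public static double Ne555CapacitorFarads(double delaySeconds, double rOhms)
    {
        if (delaySeconds <= 0 || rOhms <= 0)
            throw new ArgumentOutOfRangeException("delaySeconds/rOhms", "values must be positive");
        return delaySeconds / (1.1 * rOhms);
    }
}
```

With Rt = 68 kΩ and Ct = 100 nF the estimate is about 59 Hz, close to the 60 Hz above, while a 555 producing a 10-second delay through an arbitrarily chosen 1 MΩ resistor would already need about 9.1 µF.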

Then I chose the output of the 10th stage (i.e. Q9) as the “timeout signal”, which triggers the Netduino reset after roughly 10 seconds. Ten binary stages yield a frequency division of 1024 (= 2^10), so a full Q9 cycle at 60 Hz lasts 1024 / 60 ≈ 17 seconds, roughly 20. Why, then, does the reset come after only about half that time? Because the reset happens as soon as the Q9 output turns high, which occurs after just half of its full cycle: 512 oscillator periods, or 512 / 60 ≈ 8.5 seconds.
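The half-period reasoning can be checked with a tiny calculation. In a ripple counter, output Qn toggles every 2^n input clocks, so after a reset it first rises after 2^n clocks, which is half of its full 2^(n+1)-clock period. The sketch below (hypothetical names, plain .NET) encodes just that.

```csharp
using System;

// Hypothetical helper: when does output Qn of a ripple counter first go high?
public static class CounterTimeout
{
    // Qn toggles every 2^n input clocks, so after a reset it first rises
    // after 2^n clocks, i.e. half of its full 2^(n+1)-clock period.
    // The 74HC4060 has 14 stages (Q0..Q13), though only some are pinned out.
    public static double FirstRiseSeconds(int outputIndex, double clockHz)
    {
        if (outputIndex < 0 || outputIndex > 13)
            throw new ArgumentOutOfRangeException("outputIndex");
        if (clockHz <= 0)
            throw new ArgumentOutOfRangeException("clockHz");
        return (1L << outputIndex) / clockHz;
    }

    // Full period of Qn: twice the time to its first rising edge.
    public static double FullPeriodSeconds(int outputIndex, double clockHz)
    {
        return 2.0 * FirstRiseSeconds(outputIndex, clockHz);
    }
}
```

For Q9 at 60 Hz it gives 512 / 60 ≈ 8.5 s to the first rising edge (the reset) and 1024 / 60 ≈ 17 s for a full cycle.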


So, what is the role of the Netduino, given that the 74HC4060 can reset it? Well, our program running on the Netduino must continuously “refresh” the counter, so that Q9 never goes high. Basically, we need one of the Netduino outputs to generate a short positive pulse that resets the counter. As long as the Netduino application runs properly, the pulses keep the counter at a relatively low value, and Q9 never turns high. However, when the program hangs, no more reset pulses are generated, and the counter runs until Q9 goes high. That signal resets the Netduino. In other words, the program must pulse the counter more often than the timeout: at 60 Hz, at least once every 512 clocks (about 8.5 seconds).
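The refresh mechanism can also be simulated in software. The sketch below is a hypothetical, tick-level model, not code for the Netduino: each oscillator clock advances a 14-bit count, bit 9 stands for Q9, and a “kick” clears the count. The simulated program kicks at a fixed interval until it “hangs”.

```csharp
using System;

// Hypothetical tick-level model of the 74HC4060 watchdog, for illustration only.
public sealed class CounterModel
{
    private const int Q9Mask = 1 << 9; // bit 9 of the count drives output Q9
    private int count;

    // One oscillator clock: the 14-bit counter advances by one.
    public void Tick() { count = (count + 1) & 0x3FFF; }

    // The Netduino's positive pulse clears the counter.
    public void ResetPulse() { count = 0; }

    public bool Q9High { get { return (count & Q9Mask) != 0; } }
}

public static class WatchdogSimulation
{
    // Runs totalTicks clocks. The simulated program kicks every kickEvery
    // clocks, but only before clock stopKickingAt (after that it "hangs").
    // Returns the clock at which Q9 first goes high (board reset), or -1.
    public static int FirstResetTick(int totalTicks, int kickEvery, int stopKickingAt)
    {
        if (kickEvery <= 0)
            throw new ArgumentOutOfRangeException("kickEvery");
        var counter = new CounterModel();
        for (int t = 1; t <= totalTicks; t++)
        {
            counter.Tick();
            if (counter.Q9High)
                return t;
            if (t < stopKickingAt && t % kickEvery == 0)
                counter.ResetPulse();
        }
        return -1;
    }
}
```

For example, kicking every 60 clocks (about once per second at 60 Hz) never lets Q9 rise; kicking every 600 clocks is too slow, and Q9 rises at clock 512; and if the kicks stop after clock 600, the last kick at clock 540 is followed by a reset at clock 1052, i.e. 512 clocks (about 8.5 s) later.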

A Simple Test Program

The following program is used to test the watchdog. It makes the LED blink for a while, then throws an exception. That is a simulated “bug”, which actually hangs the whole board. In such a situation you have only two choices: press the “reset” button, or unplug the supply and plug it in again. Since neither is practical in a remote installation, we introduce a little “helper” that “presses the reset button for us”.

C#
using System;
using System.Threading;
using Microsoft.SPOT;
using Microsoft.SPOT.Hardware;
using SecretLabs.NETMF.Hardware;
using SecretLabs.NETMF.Hardware.NetduinoPlus;

namespace NetduinoWatchdog
{
    public class Program
    {
        public static void Main()
        {
            //define the LED port
            var led = new OutputPort(Pins.ONBOARD_LED, false);

            //just a long loop to make the LED blink
            for (int i = 0; i < 1000; i++)
            {
                //call the critical section
                Freezer(i);

                //wait for a while, then toggle the LED
                Thread.Sleep(100);
                led.Write(!led.Read());
            }
        }

        static void Freezer(int count)
        {
            //simulate an unexpected failure after about 2 seconds (20 x 100 ms)
            if (count == 20)
                throw new Exception();

            //keep the dog awake
            Watchdog();
        }

        //define the watchdog port
        static OutputPort wdt = new OutputPort(Pins.GPIO_PIN_A5, false);

        static void Watchdog()
        {
            //generate a short positive pulse to reset the external counter
            wdt.Write(true);
            wdt.Write(false);
        }
    }
}

There is no other code, because the project mainly focuses on the external circuit built around the 74HC4060. Clearly, such a program hangs every time by design, so it makes no sense in a real context. A real application should be much more robust against exceptions, and possibly able to “correct itself” after a failure. For instance, suppose your application is writing a file to the SD card and the user pulls the card out. It’s a bit difficult to write a bullet-proof procedure that writes data without any exception. However, once the Netduino has been reset, you can check whether the card is present and skip any related operation.
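Here is a minimal sketch of that idea, under a few assumptions: the hardware calls (SD-card detection, the file write, the watchdog pulse) are passed in as delegates, so all names are hypothetical and the code is plain .NET rather than Netduino code. It also assumes that a failed write surfaces as an IOException. The loop checks for the card once at startup (i.e. after any watchdog reset), stops logging if a write fails, and kicks the watchdog only after a complete pass.

```csharp
using System;
using System.IO;

// Hypothetical, hardware-agnostic main loop. The delegates stand in for the
// real Netduino calls: SD-card detection, the file write, the watchdog pulse.
public sealed class ResilientLoop
{
    private readonly Action<int> writeLog;   // assumed to throw IOException on failure
    private readonly Action kickWatchdog;    // e.g. the Watchdog() pulse
    private bool loggingEnabled;

    public int Iterations { get; private set; }
    public bool LoggingEnabled { get { return loggingEnabled; } }

    public ResilientLoop(Func<bool> isCardPresent, Action<int> writeLog, Action kickWatchdog)
    {
        this.writeLog = writeLog;
        this.kickWatchdog = kickWatchdog;
        // Runs at every startup, including right after a watchdog reset:
        // if the card is missing, skip all SD-related work from now on.
        loggingEnabled = isCardPresent();
    }

    // One pass of the application loop.
    public void RunOnce()
    {
        if (loggingEnabled)
        {
            try
            {
                writeLog(Iterations);
            }
            catch (IOException)
            {
                // Card pulled out mid-write: stop using it instead of crashing.
                loggingEnabled = false;
            }
        }

        Iterations++;

        // Kick only after a complete pass. If anything above hangs, or throws
        // something other than IOException, this kick is skipped; once the
        // kicks stop for good, the external counter resets the board.
        kickWatchdog();
    }
}
```

The design choice is to recover only from failures we can anticipate, such as the missing card, and to let everything else stop the kicks, leaving the 74HC4060 to act as the last resort.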

The Demo

Enjoy it!

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)