Watchdog Timer

Most computer systems crash from time to time due to software errors and become unresponsive. A watchdog timer expects to be alerted by a pulse (from a serial port, for example) every few seconds. If no pulse arrives, its internal countdown timer reaches zero and forces the system to reboot. This is very important for unattended systems where reliability matters.
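The countdown behaviour described above can be modelled in a few lines of C. This is only a toy simulation to illustrate the idea, not the WDT-01 firmware; the type and function names are invented for this sketch, and time advances in whole seconds.

```c
#include <stdbool.h>

/* Toy model of a watchdog countdown, one tick per second. */
typedef struct {
    int timeout_s;    /* seconds allowed between pulses */
    int remaining_s;  /* seconds left before a reset is forced */
    bool fired;       /* true once the reset has been triggered */
} watchdog_sim;

void wd_init(watchdog_sim *wd, int timeout_s)
{
    wd->timeout_s = timeout_s;
    wd->remaining_s = timeout_s;
    wd->fired = false;
}

/* A pulse from the host restarts the countdown. */
void wd_kick(watchdog_sim *wd)
{
    if (!wd->fired)
        wd->remaining_s = wd->timeout_s;
}

/* One second passes without a pulse. */
void wd_tick(watchdog_sim *wd)
{
    if (wd->fired)
        return;
    if (wd->remaining_s > 0)
        wd->remaining_s--;
    if (wd->remaining_s == 0)
        wd->fired = true;
}
```

With a 10 second timeout, a host that calls wd_kick() every few seconds never lets the counter reach zero; a hung host stops kicking and the reset fires on the tenth tick.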

The picture below is an external watchdog timer, the Emphase WDT-01. The three connectors at the top (make sure the red wires go to the terminals marked with a plus sign) go to power, the watchdog input and the reset pins of the motherboard. There is a fourth connector (not connected here) for use if you want to re-enable your manual reset switch.

The jumpers determine the mode of operation, namely:

A 60 second startup delay, followed by a 1 second timeout.
A 60 second startup delay, followed by a 10 second timeout. (pictured above)
Timer starts on first pulse, followed by a 10 second timeout. (best for most cases)

This is a rather unfortunate choice of values, because although most Linux systems can boot up in less than 60 seconds, every so often there is a mandatory file system check. The check could of course be disabled, but that is not a good idea.

A file system check on boot takes a few minutes if you have a large drive (e.g. > 100 GB), and the watchdog will leave you in a cycle of perpetual rebooting. With a 160 GB IDE hard drive and a 1 GHz processor, you will get about 17% of the way through the file system check before it reboots. This is particularly annoying because you could have it working for months and then get burnt after a power failure, when the motherboard automatically reboots and gets stuck in an endless loop.
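To put a rough number on "a few minutes": if the reset arrives about 60 seconds after power-on with the check at 17%, a linear extrapolation gives an upper bound for the full check. The sketch below assumes progress is linear and that all 60 seconds were spent in the check (some of that time really goes to the BIOS and kernel, so the true figure is lower). The function name is made up for illustration.

```c
/* Linear extrapolation: if fraction_done of a job took elapsed_s seconds,
   estimate how long the whole job would take.
   Returns -1.0 if the inputs make no sense. */
double estimate_total_seconds(double elapsed_s, double fraction_done)
{
    if (elapsed_s < 0.0 || fraction_done <= 0.0 || fraction_done > 1.0)
        return -1.0;
    return elapsed_s / fraction_done;
}
```

Calling estimate_total_seconds(60.0, 0.17) gives roughly 350 seconds, close to six minutes, which is far beyond the 60 second startup delay.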

This leaves us with only one option, a startup delay of "first edge". If the watchdog daemon doesn't manage to load, you won't have protection; if there is a temporary boot failure, you won't get a second chance -- hence the fixed delay would have been better had it been long enough to be useful.

Connecting the watchdog input:

Triggering the watchdog with serial port zero on the motherboard works, but you have to be really careful. Many motherboard BIOSes toggle the lines on and off during initialization, activating the watchdog immediately, before the operating system even has a chance to boot. Since 60 seconds is not always enough, you are in big trouble. One solution would have been a 10 second delay in "first edge" mode during which input is ignored, so that BIOS hardware initialization would not trigger the timer.
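The hold-off idea suggested above can be sketched as a small state machine. This is a hypothetical design, not how the WDT-01 works; the names and the one-second tick are assumptions of the sketch. A hold-off of 0 models the behaviour described above, where a BIOS glitch arms the timer immediately.

```c
#include <stdbool.h>

/* Toy model of "first edge" mode with an input hold-off after power-on. */
typedef struct {
    int holdoff_s;    /* edges are ignored for this long after power-on */
    int timeout_s;    /* seconds allowed between edges once armed */
    int uptime_s;     /* seconds since power-on */
    int remaining_s;  /* seconds left before a reset is forced */
    bool armed;       /* true after the first accepted edge */
    bool fired;       /* true once the reset has been triggered */
} first_edge_sim;

void fe_init(first_edge_sim *wd, int holdoff_s, int timeout_s)
{
    wd->holdoff_s = holdoff_s;
    wd->timeout_s = timeout_s;
    wd->uptime_s = 0;
    wd->remaining_s = timeout_s;
    wd->armed = false;
    wd->fired = false;
}

/* An edge on the input: ignored during the hold-off window,
   otherwise it arms the timer or restarts the countdown. */
void fe_edge(first_edge_sim *wd)
{
    if (wd->fired || wd->uptime_s < wd->holdoff_s)
        return;
    wd->armed = true;
    wd->remaining_s = wd->timeout_s;
}

/* One second passes. The countdown only runs once the timer is armed. */
void fe_tick(first_edge_sim *wd)
{
    wd->uptime_s++;
    if (!wd->armed || wd->fired)
        return;
    if (--wd->remaining_s <= 0)
        wd->fired = true;
}
```

With a hold-off of 10, an edge two seconds after power-on is discarded and the timer stays disarmed until the watchdog daemon sends its first real pulse.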

On the VIA pico-ITX motherboard, I found that only the DTR line was unaffected during BIOS initialization, which makes it the only control signal safe to use. In the example below, the triggering software is run as /usr/sbin/hardwaredog -p /dev/ttyS0 -t 10 -s DTR
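For readers curious how a program can pulse DTR on Linux, here is a minimal sketch using the TIOCMBIS and TIOCMBIC ioctls. It is not the Emphase hardwaredog source; the function names, the 100 ms pulse width and the pulse polarity are assumptions made for illustration, so check the WDT-01 documentation for what the board actually expects.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <time.h>
#include <unistd.h>

/* Sleep for the given number of milliseconds. */
static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Raise (on != 0) or drop (on == 0) DTR on an open serial port.
   Returns 0 on success, -1 on error (errno is set by ioctl). */
int set_dtr(int fd, int on)
{
    int bits = TIOCM_DTR;
    return ioctl(fd, on ? TIOCMBIS : TIOCMBIC, &bits);
}

/* Emit one pulse: raise DTR, hold it for width_ms, then drop it.
   The polarity is an assumption of this sketch. */
int pulse_dtr(int fd, long width_ms)
{
    if (set_dtr(fd, 1) != 0)
        return -1;
    sleep_ms(width_ms);
    return set_dtr(fd, 0);
}

/* Hypothetical keep-alive loop: pulse DTR every interval_s seconds.
   It only returns (with -1) if the port cannot be opened or an ioctl fails. */
int keep_alive(const char *port, unsigned int interval_s)
{
    /* O_NONBLOCK avoids blocking in open() while waiting for carrier detect. */
    int fd = open(port, O_RDWR | O_NOCTTY | O_NONBLOCK);
    if (fd < 0)
        return -1;
    while (pulse_dtr(fd, 100) == 0)
        sleep(interval_s);
    close(fd);
    return -1;
}
```

A caller would run something like keep_alive("/dev/ttyS0", 3), choosing an interval comfortably below the 10 second timeout. If the process dies or hangs, the pulses stop and the watchdog resets the board.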

The picture below shows where the power and reset lines go. The six conductor ribbon cable (red/black/green/white/red/white) was attached to a larger socket, so I trimmed off the unused part of the socket to allow access to the needed pins.

Here is another view of the same thing. Make sure you check the wiring diagram for your pico-ITX motherboard in case anything has changed.

VIA pico-ITX

Below is the pinout diagram.

pico-ITX connector pin-out

Next for the control line (that keeps the watchdog timer alive). We want the DTR line of ttyS0, which is found as seen below:

Here are the motherboard serial port connector details. The blank (missing pin) is positioned at the top left of the connector in the picture above. The red line is going to pin 4, DTR (data terminal ready).

pico-itx pinout

It was difficult to take a picture that shows this properly. The important thing is that the red line of the connector is attached to the control signal you want to use. The black wire can just hang off the side of the connector (not attached).

Next we wrap the board with black electrical tape so that it doesn't short anything out, and can be tucked into the case.

pico-itx motherboard

Software Setup (under Debian)

Compile the C source (available at Emphase) using a compiler like gcc. This is pretty simple code and should compile with anything. Copy the executable you created to somewhere appropriate. In the example below, it was copied to /usr/sbin.

Create a script as follows in /etc/init.d

Artigo1:~# cat /etc/init.d/wdogtimer
#! /bin/sh
/usr/sbin/hardwaredog -p /dev/ttyS0 -t 10 -s DTR &

You may need to run a utility to have the script in init.d start automatically at boot. On Debian, update-rc.d is the usual tool for this, but every distribution and release seems to be different.

Before you add the script, you can experiment by running it manually, and doing the following:

# /usr/sbin/hardwaredog -p /dev/ttyS0 -t 10 -s DTR &

This will run the daemon in the background.

# ps -ef

This will list all the running processes. You should see a line similar to the following:

root 2364 1 0 Jan01 ? 00:00:00 /usr/sbin/hardwaredog -p /dev/ttyS0 -t 10 -s DTR

Then kill the watchdog process like this:

# kill 2364 (use the process number you get from ps -ef, not 2364!).

Your system should reboot in 10 seconds.

Response from Logic Supply

Since Logic Supply took the time to respond to this article (rare these days), I figured I'd include their comments. Also, as shown above, this product does work with the VIA pico-ITX motherboard. If anyone can contribute a) a link to the settings that disable the periodic file system check on boot in various Linux distributions, b) other pins on the ITX motherboard that could work as inputs and don't get toggled at power-on, or c) whether this motherboard or other similar miniature motherboards are available with a built-in watchdog, please let me know, and I will add the instructions here.

Hi Peter,

I enjoyed reading your review of the WDT-01. I think that it is fair and accurate.

Regarding your complaint about the available timing options:

While the WDT-01 is likely useful in a wide variety of applications, it was really designed for use in applied and embedded computing applications. The timing settings reflect these design goals.

In many such systems, boot speed is an important design consideration and filesystem checks at boot time are disabled. There are a variety of ways of mitigating risk associated with disabling boot-time filesystem checks, including mounting the root filesystem read-only and using journalling filesystems.

To give an example: You wouldn't want your TiVo to take longer than 60 seconds to boot, right?

We are aware that some systems exhibit the serial port pulses at BIOS initialization time that you mentioned, and that this causes problems for the "first pulse" settings. In many cases, a different pin with better power-on behavior can be found to drive the WDT-01.

Finally, the WDT-01 is really a simple device intended for use with mainboards that have no on-board watchdog hardware. As such, it is not extremely featureful. There are likely other devices on the market that provide more features at the expense of greater cost and size. However, it is our belief that the benefits of a less expensive, smaller device are substantial, and that, for the intended set of applications, the design trade-offs are appropriate.

Thanks again for your feedback, and feel free to contact me with additional questions or comments.

Thanks,
Forest
--
Forest Bond
Computer Engineer
Logic Supply, Inc.
Phone: 802 861 2300 x413
forest.bond@logicsupply.com
www.logicsupply.com

--end--