Rackwatch and OpenNMS | Adventures in Open Source

One of my favorite clients is Rackspace Managed Hosting. They have been using OpenNMS since 2002, and they were either our second or third official customer (I can’t remember if they came before or after NASA), and I’m happy to say that they are still a client.

We like to describe OpenNMS as a network management application platform, which means that it is easy to integrate OpenNMS into other systems to build a custom management solution. In the case of Rackwatch, the Rackspace monitoring system, OpenNMS is integrated into an internal Rackspace system called CORE (CORE Objects Reused Everywhere).

We have a server hosted at Rackspace which we, of course, monitor with OpenNMS. This morning I got an e-mail from our OpenNMS system:

Subject: Notice #19845: HTTP down on 10.1.1.1 (10.1.1.1) on node server.opennms.org.
Date: January 10, 2010 7:46:07 AM EST
To: Tarus Balog

The HTTP service poll on interface 10.1.1.1 (10.1.1.1) on node server.opennms.org failed at Sunday, January 10, 2010 7:38:56 AM EST.

What I thought was cool was that at approximately the same time I got an e-mail from Rackspace:

From: support@rackspace.com
Subject: Created Rackspace Ticket #100110-01016: Service Down: Webport (Computer #40906)
Date: January 10, 2010 7:39:59 AM EST
To: Tarus Balog

Dear Tarus Balog,

The following support ticket has been created:

Ticket #: 100110-01016
Subject: Service Down: Webport (Computer #40906)
Status: Confirm Solved
Account #: 14290 (Sortova Consulting Group)
Date: 01/10/2010 6:39am CDT
Comment:
————————————————————————

The Rackwatch monitoring system was unable to reach the
Webport service on computer #40906.
It may be down.
————————————————————————

Note that the times of the two notices were nearly identical, although this is more of a coincidence, since our polling rate is set at 5 minutes (I believe Rackwatch polls more frequently).
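To put a number on "nearly identical", here is a quick sketch that compares the two timestamps quoted in the e-mails above. It assumes both times are Eastern, and it compares our poll failure time with the Rackspace ticket e-mail time, which are not quite the same kind of event. The variable names are just mine for illustration.

```python
from datetime import datetime, timedelta

FMT = "%B %d, %Y %I:%M:%S %p"

# Timestamps copied from the two e-mails (assumed to be in the same time zone).
opennms_poll_failed = datetime.strptime("January 10, 2010 7:38:56 AM", FMT)
rackwatch_ticket_sent = datetime.strptime("January 10, 2010 7:39:59 AM", FMT)

# How far apart the two systems reported the problem.
gap = rackwatch_ticket_sent - opennms_poll_failed

# Our polling interval, as stated in the post. An outage that starts just
# after a poll would not be noticed until the next poll, so two independent
# pollers could in principle report the same outage almost this far apart.
polling_interval = timedelta(minutes=5)

print(f"Gap between notices: {gap.total_seconds():.0f} seconds")
print(f"Polling interval:    {polling_interval.total_seconds():.0f} seconds")
```

With a 5 minute interval the two reports could easily have landed several minutes apart, which is why I call the close match a coincidence.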

I just thought it was cool to actually experience OpenNMS in action at such a large company (we have instances of OpenNMS running in all nine data centers as part of the Rackwatch application).

Anyway, this notice allowed me to fix an issue with the Apache config on that server, and I dutifully got “resolved” messages from both Rackwatch and our own internal instance of OpenNMS when it was detected as being back up.

Sometimes you get so caught up in the internal issues with running a project that you forget that people actually use it, and it is nice to think that we play some small role in helping Rackspace provide fanatical service to their tens of thousands of customers.