Cleaning Up

Okay, I am starting the process of cleaning up 1.1.1 to make it presentable (grin). I believe I have fixed the issue with some people getting “resource not available” errors with certain browsers and the new last day/week/month/year graphs. It’s in CVS.

Also, I debated removing the changes that require Postgres 7.2+, but changed my mind. The ability to sort by IP Address is useful, and since 7.3 is out we need to move off of 7.1.

Since we are using Tomcat 4.1, restarting OpenNMS no longer restarts Tomcat. That will be fixed soon.

Tomcat and 1.1.1

Okay, seems there is an issue with install.pl and the RPMs for 1.1.1. After installing OpenNMS, edit /opt/OpenNMS/bin/install.pl and change line 121 to read:

$TOMCAT  = 1 if ($arg eq "t");

Then run:
$OPENNMS_HOME/bin/install.pl -q $OPENNMS_HOME/etc/create.sql

Sorry about that.

The Two Sides of OpenNMS

I have been giving a lot of thought to who uses OpenNMS. On the one hand there are those who want a better solution than Network Node Manager, and on the other are those who want more of a remote management “tool” or appendage that they can control from some central location. The former are more interested in features and the WebUI, while the latter are more concerned with the ability to remotely configure and update OpenNMS. Both are concerned with things such as the need to restart OpenNMS when changing things.

In the next few months, I hope to focus more on some of the remote management aspects of OpenNMS. This will result in features such as:

o The ability to add, remove and update individual nodes via events without restarting OpenNMS.

o Outage improvements so that outages can be dynamically scheduled without restarting OpenNMS, and the ability to “snooze” a device that is down for a known reason, such as a hardware failure. A device that gets “snoozed” would disappear from the Outages list for a particular amount of time.

o A “graph server” so that requests can be made to the system for a particular graph or graphs. This will allow third party programs to present data to end users without having to give them direct access to the system.

o IP to port mapping. This feature will collect SNMP data, such as utilization, under an IP address in addition to a port number. If you are an xSP, you could associate Switch 7, Port 12, to IP Address 10.1.1.1. Then you could generate reports based upon traffic to that address, which could be used in billing, etc. If the 10.1.1.1 device is moved to another port, say Switch 9, Port 3, the system could receive an event which would automatically move the data collection for 10.1.1.1.
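To make the IP to port mapping idea a bit more concrete, here is a minimal sketch in Python. It is not OpenNMS code: the class name IpPortMap and its methods assign, record and usage are all made up for illustration. It assumes that a "moved" event simply re-points an IP address at a new switch and port, and that collected samples are credited to whichever address is mapped to the port at that moment.

```python
class IpPortMap:
    """Hypothetical sketch: keep collected data under an IP address, not a port."""

    def __init__(self):
        self._location = {}  # ip -> (switch, port) the address currently sits on
        self._octets = {}    # ip -> traffic accumulated for that address

    def assign(self, ip, switch, port):
        # Stands in for the (assumed) event sent when a device moves to a new port.
        self._location[ip] = (switch, port)

    def record(self, switch, port, octets):
        # Credit a sample taken on a port to the address mapped there, if any.
        for ip, location in self._location.items():
            if location == (switch, port):
                self._octets[ip] = self._octets.get(ip, 0) + octets

    def usage(self, ip):
        return self._octets.get(ip, 0)


mapping = IpPortMap()
mapping.assign("10.1.1.1", "Switch 7", 12)
mapping.record("Switch 7", 12, 500)        # counted for 10.1.1.1
mapping.assign("10.1.1.1", "Switch 9", 3)  # the device moves
mapping.record("Switch 9", 3, 250)         # still counted for 10.1.1.1
mapping.record("Switch 7", 12, 999)        # old port, no longer counted
print(mapping.usage("10.1.1.1"))
```

The point of the design is that a billing report asks about 10.1.1.1 and never has to know which port the device happened to be plugged into.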

This is in addition to the other nifty features we have planned, and I would be interested in input on how useful this would be.

Ooops – I Did It Again

Okay, 1.1.1 is out, and I have heard little positive and a lot negative about install problems, broken links, etc.

Good, we must be on the right track.

In an attempt to get a release candidate for 1.2, you should see at least one if not two more releases in the next month. We have run into some issues that are requiring the upgrade of some of the packages we depend on, namely:

Tomcat4: We now require 4.1.18 for Tomcat. Unfortunately, Tomcat 4.1 appears via RPM to be a different beast than 4.0, so a simple upgrade doesn’t work. Tomcat 4.1 handles some of our URLs differently than 4.0 did.

Postgres: Folks complained about the ordering on the Manage/Unmanage interfaces page, so an ORDER BY statement was added. Unfortunately, ordering by IP address is only available in 7.2 and later, and we have been supplying 7.1 on the web site for those who want it.
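For anyone wondering why ordering by IP address needs special support at all, here is a small Python sketch. It is only an illustration of the sorting problem, not the query OpenNMS runs; the helper name sort_by_ip and the sample addresses are made up.

```python
import ipaddress


def sort_by_ip(addresses):
    """Sort dotted-quad strings by their numeric address value, not as text."""
    return sorted(addresses, key=ipaddress.ip_address)


interfaces = ["10.1.1.10", "10.1.1.2", "192.168.0.1", "10.1.1.1"]

# Plain text ordering puts 10.1.1.10 ahead of 10.1.1.2.
print(sorted(interfaces))

# Address-aware ordering gives the list people expect to see.
print(sort_by_ip(interfaces))
```

Sorting the addresses as text is what makes an interface list look scrambled, which is roughly what folks were complaining about.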

We are moving away from having “onms” versions of standard packages like Postgres and Tomcat, and plan to do most of the configuration in install.pl.

I am kind of encouraged about the lack of serious problems being reported in OpenNMS. I do want to add Derek’s map code into the next release, and there are a couple of small features that would be nice to see, but most of the effort in the near term will be a great big bug-hunt to remove the little, annoying things in preparation for 1.2.

Also, there have been numerous complaints about the “Delete Node” page. While I haven’t experienced them directly, I do plan on re-working the whole thing. I need to be able to delete interfaces (IP Addresses) since OpenNMS has a long memory once it learns of an IP Address, and moving it can cause the product to behave strangely.

Anyway, keep your fingers crossed for 1.1.2 in about two weeks.

Abroche Su Cinturon

I flew to San Antonio yesterday, and it was the first time I could really try out my Etymotic 4P earphones. They rock.

As soon as the flight attendant said it was okay, I pulled out the laptop, stuck in “Lord of the Rings” and put on the 4Ps. These are “in the ear” headphones that also act as really nice earplugs. You can only feel the plane, not hear it, and you get great sound.

The thing I liked best was that I seemed to arrive less fatigued than usual. I never realized how draining simply sitting in a noisy cabin could be.

New Category: Rant

I was thinking of posting this under “Commentary”, but it goes a bit beyond that and into the realm of “Rant”.

I have spent several hours today trying to get the virus protection on my Windows server to work. Now, I don’t do silly things like download software from unknown people, run “.exe” attachments that I receive in the mail, or use Outlook, so I am not too worried about a virus trashing the data on my system.

I am worried about a virus getting on my system and then propagating via my address book, etc., and pestering other people. I think if you use the Internet, then you should at least take the basic precautions to ensure that you don’t contribute to the spread of viruses, etc.

The easiest way to do that is to run a fairly secure operating system, but since I use DirecPC to access the Internet, I am required to have at least one Windows box around.

This system is simply a DirecPC router. It contains the DirecPC software, Deerfield’s WinGate proxy server (so that my Mac OS X, Solaris and Linux systems can actually use the network connection) and not much else.

I did (until this afternoon) have a copy of Symantec’s Norton Anti-Virus on the system. It seemed to be working, but when it tried to scan the server once a week, the program would die with a vague error message. I spent most of this afternoon trying to fix the problem. No luck.

So I went looking for alternatives. My experiences with McAfee’s solution were worse than those with Symantec, so I decided to trial Trend Micro’s PC-cillin. So far so good.

But what the rant is about is how trapped I felt using the Symantec product. It is a pig of a program, and when it failed – nothing. My guess is that Microsoft may have changed a DLL or something in one of the various Windows Updates, or Symantec blew it with a poor registry entry, but there were no logs or anything to even begin to track down the problem. It seemed to fail on a particular sub-directory, so I spent hours scanning and re-scanning to try to see exactly where it would fail (The logs said: Beginning scan … scan failed. How useful).

Sure, I run into issues with open source software all the time. But I never seem to have to “hunt and pray” nearly as much to find the issues. Maybe it’s because the software is simply better, or maybe it’s due to the fact that open-source code is built to be debugged.

By being easier to debug, it would follow that open source software would have fewer bugs than a closed program of equal complexity. And since the debugging process is open to any and all, you get the bugs squashed more quickly as well.

But with commercial software, you just chuck it, and Symantec has lost me as a customer. Ah, I long for the day when I can do most everything that is important using open-source software.

Thank God for Standards

Okay, I really like standards – preferably open standards. And I really like SNMP, although sometimes I think that having the word “Simple” in its name was a mistake.

I love simplicity. I think for all of OpenNMS’s power, it is at heart a simple product. But some vendors confuse simplicity with sloppiness, and it causes me no end of grief.

For example, a client called me with the news that he was unable to get OpenNMS to discover one of his devices for data collection. Digging into the logs we find that there is a null pointer exception when it is trying to retrieve the ifSpeed from the device. Using UCD’s snmpwalk you get:

snmpwalk -c public 10.1.1.1 ifTable.ifEntry.ifSpeed
interfaces.ifTable.ifEntry.ifSpeed.1 = Wrong Type (should be Gauge32 or Unsigned32): 160000000
interfaces.ifTable.ifEntry.ifSpeed.10001 = Wrong Type (should be Gauge32 or Unsigned32): 160000000
interfaces.ifTable.ifEntry.ifSpeed.40001 = Gauge32: 38400
interfaces.ifTable.ifEntry.ifSpeed.40002 = Gauge32: 38400
interfaces.ifTable.ifEntry.ifSpeed.40005 = Gauge32: 44000000
interfaces.ifTable.ifEntry.ifSpeed.50001 = Gauge32: 10000000
interfaces.ifTable.ifEntry.ifSpeed.50002 = Gauge32: 10000000
interfaces.ifTable.ifEntry.ifSpeed.50003 = Gauge32: 38400

Now, how can you get part of the MIB right, and then blow it on the other part? I assume it is because of the high value, but it should still fit into 32 bits – 2^32 ~ 4 billion (unless my math is wrong).
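Just to check my math, here is the arithmetic as a quick Python sketch. The names GAUGE32_MAX and fits_gauge32 are made up for the example; the only fact it relies on is that a Gauge32 is an unsigned 32-bit value.

```python
# Largest value an unsigned 32-bit Gauge32 can hold.
GAUGE32_MAX = 2**32 - 1


def fits_gauge32(value):
    """Return True if the value is within the unsigned 32-bit range."""
    return 0 <= value <= GAUGE32_MAX


print(GAUGE32_MAX)              # 4294967295
print(fits_gauge32(160000000))  # True
```

So the 160000000 the device reports is nowhere near the limit, and the size of the number alone does not explain the wrong type.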

Luckily, OpenNMS is open, so the source code change to ignore the wrong numbers was pretty simple to do.
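The actual change was made in the OpenNMS source, which I won't reproduce here. As a rough illustration of the same idea, this Python sketch reads snmpwalk output in the format shown above and skips any row reported as "Wrong Type" instead of failing on it. The function name parse_if_speeds and the regular expression are assumptions made for the example.

```python
import re

# Matches lines such as:
#   interfaces.ifTable.ifEntry.ifSpeed.40001 = Gauge32: 38400
LINE = re.compile(r"^(?P<oid>\S+)\.(?P<index>\d+) = (?P<type>[^:]+): (?P<value>\d+)$")


def parse_if_speeds(lines):
    """Return (speeds, skipped): ifSpeed by interface index, plus ignored indexes."""
    speeds = {}
    skipped = []
    for line in lines:
        match = LINE.match(line.strip())
        if match is None:
            continue  # not a line we understand
        index = int(match.group("index"))
        if match.group("type").startswith("Wrong Type"):
            skipped.append(index)  # ignore the bad row, keep going
            continue
        speeds[index] = int(match.group("value"))
    return speeds, skipped


sample = [
    "interfaces.ifTable.ifEntry.ifSpeed.1 = Wrong Type (should be Gauge32 or Unsigned32): 160000000",
    "interfaces.ifTable.ifEntry.ifSpeed.40001 = Gauge32: 38400",
    "interfaces.ifTable.ifEntry.ifSpeed.50001 = Gauge32: 10000000",
]
speeds, skipped = parse_if_speeds(sample)
print(speeds)
print(skipped)
```

Skipping the bad rows means one sloppy interface no longer keeps the rest of the device from being collected.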

A much harder fix will be for the problem where the SNMP agent does not respond correctly to snmpgetbulk requests. Ain’t it grand to be able to work around the issues, though?

A Grand Day Out

Well, Ben Reed has been on to me for some time now to start a web log about OpenNMS. Since I spend so much of my life working on that product, I figured it would be better to start a web log about my experiences with building a company around open-source and, of course, OpenNMS.

So, welcome to my adventures in open source. I really do believe in the open source model, but you may notice that I am not an open source bigot. I am using Movable Type as my blog editor, and I am writing this on a Mac Powerbook running OS X. The right tool for the right job.

So, if I don’t believe that open source is the “be all” and “end all”, why do I bust my hump working on OpenNMS? Because in this case, open source works. Network management is hard, difficult to the point that you can’t really utilize most software “out of the box”. So what you need instead is a tool, a tool that is powerful enough to be useful in many different situations.

But OpenNMS is and will always be growing, so check back here to see my thoughts on how we are doing, and feel free to add your own.