Ulf: My Favorite Open Source Animal

Over at opensource.com they asked “What’s your favorite open source animal?” Hands down, it’s Ulf.

OpenNMS Kiwi: Ulf

When I was at FOSDEM this year, we were often asked about the origin of having a kiwi as our mascot. Kiwis are mainly associated with New Zealand, and OpenNMS is not from New Zealand. But Ulf is.

Every year we have a developer’s conference called “Dev Jam”. Back in 2010, a man named Craig Miskell came from NZ and brought along a plush toy kiwi. He gave it to a group of people who had come from Germany, since he had come the furthest east for the conference and they had come the furthest west. They named him “Ulf”.

There was no conscious decision to make Ulf our mascot; it just happened organically. People in the project started treating him as a “traveling gnome”, setting up a wiki page to track some of the places he’s been, and he even has his own Twitter account.

I lost him once. We had a holiday party a few years ago and Ulf went missing. We thought he had been left in a limo, so I dutifully sought out a replacement. I found one for US$9, but of course shipping from NZ was an additional US$80, so I bought two. I later found Ulf hiding in the pocket of a formal overcoat I rarely wear (but had the night of the party) so now we have a random array of individual Ulfs.

Anyway, Ulf manages to represent OpenNMS often, from stickers to holiday cards and keychains. I love the fact that he just kind of happened, we didn’t make a conscious decision to use him in marketing. If you happen to come across OpenNMS at conferences like FOSDEM, be sure to stop by and say “hi”.

“OpenNMS WHO” at OSMC 2016

There is a really cool monitoring conference held each year in Germany. Called the “Open Source Monitoring Conference” (OSMC), it is put on by Netways, one of the maintainers of the Icinga project, but they welcome other projects such as OpenNMS and Zabbix.

It is a lot of fun, and usually Jeff and I fight over who gets to go. This year David won (he was in Germany for other reasons) and they now have his talk available for viewing:

It’s an overview of what we have been up to and where we are going with the Project. Check it out.

Speaking of conferences and travel, next week I plan to be in Helsinki, Tallinn, Riga and Brussels. I’ll be in Riga for the Open Tech conference and hope to spend some time with my Zabbix friends, and I’ll be in Brussels for FOSDEM where OpenNMS will have a booth. It’s my first time at either conference, and if you happen to be in the area drop me a note and perhaps we can meet up.

OpenNMS 101

One of my favorite things to do is to teach people about OpenNMS. I am one of the main trainers, and I usually run the courses we hold here at OpenNMS HQ. I often teach these classes on-site as well (if you have three or more people who want to attend, it can be cheaper to bring someone like me in for a week than to send them here), and the feedback I got from a recent course at a defense contractor was “that was the best class I’ve ever attended, except for the ones I got to blow stuff up.”

Unfortunately, a lot of people can’t spare a week away from the office nor do they have the training or travel budget to come to our classes. And teaching them can be draining. While I can easily talk about OpenNMS for hours on end, it is much harder to do for days on end.

To help with that I’ve decided to record the lessons in a series of videos. I am not a video editing wizard, but I’ve found a setup using OBS that works well for me and I do post production with OpenShot.

The first class is called “OpenNMS 101” and we set it up as a video playlist on YouTube. The lessons are built on one another so beginners will want to start with Module 0, the Introduction, although you can choose a particular single episode if you need a refresher on that part of OpenNMS.

My goal is to put up two or three videos a week until the course material is exhausted. That will not begin to cover all aspects of OpenNMS, so the roadmap includes a follow up course called “OpenNMS 102” which will consist of standalone episodes focused on a particular aspect of the platform. Finally, I have an idea for an “OpenNMS 201” to cover advanced features, such as the Drools integration.

I’ve kept the videos as informal as the training – when I make a mistake I tend to own it and explain how to fix it. It also appears that I use “ummmmmmm” a lot as a placeholder, although I’m working to overcome that. I just posted the first part of “Module 4: Notifications”. I apologize for the long running time; the next lessons will be shorter. I had to redo this one (the longest, of course) as during the first take I forgot to turn on the microphone (sigh).

We have also posted the slides, videos and supporting configuration files on the OpenNMS project website.

I’d appreciate any feedback since the goal is to improve the adoption of OpenNMS by making it easier to learn. Any typos in the slides will be fixed on the website but I am not sure I’ll be able to redo any of the videos any time soon. I think it is more important to get these out than to get them perfect.

Perfection is the enemy of done.

Monitoring Certificates with OpenNMS

A while ago I posted about how easy it was to implement SSL certificates using Let’s Encrypt.

The main issue that people encounter is that the certificates do expire, and while you can set up a cron job to automatically update them, sometimes it doesn’t work. This is why I like to use OpenNMS to check the expiration date of all the certificates I use on the network.

The documentation for the SSLCertMonitor is pretty detailed, and it can be used for almost any cert, not just the one for HTTPS. The example shows configuration for SMTPS and IMAPS as well.
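To give a feel for the kind of check involved, here is a minimal Python sketch that works out how many days remain before a certificate expires. This is not the SSLCertMonitor code; the function names and the 30-day threshold are my own assumptions for illustration, and the sketch only uses the Python standard library.

```python
import socket
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Return whole days between `now` and a certificate's notAfter string.

    `not_after` uses the format returned by SSLSocket.getpeercert(),
    for example "Jan  5 09:34:43 2018 GMT".
    """
    expires = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return int((expires - now) // 86400)


def fetch_not_after(host, port=443, timeout=5.0):
    """Connect with TLS and return the peer certificate's notAfter field."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def expiring_soon(not_after, threshold_days=30, now=None):
    """True when the certificate expires within `threshold_days` (assumed value)."""
    return days_until_expiry(not_after, now) < threshold_days
```

Something like `expiring_soon(fetch_not_after("www.example.org"))` would then flag a certificate that needs attention. Note that with the default context an already expired certificate fails the handshake outright, which is itself a useful signal.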

SSLCertMonitor Example

What it doesn’t show is how to discover these services. You could, of course, just provision them directly via a requisition, but I’m lazy so I set up the TCP detector to look for those services on their well known ports.

SSLCertMonitor Detectors

This may result in a false positive if, for some reason, the port was in use by another application, but in practice I haven’t seen it yet.
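The idea behind detecting a service by its well-known port can be sketched in a few lines of Python. This is only an illustration of the approach, not how the OpenNMS TCP detector is implemented; the helper names are hypothetical, and the port numbers are the standard ones for HTTPS, SMTPS and IMAPS.

```python
import socket

# Well-known ports for the TLS-wrapped services mentioned above.
WELL_KNOWN_TLS_PORTS = {"HTTPS": 443, "SMTPS": 465, "IMAPS": 993}


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def detect_services(host, ports=WELL_KNOWN_TLS_PORTS, timeout=2.0):
    """Return the service names whose ports accept a TCP connection.

    An open port only suggests the service is there: anything listening
    on 443 would be reported as HTTPS, which is the false positive
    risk described above.
    """
    return [name for name, port in ports.items()
            if port_is_open(host, port, timeout)]
```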

So now I can rest assured that all my important SSL-based services have valid certificates and there shouldn’t be any interruption in service due to one expiring.

SSLCertMonitor Services Displayed

Network World Reviews OpenNMS

Today Network World published the results of a comparison among open source network monitoring applications. OpenNMS did not win but I was pretty happy with the article.

The main criticism I have is that the winner, Pandora FMS, seems to be the only one of the four reviewed that is more “open core” than “open source”. They have a large number of versions, each with different features, and you have to pay for those features based on the number of monitored devices. It seems to be difficult to have open source software that is limited in this fashion, as anyone should be able to easily remove that limit. Thus I have to assume that their revenue model is firmly based on selling software licenses, which is antithetical to open source. That said, it looks like the review was based on the “community” version of Pandora which does appear to be free software; just don’t expect any of the “enterprise” features to be available in that version any time soon.

I don’t know why I have such a visceral dislike of the “per managed node” pricing model, outside of having to deal with it back in the 1990s and 2000s. It seems like an unnecessary tax on your growth, “hey, customer, for every new device you add you have to pay for another monitoring license.” Plus, in these days of virtualization and microservices it seems silly. Our customers might spin up between 10 and 100 virtual servers as needed and tear them down just as quickly, and I can’t imagine the complexity that would get added to have to manage a license for each one of them.

Network World Comparison

Of the other applications reviewed, I’m not familiar with NetXMS but I do know Zabbix. They, like OpenNMS, are 100% open source and they are great people. It was awesome to finally meet Alexei Vladishev in person at this year’s All Things Open conference.

Alexei Vladishev and Tarus Balog

The only other thing that immediately pushed a button was the sentence “All four products were surprisingly good.” At first I took it to express surprise that free software could also be good, but then I calmed down a bit and figured they meant it was surprising that all four applications were strong.

For the article they installed OpenNMS on Windows. When I read that my heart just sank, because while it does run on Windows our support of that operating system grew out of a bet. We were talking many years ago about Java’s “write once, run anywhere” slogan and I mentioned that if that were true, why don’t we run on Windows? The team took up the challenge and it took two weeks to port. The first week was spent getting the few bits of code written in C to compile on Windows, and the second week on soft-coding the file separator character so that it would use a back-slash instead of a forward-slash. Even on Windows, the comments in the article were really positive, which makes me think this whole Java thing isn’t such a bad idea after all (grin).

They used Windows because apparently there was an issue with getting OpenNMS installed on CentOS 7, which was a surprise to me, but then Ronny pointed out that there can be some weird conflicts with Java and packages like LibreOffice that I don’t experience since I always do a minimal install. There is a cool installer for CentOS 7 which may help with that. We also maintain Docker images that make installation easy if you are used to that environment.

Fortunately, or unfortunately, not much has been done for OpenNMS on Windows since we got it working. It is fortunate because not much is required to keep OpenNMS running on Windows due to Java, but it is unfortunate because we really don’t have the Windows expertise that would be required to get it to run as a service, create an MSI installer, etc. Susan Perschke, the author of the article, seems to be a Windows-guru so I plan to reach out to her about improving the OpenNMS experience for Windows users.

One thing that is both common and valid is criticism of the web user interface. At the moment we spend most of our time focused on making OpenNMS even more scalable, and thus we don’t have the resources to make the user interface easier to use. That is changing, and most of the current effort goes into Compass™, the OpenNMS mobile app. The article didn’t mention it which means they probably didn’t try it out, which is more a failure on our part to market it versus an oversight on theirs.

They also didn’t talk directly about scalability, although it was listed in the comparison chart (see above). OpenNMS is designed to monitor tens of thousands to hundreds of thousands of devices with our goal to be virtually unlimited in order to address scale on the order of the Internet of Things. That is why we wrote Newts for storing performance data and are working on both the Minion and Underling to easily distribute OpenNMS functionality.

Another reason we haven’t spent much time on the user interface is that our larger customers tend not to use it much. They rely on the ReST interface to integrate their own systems with OpenNMS and on things like the Business Service Monitoring.

But still, it was nice to be included. We don’t do much direct marketing and even though typing “open source network monitoring” into Google returns OpenNMS as the first hit we are often overlooked. Let’s hope they revisit this in a year and we can impress them even more.

Nextcloud and OpenNMS

Last weekend, OpenNMS-er extraordinaire Ronny Trommer was at a conference where he met Jos Poortvliet from Nextcloud. I’ve been following Nextcloud pretty intently since I recognized kindred souls in their desire to create a business that was successful and still 100% open source (and not, for example, fauxpensource). Jos mentioned that Nextcloud was getting a new monitoring API and thought it would be cool if OpenNMS could use it.

Since their API returns the monitoring information as XML, Ronny used the XML Collector to gather the data. Once the data is in OpenNMS, you can graph it, set thresholds, configure notifications, etc.
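To give a rough idea of what collecting from XML involves, here is a small Python sketch that pulls a few numbers out of an XML document by path. The sample document, its element names, and the `extract_metrics` helper are all made up for illustration; they are not the actual Nextcloud API response, nor the OpenNMS XML Collector configuration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real monitoring API response will differ.
SAMPLE_XML = """
<ocs>
  <data>
    <storage>
      <num_files>1234</num_files>
      <num_users>7</num_users>
    </storage>
    <system>
      <mem_free>2048</mem_free>
    </system>
  </data>
</ocs>
"""

# Metric name -> path of the element holding its value (assumed layout).
METRIC_PATHS = {
    "files": "./data/storage/num_files",
    "users": "./data/storage/num_users",
    "mem_free": "./data/system/mem_free",
}


def extract_metrics(xml_text, paths=METRIC_PATHS):
    """Return {metric: float} for every path that exists in the document."""
    root = ET.fromstring(xml_text)
    metrics = {}
    for name, path in paths.items():
        node = root.find(path)
        if node is not None and node.text:
            metrics[name] = float(node.text)
    return metrics
```

Each value extracted this way is just a number tied to a name, which is all a monitoring platform needs in order to store it, graph it, or compare it to a threshold.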

Available metrics include:

  • CPU load and memory usage
  • Number of active users over time
  • Number of shares in various categories
  • Storage statistics
  • Server settings like PHP version, database type and size, memory limits and more

Here’s an example of the number of files from a small demo system:

Files in Nextcloud

Of course, since OpenNMS is a platform, once the data is in the system you can leverage its integrations with applications such as Grafana:

Nextcloud Metrics in Grafana

Some applications will go on and on about how many “plugins” they have. Often, these are little more than scripts that do something simple, like an SNMP GET, but with all the overhead of having to run a shell. To add something like Nextcloud to OpenNMS, it is just a simple matter of configuring a couple of files, but to make that easier a lot of configurations have been added to a git repository. If you want to try out the Nextcloud integration, follow these instructions.

True open source solutions can offer the best features, performance and value for most companies, but unfortunately there are so few pure open source companies providing them. I applaud Nextcloud and look forward to working with them for years to come.

Nagios XI vs. OpenNMS Meridian – the Return of the FUD

It seems like our friends over at Nagios have been watching a little too much election coverage this year, and they’ve updated their “Nagios vs. OpenNMS” document with even more rhetoric and misinformation.

As my three readers may recall, back in 2011 I tore apart the first version of this document. Now they have decided to update it to target our Meridian™ version.

Let’s see how they did (please look at it and follow along as it is quite amusing).

The first misleading bit is the opening paragraph with the phrase “most widely used open-source monitoring project in the world”. Now, granted, they do indicate that means “Nagios Core” but it seems a little disingenuous since what they are selling is Nagios XI, which is much different.

Nagios XI is not open source. It is published under the “Nagios Open Software License” which is about as proprietary as they get. I’m not even sure why the word “open” was added, except to further mislead people into thinking it is open source. The license contains clauses like “The Software may not be Forked” and “The Software may only be used in conjunction with products, projects, and other software distributed by the Company.” Think about it, you can’t even integrate Nagios XI with, say, a home grown trouble ticketing system without violating the license. Doesn’t sound very “open” at all. OpenNMS Meridian is published under the AGPLv3, or a similar proprietary license should your organization have an issue with the AGPL. You don’t have that choice with Nagios XI.

Next, let’s check out the price. The OpenNMS Group has always published its prices on-line. One instance of Meridian, which includes support in the form of access to our “Connect” community, is $6,000. They have it listed as $25,995, which is the price should you choose the much more intensive “Prime” support option. I’m not sure why they didn’t just choose our most expensive product, Ultra Support with the 24×7 option, to make them seem even better.

Nagios XI Node Limitation

Also, note the fine print “Price based on one instance of XI with 220 nodes/devices”. There is no device limit with OpenNMS Meridian. So let’s be clear, for $6,000 you get access to the Meridian software under an open source license versus $5,000 to monitor 220 nodes with extreme limitations on your rights.

Our smaller customers tend to have around 2000 devices, which means to manage that with Nagios XI you would need roughly ten instances costing nearly $50,000 (using the math presented in this document). And from the experience we’ve heard with customers coming to us from Nagios, the reason it is limited to so few nodes is that you probably can’t run much more on a single instance of Nagios XI. Compare that to OpenNMS where we have customers with over 100,000 devices in a single instance (and they’ve been running it for years).
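For anyone who wants to check the arithmetic, here is a quick Python sketch. The 220-node limit and the $6,000 Meridian price come from the text above; the $5,000 per-instance figure is a rounded assumption, which is why the total lands on $50,000 exactly rather than just under it.

```python
import math

NODES_PER_INSTANCE = 220   # limit from the fine print
PRICE_PER_INSTANCE = 5000  # approximate USD price per instance (rounded assumption)
MERIDIAN_PRICE = 6000      # one Meridian instance, no device limit


def nagios_xi_cost(devices):
    """Return (instances, total cost) needed to cover `devices` nodes."""
    instances = math.ceil(devices / NODES_PER_INSTANCE)
    return instances, instances * PRICE_PER_INSTANCE


instances, cost = nagios_xi_cost(2000)
print(instances, cost)  # 10 50000
```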

We also price OpenNMS as a platform. You get everything: trouble-ticketing integration, graphing, reporting, etc. in one application. It looks like Nagios has decided to nickel and dime you for logs, etc. and a thing called “Nagios Fusion” which you’ll need to manage your growing number of Nagios instances since it won’t natively scale. And remember, due to the license you are forbidden from using the software with your own tools.

I especially had to laugh at the “You Speak, We Listen” part. If you have a feature or change you need, if you ask nicely they might make it for you. With OpenNMS Meridian you are free to make any changes you need since it is 100% open source, and with our open issue tracker we address dozens of user requests each point release.

Finally, there is the feature comparison, which at a minimum is misleading and is often just blatantly false. Almost every feature marked as lacking in Meridian exists, and at a level far beyond what Nagios XI can provide. Seriously, is it really objective to state that OpenNMS doesn’t support Nagvis, a specific tool that even has “Nagios” in the name?

Nagvis

I had to laugh at the hubris. They obviously didn’t Google “opennms nagvis”, because, guess what? There has been an OpenNMS Nagvis integration for some time now, contributed by our community. Just in case you were wondering, we have an integration with Network Weathermap as well.

Nagios is just another proprietary software product that wants to lock you into its ecosystem, and this is just a shameful attempt to monetize an application that is long past its prime. Heck, it was the inability of the Nagios leadership to get along with others that resulted in the very popular Icinga fork, and with it Nagios lost a lot of contribution that helped make up its “Thousands of Free Add-Ons” (and the way Nagios took over the community-led plug-in site was also poorly handled). Plus, many of those add-ons won’t scale in an enterprise environment, which probably led to the 220 device limit.

Compare that to OpenNMS. We not only want to encourage you to integrate with other products, we do a lot of it for you. OpenNMS has great graphing, but we also created the first third party plug-in for Grafana. When it comes to mapping, OpenNMS is on the leading edge, with a focus on various topology views that can ultimately handle millions of devices in a fashion that is actually usable. Need to see a Layer 2 topology? Choose the “enhanced linkd view”. Run VMware and vCenter? It is simple to import all of your machines and see them in a view that shows hosts, guests and network storage. Plus the unique ability to focus on just those devices of interest allows you to use a map with hundreds of thousands if not millions of nodes.

Nagios Map

Compare that to the Nagios map screenshot where it looks like “localhost” is having some issues. Oh no, not localhost! That’s like, all of my machines.

As for “Business Process Intelligence” I’ve been told that the Nagios XI version is like our Business Service Monitor “Except BSM is more featureful, and has a significantly better UI/UX”. Need real Business Intelligence? OpenNMS has Red Hat Drools support, the open source leader, built right into the product.

We also support integration with popular Trouble Ticketing systems such as Request Tracker, Jira, OTRS and Remedy. And the kicker is that you can also run any Nagios check script natively in OpenNMS using the “System Execute Monitor”, but once you get used to the OpenNMS platform, why would you?

I’m not really sure why Nagios goes out of its way to spread fear, uncertainty and doubt about OpenNMS. We rarely compete in the same markets. I’m sure that Sunrise Community Banks get their money’s worth from Nagios, and for companies like NRS Small Business Solutions, Nagios might be a good fit. But if you have enterprise and carrier-level requirements, there is no way Nagios will work for you in the long term.

When a company does something like this to mislead, from wrong information about our product to using terms like “open” when they mean “closed”, it shows you what they think of their competition. What does it say about what they think about their customers?

New Fancy Website for www.opennms.org

As some of you may have noticed, a little while ago the OpenNMS Project website got updated to a new, fancy, responsive version.

OpenNMS Platform

This was mainly the work of Ronny Trommer with a big assist from our graphic designer, Jessica.

We are often so busy working on the code that we forget how important it is to tell people about what we are doing. Most people who take the time to learn about the project realize how awesome it is, but it can be hard to get over that first hump in the learning curve.

I hope that the new site will reflect both the benefits of using OpenNMS and the work of the community behind it.

OpenNMS and Elasticsearch

With Horizon 18 we added support for sending OpenNMS events into Elasticsearch. Unfortunately, it only works with Elasticsearch 1.0. Elasticsearch 2.0 and higher requires Camel 17, but OpenNMS can’t use it. I wondered why, and if you were wondering too, here is the answer from Seth:

Camel 17 has changed their OSGi metadata to only be compatible with Spring 4.1 and higher. We’re still using Spring 4.0 so that’s one problem. The second issue is that ActiveMQ’s OSGi metadata bans Spring 4.0 and higher. So currently, ActiveMQ and Camel are mutually incompatible with one another inside Karaf at any version higher than the ones that we are currently running.

The biggest issue is the ActiveMQ problem, I’ve opened this bug and it sounds like they’re going to address it in their next major release

So there you have it.

Choose the Right Thermometer

Okay, so I have a love/hate relationship with Centurylink. Centurylink provides a DSL circuit to my house. I love the fact that I have something resembling broadband with 10Mbps down and about 1Mbps up. Now that doesn’t even qualify as broadband according to the FCC, but it beats the heck out of the alternatives (and I am jealous of my friends with cable who have 100Mbps down or even 300Mbps).

The hate part comes from reliability, which lately has been crap. This post is actually focused on OpenNMS so I won’t go into all of my issues, but I’ve been struggling with long outages in my service.

The latest issue is a new one: packet loss. Usually the circuit is either up or completely down, but for the last three days I’ve been having issues with a large percentage of dropped packets. Of course I monitor my home network from the office OpenNMS instance, and this will usually manifest itself with multiple nodeLostService events around HTTP since I have a personal web server that I monitor.

The default ICMP monitor does not measure packet loss. As long as at least one ping reply makes it, ICMP is considered up, so the node itself remains up. OpenNMS does have a monitor for packet loss called Strafeping. It sends out 20 pings in a short amount of time and then measures how long they take to come back. So I added it to the node for my home and I saw something unusual: a consistent 19 out of 20 lost packets.

Strafeping Graph

Power cycling the DSL modem seems to correct the problem, and the command line ping was reporting no lost packets, so why was I seeing such packet loss from the monitor? Was Strafeping broken?

While it is always a possibility, I didn’t think that Strafeping was broken, but I did check a number of graphs for other circuits and they looked fine. Thus it had to be something else.

This brings up a touchy subject for me: false positives. Is OpenNMS reporting false problems?

It reminds me of an event that happened when I was studying physics back in the late 1980s. I was working with some newly discovered ceramic material that exhibited superconductivity at relatively high temperatures (around 92K). That temperature can be reached using liquid nitrogen, which was relatively easy to source compared to cooler liquids like liquid helium.

I needed to measure the temperature of the ceramic, but mercury (used in most common thermometers) is a solid at those temperatures, so I went to my advisor for suggestions. His first question to me was “What does a thermometer measure?”

I thought it was a trick question, so I answered “temperature” (“thermo” meaning temperature and meter meaning “to measure”). He replied, “Okay, smart guy, the temperature of what?”

That was harder to answer exactly, so I said vague things like the ambient environment, whatever it was next to, etc. He interrupted me and said “No, a thermometer measures one thing: the temperature of the thermometer”.

This was an important lesson, even though it seems obvious. In the case of the ceramic it meant a lot of extra steps to make sure the thermometer we were using (which was based on changes in resistance) was as close to the temperature of the material as possible.

What does that have to do with OpenNMS? Well, OpenNMS is like that thermometer. It is up to us to make sure that the way we decide to use it for monitoring is as close to our criteria as possible. A “false positive” usually indicates a problem with the method versus the tool – OpenNMS is behaving exactly as it should but we need to match it better to what we expect.

In my case I found out the router I use was limited by default to responding to 1 ping per second (to avoid DDoS attacks, I assume), so last night when I upped that to allow 20 pings per second Strafeping started to work as expected (as you can see in the graph above).
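The effect is easy to reproduce on paper. Below is a minimal Python simulation of a burst of 20 pings sent to a responder that answers only a limited number per second. The 50 ms spacing between pings and the simple fixed-window limiter are my assumptions for the sketch; I don’t know exactly how the router or Strafeping implement their timing.

```python
def simulate_burst(num_pings=20, spacing_s=0.05, replies_per_second=1):
    """Simulate a burst of pings against a rate-limited responder.

    The responder answers at most `replies_per_second` pings in any
    one-second window (a simple fixed-window limiter, assumed here).
    Returns a list of booleans: True where the ping got a reply.
    """
    answered = []
    window_start = None
    replies_in_window = 0
    for i in range(num_pings):
        t = i * spacing_s
        # Start a new one-second window once the current one has elapsed.
        if window_start is None or t - window_start >= 1.0:
            window_start = t
            replies_in_window = 0
        if replies_in_window < replies_per_second:
            replies_in_window += 1
            answered.append(True)
        else:
            answered.append(False)
    return answered


def summarize(answered):
    """Return (lost, loss_percent, icmp_up) for one burst."""
    lost = answered.count(False)
    loss_percent = 100.0 * lost / len(answered)
    icmp_up = any(answered)  # one reply is enough for a plain ICMP check
    return lost, loss_percent, icmp_up


print(summarize(simulate_burst()))                       # (19, 95.0, True)
print(summarize(simulate_burst(replies_per_second=20)))  # (0, 0.0, True)
```

With the limit at 1 reply per second the burst loses 19 of 20 pings while a plain ICMP check still reports the node as up, and raising the limit to 20 makes the loss disappear. The circuit was the same in both cases; only the thermometer changed.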

This allowed me to detect when my DSL circuit packet loss started again today. A little after 14:00 the system detected high packet loss. When this happened before, power cycling the modem seemed to fix it, so I headed home to do just that.

While I was on the way, around 15:30, the packet loss seemed to improve, but as you can see from the graph the ping times were all over the place (the line is green but there is a lot of extra “smoke” around it indicating a variance in the response times). I proactively power cycled the modem and things settled down. The Centurylink agent agreed to send me a new modem.

The point of this post is to stress that you need to understand how your monitoring tools actually work, and that you can often correct the issues that make a monitor unusable and turn it into something useful. Choose the right thermometer.