Nagios XI vs. OpenNMS Meridian – the Return of the FUD

It seems like our friends over at Nagios have been watching a little too much election coverage this year, and they’ve updated their “Nagios vs. OpenNMS” document with even more rhetoric and misinformation.

As my three readers may recall, back in 2011 I tore apart the first version of this document. Now they have decided to update it to target our Meridian™ version.

Let’s see how they did (please look at it and follow along as it is quite amusing).

The first misleading bit is the opening paragraph with the phrase “most widely used open-source monitoring project in the world”. Now, granted, they do indicate that this means “Nagios Core”, but it seems a little disingenuous since what they are selling is Nagios XI, which is very different.

Nagios XI is not open source. It is published under the “Nagios Open Software License” which is about as proprietary as they get. I’m not even sure why the word “open” was added, except to further mislead people into thinking it is open source. The license contains clauses like “The Software may not be Forked” and “The Software may only be used in conjunction with products, projects, and other software distributed by the Company.” Think about it, you can’t even integrate Nagios XI with, say, a home grown trouble ticketing system without violating the license. Doesn’t sound very “open” at all. OpenNMS Meridian is published under the AGPLv3, or a similar proprietary license should your organization have an issue with the AGPL. You don’t have that choice with Nagios XI.

Next, let’s check out the price. The OpenNMS Group has always published its prices on-line. One instance of Meridian, which includes support in the form of access to our “Connect” community, is $6,000. They have it listed as $25,995, which is the price should you choose the much more intensive “Prime” support option. I’m not sure why they didn’t just choose our most expensive product, Ultra Support with the 24×7 option, to make them seem even better.

Nagios XI Node Limitation

Also, note the fine print “Price based on one instance of XI with 220 nodes/devices”. There is no device limit with OpenNMS Meridian. So let’s be clear: for $6,000 you get access to the Meridian software under an open source license, versus $5,000 to monitor 220 nodes with extreme limitations on your rights.

Our smaller customers tend to have around 2000 devices, which means to manage that with Nagios XI you would need roughly ten instances costing nearly $50,000 (using the math presented in this document). And from what we’ve heard from customers coming to us from Nagios, the reason it is limited to so few nodes is that you probably can’t run much more on a single instance of Nagios XI. Compare that to OpenNMS, where we have customers with over 100,000 devices in a single instance (and they’ve been running it for years).
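The arithmetic behind that estimate is easy to check. This is a small sketch, assuming the $5,000 per-instance price and 220-node cap quoted above, flat pricing with no volume discounts, and that a partially used instance still has to be bought in full (the function names are just for illustration):

```python
import math

# Figures quoted in the post; flat per-instance pricing is an assumption.
NODES_PER_INSTANCE = 220
PRICE_PER_INSTANCE = 5000  # USD


def instances_needed(devices: int, nodes_per_instance: int = NODES_PER_INSTANCE) -> int:
    """Round up: a partially filled instance still has to be purchased."""
    return math.ceil(devices / nodes_per_instance)


def nagios_xi_cost(devices: int, price_per_instance: int = PRICE_PER_INSTANCE) -> int:
    """Total cost if every instance is bought at the same list price."""
    return instances_needed(devices) * price_per_instance


print(instances_needed(2000))  # 10
print(nagios_xi_cost(2000))    # 50000
```

With 2,000 devices that works out to 10 instances (9 would only cover 1,980), or about $50,000.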

We also price OpenNMS as a platform. You get everything: trouble-ticketing integration, graphing, reporting, etc. in one application. It looks like Nagios has decided to nickel and dime you for logs, etc. and a thing called “Nagios Fusion” which you’ll need to manage your growing number of Nagios instances since it won’t natively scale. And remember, due to the license you are forbidden from using the software with your own tools.

I especially had to laugh at the “You Speak, We Listen” part. If you need a feature or change and ask nicely, they might make it for you. With OpenNMS Meridian you are free to make any changes you need since it is 100% open source, and with our open issue tracker we address dozens of user requests in each point release.

Finally, there is the feature comparison, which at a minimum is misleading and is often just blatantly false. Almost every feature marked as lacking in Meridian exists, and at a level far beyond what Nagios XI can provide. Seriously, is it really objective to state that OpenNMS doesn’t support Nagvis, a specific tool that even has “Nagios” in the name?

Nagvis

I had to laugh at the hubris. They obviously didn’t Google “opennms nagvis“, because, guess what? There has been an OpenNMS Nagvis integration for some time now, contributed by our community. Just in case you were wondering, we have an integration with Network Weathermap as well.

Nagios is just another proprietary software product that wants to lock you into its ecosystem, and this is just a shameful attempt to monetize an application that is long past its prime. Heck, it was the inability of the Nagios leadership to get along with others that resulted in the very popular Icinga fork, and with it Nagios lost a lot of contribution that helped make up its “Thousands of Free Add-Ons” (and the way Nagios took over the community-led plug-in site was also poorly handled). Plus, many of those add-ons won’t scale in an enterprise environment, which probably led to the 220 device limit.

Compare that to OpenNMS. We not only want to encourage you to integrate with other products, we do a lot of it for you. OpenNMS has great graphing, but we also created the first third party plug-in for Grafana. When it comes to mapping, OpenNMS is on the leading edge, with a focus on various topology views that can ultimately handle millions of devices in a fashion that is actually usable. Need to see a Layer 2 topology? Choose the “enhanced linkd view”. Run VMware and vCenter? It is simple to import all of your machines and see them in a view that shows hosts, guests and network storage. Plus the unique ability to focus on just those devices of interest allows you to use a map with hundreds of thousands if not millions of nodes.

Nagios Map

Compare that to the Nagios map screenshot where it looks like “localhost” is having some issues. Oh no, not localhost! That’s like, all of my machines.

As for “Business Process Intelligence”, I’ve been told that the Nagios XI version is like our Business Service Monitor “Except BSM is more featureful, and has a significantly better UI/UX”. Need real Business Intelligence? OpenNMS has support for Red Hat Drools, the open source leader, built right into the product.

We also support integration with popular Trouble Ticketing systems such as Request Tracker, Jira, OTRS and Remedy. And the kicker is that you can also run any Nagios check script natively in OpenNMS using the “System Execute Monitor“, but once you get used to the OpenNMS platform, why would you?
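For anyone who hasn’t written one, a Nagios check script is just a program that prints a one-line status (optionally followed by `|` and performance data) and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). Below is a hypothetical disk-usage check in Python that follows that convention; the script, thresholds and function names are made up for illustration, and the OpenNMS-side System Execute Monitor configuration is not shown (see its documentation for how it treats the exit status and output).

```python
import shutil

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(used_pct: float, warn: float, crit: float) -> int:
    """Map a measured value onto a Nagios state using simple upper thresholds."""
    if used_pct >= crit:
        return CRITICAL
    if used_pct >= warn:
        return WARNING
    return OK


def check_disk(path: str = "/", warn: float = 80.0, crit: float = 90.0) -> int:
    """Print a Nagios-style status line for `path` and return the matching exit code."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        print(f"DISK UNKNOWN - {exc}")
        return UNKNOWN
    used_pct = 100.0 * usage.used / usage.total
    state = evaluate(used_pct, warn, crit)
    # Status text, then "|" and performance data as label=value;warn;crit.
    print(f"DISK {LABELS[state]} - {path} {used_pct:.1f}% used | used_pct={used_pct:.1f}%;{warn};{crit}")
    return state


def main(argv) -> int:
    """Check the path given as the first argument, defaulting to the root filesystem."""
    return check_disk(argv[1] if len(argv) > 1 else "/")
```

To use it as an actual plugin, the script would finish with `import sys` and `sys.exit(main(sys.argv))`, since the exit status is what carries the state.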

I’m not really sure why Nagios goes out of its way to spread fear, uncertainty and doubt about OpenNMS. We rarely compete in the same markets. I’m sure that Sunrise Community Banks get their money’s worth from Nagios, and for companies like NRS Small Business Solutions, Nagios might be a good fit. But if you have enterprise and carrier-level requirements, there is no way Nagios will work for you in the long term.

When a company does something like this to mislead, from wrong information about our product to using terms like “open” when they mean “closed”, it shows you what they think of their competition. What does it say about what they think about their customers?

2016 PB and Jam

OpenNMS is headquartered in the idyllic small town of Pittsboro, NC, sometimes just called “PBO”. Since a number of people who come to Dev-Jam travel a fair distance, we’ve started a tradition of a “mini Dev-Jam” the week after, hosted at OpenNMS HQ.

This is much more focused on the work of The OpenNMS Group, but it is still a lot of fun. Last night as a team building exercise we decided to try an “escape” room.

This is a relatively new thing where a group of people get put in a room and they have a certain amount of time to figure out puzzles and escape. Jessica set us up with Cipher Escape in their “Geek Room”, which was the only one that could accommodate 11 of us.

It’s a lot of fun. For our experience we were led into about a 15×15 room and given the following backstory: you are watching your neighbors’ cat while they are on vacation and after you feed her you realize you are locked in their house. You have 60 minutes to escape.

One thing I thought was funny was that the room was dotted with little pink stickers and we were told that these indicate things that don’t need to be manipulated (e.g. there was a picture frame that when you turned it over you would see the stickers, which meant you weren’t supposed to take it apart). I can only imagine the beta testing that went into determining where to put the stickers (our hostess specifically mentioned that you didn’t need to take the legs off the furniture).

To tell anything more would spoil it, but I was extremely proud that the team escaped with over 10 minutes to spare (we missed the best time by ten minutes, so it wasn’t close, but we did beat a team from Cisco that didn’t escape at all).

Escape Room Success

It was a ton of fun, and I’d put this team up against any challenge.

Afterward, most of us went out for sushi at Waraji. I’ve known the owner Masatoshi Tsujimura for almost 30 years, and even though they were packed they were able to set us up with a tatami room.

Waraji Dinner

It’s a bit out of the way for me to visit often, so I was happy to have an excuse for a victory celebration.

Emley Moor, Kirklees, West Yorkshire

I spent last week back in the United Kingdom. I always find it odd to travel to the UK. When I’m in, say, Germany or Spain, I know I’m in a different country. With the UK I sometimes forget and hijinks ensue. As Shaw may have once said, we are two countries separated by a common language.

Usually I spend time in the South, mainly Hampshire, but this trip was in Yorkshire, specifically West Yorkshire. I was looking forward to this for a number of reasons. For example, I love Yorkshire Pudding, and the Four Yorkshiremen is my favorite Monty Python routine.

Also, it meant that I could fly into Manchester Airport and miss Heathrow. Well, I didn’t exactly miss it.

I was visiting a big client that most people have never heard of, even though they are probably an integral part of your life if you live in the UK. Arqiva provides the broadcast infrastructure for much of the television and mobile phone industry in the country, as well as being involved in deploying networks for projects such as smart metering and the Internet of Things.

We were working at the Emley Moor location, which is home to the Emley Moor Mast. This is the largest freestanding structure in Britain (and third in the European Union). With a total height of 1084 feet, it is higher than the Eiffel Tower and almost twice as high as the Washington Monument.

Emley Moor Mast View

The mast was built in 1971 to replace a metal lattice tower that fell, due to a combination of ice and wind, in 1969. I love the excerpt from the log book mentioned in the Wikipedia article:

  • Day: Lee, Caffell, Vander Byl
  • Ice hazard – Packed ice beginning to fall from mast & stays. Roads close to station temporarily closed by Councils. Please notify councils when roads are safe (!)
  • Pye monitor – no frame lock – V10 replaced (low ins). Monitor overheating due to fan choked up with dust- cleaned out, motor lubricated and fan blades reset.
  • Evening :- Glendenning, Bottom, Redgrove
  • 1,265 ft (386 m) Mast :- Fell down across Jagger Lane (corner of Common Lane) at 17:01:45. Police, I.T.A. HQ, R.O., etc., all notified.
  • Mast Power Isolator :- Fuses removed & isolator locked in the “OFF” position. All isolators in basement feeding mast stump also switched off. Dehydrators & TXs switched off.

They still have that log book, open to that page.

Emley Moor Log Book

If you have 20 minutes, there is a great old documentary on the fall of the old tower and the construction of the new mast.

On my last day there we got to go up into the structure. It’s pretty impressive:

Emley Moor Mast Up Close

and the inside looks like something from a 1970s sci-fi movie:

Emley Moor Mast Inside

The article stated that it takes seven minutes to ride the lift to the top. I timed it at six minutes, fifty-seven seconds, so that’s about right (it’s fifteen seconds quicker going down). I was working with Dr. Craig Gallen who remembers going up in the open lift carriage, but we were in an enclosed car. It’s very small and with five of us in it I will admit to a small amount of claustrophobia on the way up.

But getting to the top is worth it. The view is amazing:

View from Emley Moor Mast

It was a calm day but you could still feel the tower sway a bit. They have a plumb bob set up to measure the drift, and it was barely moving while we were up there. Toby, our host, told of a time he had to spend seven hours installing equipment when the bob was moving four to five inches side to side. They had to move around on their hands and knees to avoid falling over.

Plumb Bob

I’m glad I wasn’t there on that day, but our day was fantastic. Here is a shot of the parking lot where the first picture (above) was taken.

View of Emley Moor Parking Lot

I had a really great time on this trip. The client was amazing, and I really like the area. It reminds me a bit of the North Carolina mountains. I did get my Yorkshire Pudding in Yorkshire (bucket list item):

Yorkshire Pudding in Yorkshire

and one evening Craig and I got to meet up with Keith Spragg.

Keith Spragg and Craig Gallen

Keith is a regular on the OpenNMS IRC channel (#opennms on freenode.net), and he works for Southway Housing Trust. They are a non-profit that manages several thousand homes, and part of that involves providing certain IT services to their tenants. They are mainly a Windows/Citrix shop but OpenNMS is running on one of the two Linux machines in their environment. He tried out a number of solutions before finding that OpenNMS met his needs, and he pays it forward by helping people via IRC. It always warms my heart to see OpenNMS being used in such places.

I hope to return to the area, although I was glad I was there in May. It’s around 53 degrees north latitude, which puts it level with the southern Alaskan islands. It would get light around 4am, and in the winter ice has been known to fall in sheets from the Mast (the walkways are covered to help protect the people who work there).

I bet Yorkshire Pudding really hits the spot on a cold winter’s day.

Welcome Ecuador (Country 29)

It is with mixed emotions that I get to announce that we now have a customer in Ecuador, our 29th country.

My emotions are mixed as my excitement at having a new customer in a new country is offset by the tragedy that country suffered recently. Everyone at OpenNMS is sending out our best thoughts and we hope things settle down (quite literally) soon.

They join the following countries:

Australia, Canada, Chile, China, Costa Rica, Denmark, Egypt, Finland, France, Germany, Honduras, India, Ireland, Israel, Italy, Japan, Malta, Mexico, The Netherlands, Portugal, Singapore, Spain, Sweden, Switzerland, Trinidad, the UAE, the UK and the US.

OpenNMS is Sweet Sixteen

It was sixteen years ago today that the first code for OpenNMS was published on Sourceforge. While the project was started in the summer of 1999, no one seems to remember the exact date, so we use March 30th to mark the birthday of the OpenNMS project.

OpenNMS Project Details

While I’ve been closely associated with OpenNMS for a very long time, I didn’t start it. It was started by Steve Giles, Luke Rindfuss and Brian Weaver. They were soon joined by Shane O’Donnell, and while none of them are associated with the project today, they are the reason it exists.

Their company was called Oculan, and I joined them in 2001. They built management appliances marketed as “purple boxes” based on OpenNMS and I was brought on to build a business around just the OpenNMS piece of the solution.

As far as I know, this is the only surviving picture of most of the original team, taken at the OpenNMS 1.0 Release party:

OpenNMS 1.0 Release Team

In 2002 Oculan decided to close source all future work on their product, thus ending their involvement with OpenNMS. I saw the potential, so I talked with Steve Giles and soon left the company to become the OpenNMS project maintainer. When it comes to writing code I am very poorly suited to the job, but my one true talent is getting great people to work with me, and judging by the quality of people involved in OpenNMS, it is almost a superpower.

I worked out of my house and helped maintain the community mainly through the #opennms IRC channel on freenode, and surprisingly the project managed not only to survive, but to grow. When I found out that Steve Giles was leaving Oculan, I applied to be their new CEO, which I’ve been told was the source of a lot of humor among the executives. The man they hired had a track record of snuffing out all potential from a number of startups, but he had the proper credentials that VCs seem to like so he got the job. I have to admit to a bit of schadenfreude when Oculan closed its doors in 2004.

But on a good note, if you look at the two guys in the above picture right next to the cake, Seth Leger and Ben Reed, they still work for OpenNMS today. We’re still here. In fact we have the greatest team I’ve ever worked with in my life, and the OpenNMS project has grown tremendously in the last 18 months. This July we’ll have our eleventh (!) annual developers conference, Dev-Jam, which will bring together people dedicated to OpenNMS, both old and new, for a week of hacking and camaraderie.

Our goal is nothing short of making OpenNMS the de facto management platform of choice for everyone, and while we still have a long way to go, we keep getting closer. My heartfelt thanks go out to everyone who made OpenNMS possible, and I look forward to writing many more of these notes in the future.

OpenNMS at Scale

So, yes, the gang from OpenNMS will be at the SCaLE conference this weekend (I will not be there, unfortunately, due to a self-imposed conference hiatus this year). It should be a great time, and we are happy to be a Gold Sponsor.

But this post is not about that. This is about how Horizon 17 and data collection can scale. You can come by the booth at SCaLE and learn more about it, but here is the overview.

When OpenNMS first started, we leveraged the great application RRDTool for storing performance data. When we discovered a Java port called JRobin, OpenNMS was modified to support that storage strategy as well.

Using a Round Robin database has a number of advantages. First, it’s compact. Once the file containing the RRD database is created, it never grows. Second, we could also use RRDTool to graph the data.
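To make the fixed-size idea concrete, here is a toy round robin archive in Python. It is not how RRDTool or JRobin are implemented (real RRDs also deal with heartbeats, unknown values and multiple archives); it only shows the two properties that matter here: storage is pre-sized and never grows, and raw samples are collapsed into averages.

```python
from collections import deque
from statistics import mean


class RoundRobinArchive:
    """Toy archive: a fixed number of rows, each the AVERAGE of several raw samples."""

    def __init__(self, steps_per_row: int, rows: int):
        self.steps_per_row = steps_per_row
        self.rows = deque(maxlen=rows)  # fixed capacity: once full, each new row pushes out the oldest
        self._pending = []              # raw samples waiting to be consolidated

    def update(self, value: float) -> None:
        self._pending.append(value)
        if len(self._pending) == self.steps_per_row:
            # Consolidate into one averaged row, then throw the raw samples away.
            self.rows.append(mean(self._pending))
            self._pending.clear()


archive = RoundRobinArchive(steps_per_row=3, rows=2)
for sample in [1, 2, 3, 10, 20, 30, 100, 200, 300]:
    archive.update(sample)
print(list(archive.rows))  # [20, 200]
```

After nine samples only two rows remain: the oldest row has been overwritten, and the spike to 300 survives only as part of an average of 200. That second effect is why users who needed the raw values ran into trouble.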

However, there were problems. Many users had a need to store the raw collected data. RRDTool uses consolidation functions to store a time-series average. But the biggest issue was that writing lots of files required really fast hard drives. The more data you wanted to store, the greater your investment in disk arrays. Ultimately, you would hit a wall, which would require you to either reduce your data collection or partition out the data across multiple systems.

No more. With Horizon 17 OpenNMS fully supports a time-series database called Newts. Newts is built on Cassandra, and even a small Cassandra cluster can handle tens of thousands of inserts a second. Need more performance? Just add more nodes. Works across geographically distributed systems as well, so you get built-in high availability (something that was very difficult with RRDTool).

Just before Christmas I got to visit a customer on the Eastern Shore of Maryland. You wouldn’t think that location would be a hotbed of technical excellence, but it is rare that I get to work with such a quick team.

They brought me up for a “Getting to Know You” project. This is a two day engagement where we get to kick the tires on OpenNMS to see if it is a good fit. They had been using Zenoss Core (the free version) and they hit a wall. The features they wanted were all in the “enterprise” paid version and the free version just wouldn’t meet their needs. OpenNMS did, and being truly open source it fit their philosophy (and budget) much better.

This was a fun trip for me because they had already done most of the work. They had OpenNMS installed and monitoring their network, and they just needed me to help out on some interesting use cases.

One of their issues was the need to store a lot of performance data, and since I was eager to play with the Newts integration we decided to test it out.

In order to enable Newts, first you need a Cassandra cluster. It turns out that ScyllaDB works as well (more on that a bit later). If you are looking at the Newts website, you can ignore the instructions on installing it, as it is built directly into OpenNMS.

Another thing built in to OpenNMS is a new graphing library called Backshift. Since OpenNMS relied on RRDTool for graphing, a new data visualization tool was needed. Backshift leverages the RRDTool graphing syntax so your pre-defined graphs will work automatically. Note that some options, such as CANVAS colors, have not been implemented yet.

To switch to Newts, you’ll find a section in the opennms.properties file:

###### Time Series Strategy ####
# Use this property to set the strategy used to persist and retrieve time series metrics:
# Supported values are:
#   rrd (default)
#   newts

org.opennms.timeseries.strategy=newts

Note: “rrd” strategy can refer to either JRobin or RRDTool, with JRobin as the default. This is set in rrd-configuration.properties.

The next section determines what will render the graphs.

###### Graphing #####
# Use this property to set the graph rendering engine type.  If set to 'auto', attempt
# to choose the appropriate backend depending on org.opennms.timeseries.strategy above.
# Supported values are:
#   auto (default)
#   png
#   placeholder
#   backshift
org.opennms.web.graphs.engine=auto

If you are using Newts, the “auto” setting will utilize Backshift but here is where you could set Backshift as the renderer even if you want to use an RRD strategy. You should try it out. It’s cool.

Finally, we come to the settings for Newts:

###### Newts #####
# Use these properties to configure persistence using Newts
# Note that Newts must be enabled using the 'org.opennms.timeseries.strategy' property
# for these to take effect.
#
org.opennms.newts.config.hostname=10.110.4.30,10.110.4.32
#org.opennms.newts.config.keyspace=newts

There are a lot of settings and most of those are described in the documentation, but in this case I wanted to demonstrate that you can point OpenNMS to multiple Cassandra instances. You can also set different keyspace names which allows multiple instances of OpenNMS to talk to the same Cassandra cluster and not share data.

From the “fine” documentation, they also recommend that you store the data based on the foreign source by setting this variable:

org.opennms.rrd.storeByForeignSource=true

I would recommend this if you are using provisiond and requisitions. If you are currently doing auto-discovery, then it may be better to reference it by nodeid, which is the default.

I want to point out two other values that will need to be increased from the defaults: org.opennms.newts.config.ring_buffer_size and org.opennms.newts.config.cache.max_entries. For this system they were both set to 1048576. The ring buffer is especially important since should it fill up, samples will be discarded.
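To see why the ring buffer size matters, the sketch below models a bounded hand-off buffer between a collector and a storage writer, where anything that arrives while the buffer is full is dropped. It is a toy, not the actual OpenNMS/Newts code, and the per-tick production and drain rates are made-up numbers.

```python
from collections import deque


class BoundedSampleBuffer:
    """Toy hand-off buffer between a collector and a writer; full means dropped."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.queue = deque()
        self.dropped = 0

    def offer(self, sample) -> bool:
        if len(self.queue) >= self.capacity:
            self.dropped += 1  # no room: the sample is discarded
            return False
        self.queue.append(sample)
        return True

    def drain(self, max_items: int) -> list:
        """Hand up to max_items samples to the (simulated) storage writer."""
        batch = []
        while self.queue and len(batch) < max_items:
            batch.append(self.queue.popleft())
        return batch


def simulate(capacity: int, ticks: int = 10, produced: int = 1000, consumed: int = 800) -> int:
    """Collector offers `produced` samples per tick; writer drains `consumed` per tick."""
    buf = BoundedSampleBuffer(capacity)
    for _ in range(ticks):
        for sample in range(produced):
            buf.offer(sample)
        buf.drain(consumed)
    return buf.dropped


print(simulate(1024))     # 1776
print(simulate(1048576))  # 0
```

With a writer only slightly slower than the collector, a 1,024-slot buffer drops 1,776 samples over ten ticks while a 1,048,576-slot buffer drops none. Note that in this toy the large buffer is simply absorbing a growing backlog: a bigger buffer smooths out bursts, but if the writer is persistently slower it only buys time.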

So, how did it go? Well, after fixing a bug with the ring buffer, everything went well. That bug is one reason that features like this aren’t immediately included in Meridian. Luckily we were working with a client who was willing to let us investigate and correct the issue. By the time it hits Meridian 2016, it will be completely ready for production.

If you enable the OpenNMS-JVM service on your OpenNMS node, the system will automatically collect Newts performance data (assuming Newts is enabled). OpenNMS will also collect performance data from the Cassandra cluster, including both general Cassandra metrics and Newts-specific ones.

This system is connected to a two node Cassandra cluster and managing 3.8K inserts/sec.

Newts Samples Inserted

If I’m doing the math correctly, since we collect values once every 300 seconds (5 minutes) by default, that’s about 1.15 million data points per collection interval, and the system isn’t even working hard.
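The back-of-the-envelope math, as a sketch assuming a steady insert rate and the default 300-second collection interval:

```python
def samples_per_interval(inserts_per_sec: float, interval_sec: int = 300) -> float:
    """Samples written during one collection interval at a steady insert rate."""
    return inserts_per_sec * interval_sec


print(samples_per_interval(3800))  # 1140000
```

At exactly 3,800 inserts/sec this gives 1.14 million; since the 3.8K on the graph is a rounded figure, that is in line with the roughly 1.15 million above.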

OpenNMS will also collect on ring buffer information, and I took a screen shot to demonstrate Backshift, which displays the data point as you mouse over it.

Newts Ring Buffer

Horizon 17 ships with a load testing program. For this cluster:

[root@nms stress]# java -jar target/newts-stress-jar-with-dependencies.jar INSERT -B 16 -n 32 -r 100 -m 1 -H cluster
-- Meters ----------------------------------------------------------------------
org.opennms.newts.stress.InsertDispatcher.samples
             count = 10512100
         mean rate = 51989.68 events/second
     1-minute rate = 51906.38 events/second
     5-minute rate = 38806.02 events/second
    15-minute rate = 31232.98 events/second

so there is plenty of room to grow. Need something faster? Just add more nodes. Or, you can switch to ScyllaDB, which is a Cassandra-compatible database written in C++. When run against a four node ScyllaDB cluster the results were:

[root@nms stress]# java -jar target/newts-stress-jar-with-dependencies.jar INSERT -B 16 -n 32 -r 100 -m 1 -H cluster
-- Meters ----------------------------------------------------------------------
org.opennms.newts.stress.InsertDispatcher.samples
             count = 10512100
         mean rate = 89073.32 events/second
     1-minute rate = 88048.48 events/second
     5-minute rate = 85217.92 events/second
    15-minute rate = 84110.52 events/second

Unfortunately I do not have statistics for a four node Cassandra cluster to compare it directly with ScyllaDB.

Of course the Newts data directly fits in with the OpenNMS Grafana integration.

Grafana Inserts per Second

Which brings me to one down side of this storage strategy. It’s fast, which means it isn’t compact. On this system the disk space is growing at about 4GB/day, which would be 1.5TB/year.

Grafana Disk Space

If you consider that the data is replicated across Cassandra nodes, you would need that amount of space on each one. Since multi-terabyte drives are readily available, this shouldn’t be a problem, but be sure to ask yourself if all the data you are collecting is really necessary. Just because you can collect the data doesn’t mean you should.
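A quick way to turn the growth rate into a capacity plan, assuming steady growth and decimal units (1 TB = 1,000 GB); the 4 TB of free space in the example is a hypothetical figure:

```python
def yearly_growth_tb(gb_per_day: float, days: int = 365) -> float:
    """Disk consumed in a year at a steady daily growth rate, in decimal TB."""
    return gb_per_day * days / 1000


def days_until_full(free_gb: float, gb_per_day: float) -> float:
    """How many days the remaining free space lasts at the current growth rate."""
    return free_gb / gb_per_day


print(yearly_growth_tb(4))       # 1.46
print(days_until_full(4000, 4))  # 1000.0
```

At 4 GB/day that is about 1.46 TB a year on each node holding the data, and 4 TB of free space would last roughly 1,000 days.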

OpenNMS is finally to the point where the storing of performance data is no longer an issue. You are more likely to hit limits with the collector, which in part is going to be driven by the speed of the network. I’ve been in large data centers with hundreds of thousands of interfaces all with sub-millisecond latency. On that network, OpenNMS could collect on hundreds of millions of data points. On a network with lots of remote equipment, however, timeouts and delays will impact how much data OpenNMS could collect.

But with a little creativity, even that goes away. Think about it – with a common, decentralized data storage system like Cassandra, you could have multiple OpenNMS instances all talking to the same data store. If you have them share a common database, you can use collectd filters to spread data collection out over any number of machines. While this would take planning, it is doable today.
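One way to plan such a split is to assign each managed interface to exactly one collector deterministically. The sketch below uses a simple modulo on the IP address; the instance names and addresses are hypothetical, and translating the resulting lists into actual collectd package filters is left to your configuration.

```python
import ipaddress
from collections import defaultdict


def partition_interfaces(addresses, instances):
    """Assign every IP address to exactly one collector instance, deterministically."""
    plan = defaultdict(list)
    for addr in addresses:
        index = int(ipaddress.ip_address(addr)) % len(instances)
        plan[instances[index]].append(addr)
    return dict(plan)


# Hypothetical instance names and addresses.
plan = partition_interfaces(
    ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"], ["nms-a", "nms-b"]
)
for instance, addrs in sorted(plan.items()):
    print(instance, addrs)
```

Modulo assignment is easy to reason about, but adding or removing an instance reshuffles most interfaces; splitting by subnet or location, or using consistent hashing, would reduce that churn.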

What about tomorrow? Well, Horizon 18 will introduce the OpenNMS Minion code. Minions will allow OpenNMS to scale horizontally and can be managed directly from OpenNMS – no configuration tricks needed. This will truly position OpenNMS for the Internet of Things.

UPDATE: Alejandro pointed out the following:

There is a property for Newts that doesn’t appear in opennms.properties: org.opennms.newts.query.heartbeat, which expects a duration in milliseconds. By default it is 450000 (i.e. 1.5 x 5 min), and it is used when no heartbeat is specified. It should generally be 1.5x your biggest collection interval.

If you are using 10-minute polls, set it to 15 minutes (1.5 x 10 min), and then you will see graphs. To do this, add the following to opennms.properties and restart OpenNMS:

org.opennms.newts.query.heartbeat=900000
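The rule of thumb is easy to encode. A minimal sketch, assuming you know the collection intervals (in seconds) used across your collection packages; the function names are made up:

```python
def newts_heartbeat_ms(collection_intervals_sec) -> int:
    """Return 1.5x the longest collection interval, in milliseconds."""
    longest_ms = max(collection_intervals_sec) * 1000
    return longest_ms * 3 // 2  # 1.5x, kept in integer arithmetic


def heartbeat_property(collection_intervals_sec) -> str:
    """Format the value as an opennms.properties line."""
    return f"org.opennms.newts.query.heartbeat={newts_heartbeat_ms(collection_intervals_sec)}"


print(heartbeat_property([300]))       # org.opennms.newts.query.heartbeat=450000
print(heartbeat_property([300, 600]))  # org.opennms.newts.query.heartbeat=900000
```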

♫ Don’t Call It a Comeback ♫

Welcome to 2016. My year started out with an invitation to join the AARP. (sigh)

As my three readers know, when it comes to this business of open source we are pretty much making things up as we go along. We are led by our business plan of “spend less money than you earn” and our mission statement of “help customers, have fun, make money” but the rest is pretty fluid.

In 2013 we mixed things up and tried a more “traditional” start up path by seeking out investment and spending more money than we had. It didn’t work out so well.

Thus 2014 was more of a rebuilding year as we tried to move the focus back to our roots. It paid off, as 2015 was a very good year. We had record gross revenues, and although we didn’t make much money on the bottom line, it was positive once again. At the moment we are still investing in the company and the project so pretty much every extra dollar goes into growth.

And we had a lot of growth. The decision to split OpenNMS into Meridian and Horizon paid off in three major Horizon releases. Horizon 17 was an especially large and important release as it brought in the Newts integration. At the moment we are working with it on a customer site using a ScyllaDB cluster capable of supporting 75K inserts per second. The technologies introduced in 2015 will make it into Meridian 2016, due in the spring, and it should solidify OpenNMS as a platform that can really scale.

In 2015 we also received orders from two of the Fortune 5 companies. I’ll leave it as an exercise to the reader to guess which two and you have a 1 in 16 shot at getting it right (grin). The fact that companies that can choose, literally, any technology they want yet they choose OpenNMS speaks volumes.

One of these days we’re going to have to figure out a way to talk about our customers by name, since they are all so cool. We are working on it, but it is surprisingly difficult to get permission to publicly post that information. Above all we respect our clients’ privacy.

I have high expectations for 2016 and the power of the Open Source Way. Thanks to everyone who has supported us over the last decade and more, and we just hope you find our efforts provide some value.

Happy New Year.

I, Robot

Today is the 11th anniversary of The OpenNMS Group. We started on September 1st, 2004 with little more than a drive to build something special, a business plan of “spend less than you earn” and a mission statement of “Help Customers, Have Fun, Make Money”.

Since I’m still working and people are using software other than OpenNMS to manage their networks, I can’t say “mission accomplished” but we’re still here, we have a great team and the best users anyone could want, so by that measure we are successful.

When it comes to the team, one thing I worry about is how to connect our remote people with the folks in North Carolina. We do a lot of Hangouts, etc., but they lack the aspect of initiative: the remote guys have to be passive and just sit there. Then I got the wild idea to investigate getting a telepresence robot. Wouldn’t it be cool if remote people could pop in and drive around the office, attend meetings, etc.?

After a lot of research, I decided on a robot from Double Robotics.

Robot Tarus

The buying decision wasn’t a slam dunk. It is a very iPad/iOS centric solution which bothered me, and I had some issues concerning the overall security of the platform. So, I sent in a note and ended up having a call with Justin Beatty.

It was a great call.

Double is pretty serious about security, and assuming there are no firewall issues, the connection is encrypted peer-to-peer. While there are no plans to remove the requirement that you buy an iPad in order to use the robot, they are working on an Android native client. You can drive it on almost any platform that supports the Chrome browser (such as Linux) and you can even use it on Android via Chrome. There is a native iOS app as well.

What really sold me on the company is that they are a Y Combinator project, and rather than focus on raising more capital, they are focused on making a profit. They are small (like us) and dedicated to creating great things (like us).

Justin really understood our needs as well, as he offered us a refurbished unit at a discount (grin).

Anyway, I placed an order for a Double and (gulp) ordered an iPad.

It was delivered while I was away in England, but I was able to get it set up on Monday when I returned to the office. They have a number of easy-to-follow videos, and it probably took about 20 minutes to understand how everything went together.

You take the main body of the robot out of the box and place it on the floor. I had purchased an external speaker kit (otherwise it uses the iPad’s speaker), which makes it look like a little Dalek; you install that on the main post. Then you plug in the iPad holder and screw it to the post with a bolt. That’s about it for robot assembly.

The next step is to take the USB charging cable that came with your tablet and mount it inside the iPad holder. You then insert the iPad upside down and connect the cable so that the robot can power and recharge the iPad. The Double supports any iPad from the iPad 2 onward, and they have a spacer to use for the iPad Air (which is thinner). Finally, you plug a directional microphone into the audio jack on the bottom of the iPad (or top, depending on how you look at it) and the unit is assembled.

Then I had to set up the iPad, which was a bit of a pain since I’m no longer an Apple person and needed a new Apple account (and then I had to update iOS), but once it was configured I could pair the iPad to the robot via Bluetooth. Next, I had to download the Double app from the App Store and create a Double account. Once that process was complete, I could log in to the application on the tablet and our robot was ready to go.

To “drive” the robot, you log in to a website via Chrome. There are controls in the webapp for changing the height of the unit, controlling audio and video, and you move the thing around with the arrow keys.

It’s a lot of fun.

When moving, you want to have the robot at its lowest height setting. Not only will it go faster, it will be more stable. This isn’t an off-road, four-wheeling type of robot: it likes smooth surfaces. There is a little bump at the threshold to my office, and once the robot has gone over it you want to wait a second or two because it will wobble back and forth a little bit. Otherwise, it does pretty well, and because the rubber wheels are the parts of the robot that stick out the most in the front and the back, if you run into a wall it won’t damage the iPad.

I did have to mess with a couple of things. First of all, it needed a firmware upgrade before the external audio speaker would work. Second, sometimes it would keep turning in one direction (in my case, to the right), but restarting the browser seemed to fix that.

You do need to be careful driving it, however. One of my guys accidentally drove it into a table so that the table hit the robot along its “neck” rather than at the wheels. This caused the unit to shoot backward, recover and then try to move forward. It fell flat on its face.

Thankfully, that did no damage. The iPad is mounted in a fairly thick case, and while I wouldn’t want to test it, you are probably safe with the occasional face plant.

I bought an external wireless charger which allows you to drive the robot into a little “dock” for charging instead of plugging it in. To help park it, there is a mirror mounted in the iPad holder that directs the rear camera downward so you can see where you are going (i.e. look at the robot’s “feet”). Pretty low tech but they get points for both thinking about it and engineering such a simple solution.

Everyone who has driven it seems to like it, although I’m thinking about putting a bell on the thing. This morning I was jammin’ to some tunes in my office when I heard a noise and found Jeff, piloting the robot, directly behind me. It was a little creepy (grin).

I bought it with a nice (i.e. expensive) Pelican case since the plan is to take it on road trips. I bought the iPad model that supports 4G SIM cards, so I should be able to use it in areas without WiFi. Its first outing will be the OpenNMS Users Conference, which is less than a month away. If you haven’t registered yet, you should do so now, and you’ll get to see the robot in action.

Robot Bryan

Bad Voltage will also be there, with Bryan Lunduke piloting the robot from his home in Portland. I had him try it out today and he commented “So rad. So very, very rad”.

At the moment I’m very pleased with the Double from Double Robotics. It’s a little spendy but loads of fun, and I can’t wait to use it for team meetings, etc., when people can’t make it in person. You can also share the output from the unit with other people via the beta website, although you could always just do a Google Hangout and share the screen.

Double Logo

I even like the Double Robotics logo, which is a silhouette of the robot against a square background to form a “D”. I am eager to see what they do in the future.

Case Study: Why You Want OpenNMS Support

I wanted to share a story about a support case I worked on recently that might help demonstrate the value of commercial OpenNMS support to folks who are considering it. As always, OpenNMS is published under an open source license, so commercial support is never a requirement, but since this story involves commercial software I thought it might be useful to share.

We have a client that handles a lot of sensitive information, to the point that they have an extremely hardened network environment that is difficult to manage. They place a separate copy of OpenNMS into this “sphere” just to manage the machines inside it, and they have configured the web UI, served over HTTPS, to be the only point of access from the outside.

Recently, a security audit turned up this message:


Red Hat Linux 6.6 weak-crypto-key
3 Weak Cryptographic Key Fail "The following TLS cipher suites use Diffie-Hellman keys smaller than 1024 bits:
* TLS_DHE_RSA_WITH_3DES_EDE_CBC_SHA (768-bit DH key)
* TLS_DHE_RSA_WITH_AES_128_CBC_SHA (768-bit DH key)"
"Use a Stronger Key
If the weak key is used in an X.509 certificate (for example for an HTTPS server), generate a longer key and recreate the certificate. Please also refer to NIST's recommendations on cryptographic algorithms and key lengths (http://csrc.nist.gov/publications/nistpubs/800-131A/sp800-131A.pdf)." Vulnerable

and they opened a support ticket asking for advice on how to fix it.

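I had some issues with the error message right off the bat. The certificate key was 2048 bits, so my guess was that the problem was the algorithm and not the key, yet the error message suggests that a longer key would fix it. (In fact, the 768-bit figure refers to the ephemeral Diffie-Hellman parameters that the DHE cipher suites negotiate, which are separate from the certificate's RSA key, so regenerating the certificate would not have changed it.)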

Anyway, this should be simple to fix. The jetty.xml file in the OpenNMS configuration directory lets you exclude certain ciphers, so I just had the customer add these two to the list and restart OpenNMS.
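For readers who haven't touched this setting: an exclude list simply removes matching names from the set of cipher suites the server offers. The short Python sketch below illustrates that idea. It is not Jetty's or OpenNMS's code, the `enabled` list and the `TLS_EXAMPLE_NOT_ENABLED` entry are hypothetical, and it assumes each entry is compared as a literal, case-sensitive name, which also means an entry that matches nothing is silently ignored.

```python
"""Sketch of what a cipher-suite exclude list does (not Jetty's own code).

Assumption: each exclude entry is compared to the enabled names as a
literal, case-sensitive string.
"""


def apply_excludes(enabled, excludes):
    """Keep the enabled suites (in their original order) that are not excluded."""
    blocked = set(excludes)
    return [suite for suite in enabled if suite not in blocked]


def unmatched_excludes(enabled, excludes):
    """Exclude entries that match no enabled suite, i.e. silent no-ops."""
    present = set(enabled)
    return [name for name in excludes if name not in present]


# Hypothetical defaults; a real server's list depends on the JVM and config.
enabled = [
    "TLS_RSA_WITH_AES_128_CBC_SHA",
    "TLS_DHE_RSA_WITH_AES_128_CBC_SHA",
    "TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA",
]
excludes = ["TLS_DHE_RSA_WITH_AES_128_CBC_SHA", "TLS_EXAMPLE_NOT_ENABLED"]

print(apply_excludes(enabled, excludes))
# ['TLS_RSA_WITH_AES_128_CBC_SHA', 'TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA']
print(unmatched_excludes(enabled, excludes))
# ['TLS_EXAMPLE_NOT_ENABLED']
```

Checking for unmatched entries is a cheap way to notice that an exclusion is not doing anything before waiting on the next scan.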

And then we waited for the nightly scan to run.

This fixed the issue with the TLS_DHE_RSA_WITH_AES_128_CBC_SHA cipher but not the first one. Nothing we did seemed to help, so I installed sslscan on my test machine to try to duplicate the issue. I got a different list of ciphers, and since OpenSSL uses different names for the ciphers than Java does, it was a bit of a pain to map between them. I couldn’t get sslscan to show the same vulnerabilities as the tool they were using.
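To make the naming problem concrete, here is a tiny lookup table covering only the two suites from this audit, keyed by the two-byte code the TLS standard assigns to each suite. The OpenSSL spellings are assumed to be the ones used by OpenSSL 1.0.x; newer releases may spell some of them differently, so check `openssl ciphers -V` on your own machine. The function names are illustrative only.

```python
"""Map cipher-suite names between OpenSSL and the standard (IANA) spelling.

Assumption: the OpenSSL names are those used by OpenSSL 1.0.x, and only the
two suites from this audit are included.
"""

# Two-byte suite code -> (standard/IANA name, OpenSSL 1.0.x name)
SUITES = {
    0x0016: ("TLS_DHE_RSA_WITH_3DES_EDE_CBC_SHA", "EDH-RSA-DES-CBC3-SHA"),
    0x0033: ("TLS_DHE_RSA_WITH_AES_128_CBC_SHA", "DHE-RSA-AES128-SHA"),
}

# Reverse indexes built from the single table above.
BY_OPENSSL = {openssl: code for code, (_, openssl) in SUITES.items()}
BY_STANDARD = {standard: code for code, (standard, _) in SUITES.items()}


def openssl_to_standard(openssl_name):
    """Return the standard name for an OpenSSL name, or None if unknown."""
    code = BY_OPENSSL.get(openssl_name)
    return SUITES[code][0] if code is not None else None


def standard_to_openssl(standard_name):
    """Return the OpenSSL name for a standard name, or None if unknown."""
    code = BY_STANDARD.get(standard_name)
    return SUITES[code][1] if code is not None else None


print(openssl_to_standard("EDH-RSA-DES-CBC3-SHA"))
# TLS_DHE_RSA_WITH_3DES_EDE_CBC_SHA
```

Keying on the numeric code rather than on either name avoids having to guess how each tool spells a suite.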

We finally found out that the tool was Nexpose by Rapid7. I wasn’t familiar with it, but I found that I could download a trial version, so I set up a VM and installed the “Community Edition”.

Note: this has nothing to do with open core, where vendors often call their “free” version the “community” version. Nexpose is 100% commercial. They use “community” to mean “community supported”, but it is kind of confusing, like when Bertolli markets “light” olive oil, which means “light tasting” and not low in calories.

I had to fill out a web form and wait about a day for the key to show up. I had installed the exact version of OpenNMS that the client was using on my VM, so my hope was that I could recreate the errors.

First, I had to increase the memory allocated to the VM. Nexpose is written in Java and is a memory hog, but so is OpenNMS, and it took some work to get them to play nice together on the same machine. But once I got it running, it wasn’t too hard to recreate the problem.

The Nexpose user interface isn’t totally intuitive, but I was able to add the IP address of the local machine and kick off a scan without having to read any documentation. The output came as a CSV file, but you could also examine the results from within the UI.

The scan reported the same two errors, and just like before I was able to remove the “TLS_DHE_RSA_WITH_AES_128_CBC_SHA” one just by excluding it in jetty.xml, but the second one would not go away. I found a list of ciphers supported by Java, but nothing exactly matched “TLS_DHE_RSA_WITH_3DES_EDE_CBC_SHA” and I tried almost all of the combinations for similar TLS ciphers.

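Then it dawned on me to try “SSL_DHE_RSA_WITH_3DES_EDE_CBC_SHA”, and the error went away. Java keeps the old SSL_ prefix for cipher suites that were originally defined for SSL 3.0, even when they are used over TLS. In retrospect I guess it was obvious, but I was so focused on TLS-based cipher names that it never occurred to me that this was what Nexpose was flagging.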
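For anyone hitting the same wall, here is a small Python sketch of the workaround: given the standard TLS_ name a scanner reports, try it as-is and then with the SSL_ prefix, and keep whichever spelling the JVM actually supports. The `supported` list below is a hypothetical excerpt; on a real system it would come from Java's `SSLSocketFactory.getSupportedCipherSuites()`. The function names are illustrative, not part of any OpenNMS or Jetty API.

```python
"""Find the name a JVM uses for a cipher suite reported under its TLS_ name.

Assumption: Java keeps the historical SSL_ prefix for suites first defined
for SSL 3.0 (such as the 3DES ones), so both spellings are tried.
"""


def jsse_name_candidates(standard_name):
    """Spellings to try, the reported name first."""
    candidates = [standard_name]
    if standard_name.startswith("TLS_"):
        candidates.append("SSL_" + standard_name[len("TLS_"):])
    return candidates


def resolve_jsse_name(standard_name, supported):
    """Return the first candidate the JVM supports, or None."""
    supported_set = set(supported)
    for name in jsse_name_candidates(standard_name):
        if name in supported_set:
            return name
    return None


# Hypothetical excerpt of a JVM's supported cipher suite list.
supported = [
    "TLS_DHE_RSA_WITH_AES_128_CBC_SHA",
    "SSL_DHE_RSA_WITH_3DES_EDE_CBC_SHA",
]

for reported in ("TLS_DHE_RSA_WITH_AES_128_CBC_SHA",
                 "TLS_DHE_RSA_WITH_3DES_EDE_CBC_SHA"):
    print(reported, "->", resolve_jsse_name(reported, supported))
```

The resolved names, rather than the names in the scan report, are what would go into the exclude list.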

It was extremely frustrating, but since my customer was getting beaten up about it, I was glad that we could get the system to pass the audit. While this was totally an issue with the scanning software and not OpenNMS, it would have been hard to figure out without the help we were happy to give.

It may not surprise anyone that a large number of OpenNMS support issues involve products from other vendors. Most of them stem from a poor implementation of the SNMP standard, but occasionally we get something like this.

Our clients tend to be incredibly smart and good at their jobs, but having access to the folks that actually make OpenNMS can sometimes save enough time and headache to more than offset the cost of support.

Welcome Costa Rica! (Country 28)

While I have never been able to personally visit Costa Rica (it is on my list), I am happy to announce that we now have a commercial customer from there, making it the 28th unique country for OpenNMS.

They join the following countries:

Australia, Canada, Chile, China, Denmark, Egypt, Finland, France, Germany, Honduras, India, Ireland, Israel, Italy, Japan, Malta, Mexico, the Netherlands, Portugal, Singapore, Spain, Sweden, Switzerland, Trinidad, the UAE, the UK and the US.