Add a Weather Widget to OpenNMS Home Screen

I was recently at a client site where I met a man named Jeremy Ford. He’s sharp as a knife and even though, at the time, he was new to OpenNMS, he had already hacked a few neat things into the system (open source FTW).

Weathermap on OpenNMS Home Page

One of those was the addition of a weathermap to the OpenNMS home page. He has graciously put the code up on GitHub.

The code is a script that will generate a JSP file in the OpenNMS “includes” directory. All you have to do then is to add a reference to it in the main index.jsp file.

For those of you who don’t know or who have never poked around, there should be a directory called jetty-webapps under the $OPENNMS_HOME directory. That is the web root directory for the Jetty servlet container that ships with OpenNMS.

Under that directory you’ll find a subdirectory for opennms. When you surf to http://[my OpenNMS Server]:8980/opennms that is the directory you are visiting. In it is an index.jsp file that serves as the main page.

If you are familiar with HTML, a JSP file will look very familiar. It can contain references to Java code, but much of it is straight HTML. The file is kept simple on purpose, with each of the three columns on the main page marked by comments. The part you will need to change is the third column:

<!-- Right Column -->
        <div class="col-md-3" id="index-contentright">
                <!-- weather box -->
                <jsp:include page="/includes/weather.jsp" flush="false" />

Feel free to look around. If you ever wanted to rearrange the OpenNMS Home page, this is a good place to start.

Now, I used to like poking around with these files since they would update automatically, but later versions of OpenNMS (which contain later versions of Jetty) seem to require a restart. If you get an error, restart OpenNMS and see if it goes away.

Now the weather.jsp file gets generated by Jeremy’s Python script. To get that working you’ll need to do two things. The most important is to get an API key from Weather Underground. It is a pretty easy process, but be aware that you can only make 500 queries a day without paying. The second is to edit the three URLs in the script and change the location. It is currently set to “CA/San_Francisco”, but I was able to change it to “NC/Pittsboro” and it “just worked”.
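For readers who want to see the moving parts, here is a minimal Python sketch of what a script like Jeremy’s has to do: build the Weather Underground request URLs for a location and write a small HTML fragment into weather.jsp. This is not Jeremy’s code. The URL pattern, the three feature names, the output path under $OPENNMS_HOME and the panel markup are all assumptions for illustration, and the network fetch is left out so the sketch runs offline.

```python
import html
import os
import tempfile
from pathlib import Path

# ASSUMPTION: classic Weather Underground URL pattern and the three features
# queried; check these against the actual script before relying on them.
URL_TEMPLATE = "http://api.wunderground.com/api/{key}/{feature}/q/{location}.json"
FEATURES = ("conditions", "forecast", "astronomy")  # hypothetical choice

# Hypothetical default; adjust to your own $OPENNMS_HOME layout.
OPENNMS_HOME = os.environ.get("OPENNMS_HOME", "/opt/opennms")
WEATHER_JSP = Path(OPENNMS_HOME) / "jetty-webapps" / "opennms" / "includes" / "weather.jsp"


def build_urls(api_key, location, features=FEATURES):
    """Return one request URL per feature; location looks like 'NC/Pittsboro'."""
    return [URL_TEMPLATE.format(key=api_key, feature=f, location=location) for f in features]


def render_weather_html(location, temperature_f, conditions):
    """Render a small Bootstrap-style panel (markup is illustrative only)."""
    return (
        '<div class="panel panel-default">\n'
        f'  <div class="panel-heading">Weather for {html.escape(location)}</div>\n'
        f'  <div class="panel-body">{temperature_f:.0f}&deg;F, {html.escape(conditions)}</div>\n'
        "</div>\n"
    )


def write_atomically(path, text):
    """Write to a temp file in the same directory, then rename over the target,
    so the web server never serves a half-written include."""
    path = Path(path)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(text)
    os.replace(tmp, path)
```

For example, build_urls("YOUR_KEY", "NC/Pittsboro") yields three URLs ending in /q/NC/Pittsboro.json, which keeps the location in exactly one place instead of three.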

Finally, you’ll need to set the script up to run via cron. I’m not sure how frequently Weather Underground updates the data, but a 10 minute interval seems to work well. That’s only 144 queries a day, so you could easily double it and still be within your limit.

[IMPORTANT UPDATE: Jeremy pointed out that the script actually makes three queries per run, not one, so instead of 144 queries a day, it’s 432. That still leaves some room at a 10 minute interval, but you don’t want to increase the frequency too much.]
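The update’s arithmetic is easy to get wrong once cron is involved, so here is a small Python sketch that counts queries per day for a crontab minute field of the form */N, assuming three queries per run and the 500-per-day free limit mentioned above. A step that doesn’t divide 60 evenly still restarts at minute 0 every hour, so */9 fires seven times an hour, not 6.67.

```python
# Hypothetical crontab entry this models (adjust the interpreter and path):
#   */10 * * * * /usr/bin/python /path/to/weather.py

HOURS_PER_DAY = 24


def cron_runs_per_day(step_minutes):
    """Runs per day for a minute field of '*/step': minutes 0, step, 2*step, ... below 60."""
    if not 1 <= step_minutes <= 59:
        raise ValueError("step must be between 1 and 59 minutes")
    return len(range(0, 60, step_minutes)) * HOURS_PER_DAY


def daily_queries(step_minutes, queries_per_run=3):
    """API calls per day if every run makes queries_per_run requests."""
    return cron_runs_per_day(step_minutes) * queries_per_run


def shortest_step_within_limit(daily_limit=500, queries_per_run=3):
    """Smallest '*/N' step that stays at or under the daily limit, or None."""
    for step in range(1, 60):
        if daily_queries(step, queries_per_run) <= daily_limit:
            return step
    return None
```

With these assumptions daily_queries(10) is 432, matching the update, while */9 would already be 504 and go over the limit, so shortest_step_within_limit() returns 10.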

Thanks to Jeremy for taking the time to share this. Remember, once you get it working, if you upgrade OpenNMS you’ll need to edit index.jsp and add it back, but that should be the only change needed.

OpenNMS at Scale

So, yes, the gang from OpenNMS will be at the SCaLE conference this weekend (I will not be there, unfortunately, due to a self-imposed conference hiatus this year). It should be a great time, and we are happy to be a Gold Sponsor.

But this post is not about that. This is about how Horizon 17 and data collection can scale. You can come by the booth at SCaLE and learn more about it, but here is the overview.

When OpenNMS first started, we leveraged the great application RRDTool for storing performance data. When we discovered a Java port called JRobin, OpenNMS was modified to support that storage strategy as well.

Using a Round Robin database has a number of advantages. First, it’s compact: once the file containing the RRD database is created, it never grows. Second, RRDTool could also graph the data.

However, there were problems. Many users needed to store the raw collected data, but RRDTool uses consolidation functions to store time-series averages rather than the original samples. The biggest issue, though, was that writing lots of files required really fast hard drives. The more data you wanted to store, the greater your investment in disk arrays. Ultimately, you would hit a wall, which would require you to either reduce your data collection or partition the data across multiple systems.
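To make the trade-off concrete, here is a toy Python model of a single round-robin archive. It is not RRDTool’s file format or API; it only illustrates the two properties described above: the storage is fixed in size (the oldest row is overwritten), and each row holds an AVERAGE of several raw samples, so the raw values themselves are gone.

```python
from collections import deque
from statistics import mean


class RoundRobinArchive:
    """Toy round-robin archive: fixed number of rows, AVERAGE consolidation."""

    def __init__(self, rows, samples_per_row):
        self.rows = deque(maxlen=rows)  # size never grows; oldest row is dropped
        self.samples_per_row = samples_per_row
        self._pending = []

    def update(self, value):
        """Add one raw sample; emit a consolidated row every samples_per_row samples."""
        self._pending.append(value)
        if len(self._pending) == self.samples_per_row:
            self.rows.append(mean(self._pending))  # only the average is kept
            self._pending.clear()
```

Feeding the values 1 through 10 into RoundRobinArchive(rows=3, samples_per_row=2) leaves only [5.5, 7.5, 9.5]: three rows no matter how many samples arrive, and none of the original numbers.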

No more. With Horizon 17, OpenNMS fully supports a time-series database called Newts. Newts is built on Cassandra, and even a small Cassandra cluster can handle tens of thousands of inserts a second. Need more performance? Just add more nodes. It also works across geographically distributed systems, so you get built-in high availability (something that was very difficult with RRDTool).

Just before Christmas I got to visit a customer on the Eastern Shore of Maryland. You wouldn’t think that location would be a hotbed of technical excellence, but it is rare that I get to work with such a quick team.

They brought me up for a “Getting to Know You” project. This is a two day engagement where we get to kick the tires on OpenNMS to see if it is a good fit. They had been using Zenoss Core (the free version) and they hit a wall. The features they wanted were all in the “enterprise” paid version and the free version just wouldn’t meet their needs. OpenNMS did, and being truly open source it fit their philosophy (and budget) much better.

This was a fun trip for me because they had already done most of the work. They had OpenNMS installed and monitoring their network, and they just needed me to help out on some interesting use cases.

One of their issues was the need to store a lot of performance data, and since I was eager to play with the Newts integration we decided to test it out.

In order to enable Newts, first you need a Cassandra cluster. It turns out that ScyllaDB works as well (more on that a bit later). If you are looking at the Newts website, you can ignore the instructions on installing it, as it is built directly into OpenNMS.

Another thing built into OpenNMS is a new graphing library called Backshift. Since OpenNMS relied on RRDTool for graphing, a new data visualization tool was needed. Backshift uses the RRDTool graphing syntax, so your pre-defined graphs will work automatically. Note that some options, such as CANVAS colors, have not been implemented yet.

To switch to Newts, find this section in the opennms.properties file:

###### Time Series Strategy ####
# Use this property to set the strategy used to persist and retrieve time series metrics:
# Supported values are:
#   rrd (default)
#   newts

org.opennms.timeseries.strategy=newts

Note: “rrd” strategy can refer to either JRobin or RRDTool, with JRobin as the default. This is set in rrd-configuration.properties.

The next section determines what will render the graphs.

###### Graphing #####
# Use this property to set the graph rendering engine type.  If set to 'auto', attempt
# to choose the appropriate backend depending on org.opennms.timeseries.strategy above.
# Supported values are:
#   auto (default)
#   png
#   placeholder
#   backshift
org.opennms.web.graphs.engine=auto

If you are using Newts, the “auto” setting will use Backshift, but this is also where you can select Backshift as the renderer even if you are using an RRD strategy. You should try it out. It’s cool.

Finally, we come to the settings for Newts:

###### Newts #####
# Use these properties to configure persistence using Newts
# Note that Newts must be enabled using the 'org.opennms.timeseries.strategy' property
# for these to take effect.
#
org.opennms.newts.config.hostname=10.110.4.30,10.110.4.32
#org.opennms.newts.config.keyspace=newts

There are a lot of settings and most of those are described in the documentation, but in this case I wanted to demonstrate that you can point OpenNMS to multiple Cassandra instances. You can also set different keyspace names which allows multiple instances of OpenNMS to talk to the same Cassandra cluster and not share data.
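Since all three sections live in one file, a quick Python sanity check can report what OpenNMS should end up doing. This is a hypothetical helper, not part of OpenNMS: the parser only understands simple key=value lines (real Java properties files also allow ':' separators and line continuations), and the rule that “auto” falls back to PNG rendering for the rrd strategy is an assumption; the text above only confirms that “auto” picks Backshift for Newts. The default keyspace name “newts” is taken from the commented-out line in the excerpt.

```python
def parse_properties(text):
    """Minimal key=value parser; skips blank lines and '#'/'!' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def summarize(props):
    """Report the effective strategy, graph engine and Newts endpoints."""
    strategy = props.get("org.opennms.timeseries.strategy", "rrd")
    engine = props.get("org.opennms.web.graphs.engine", "auto")
    if engine == "auto":
        # Per the text, 'auto' uses Backshift with Newts.
        # ASSUMPTION: otherwise it falls back to PNG rendering.
        engine = "backshift" if strategy == "newts" else "png"
    hosts = props.get("org.opennms.newts.config.hostname", "")
    return {
        "strategy": strategy,
        "graph_engine": engine,
        "newts_hosts": [h.strip() for h in hosts.split(",") if h.strip()],
        # 'newts' is the value shown in the commented-out keyspace line.
        "keyspace": props.get("org.opennms.newts.config.keyspace", "newts"),
    }
```

Run against the excerpt values above, it reports the newts strategy, the Backshift engine, both Cassandra hosts (10.110.4.30 and 10.110.4.32) and the default keyspace.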

The “fine” documentation also recommends storing the data by foreign source by setting this variable:

org.opennms.rrd.storeByForeignSource=true

I would recommend this if you are using provisiond and requisitions. If you are currently relying on auto-discovery, it may be better to store the data by node ID, which is the default.
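The reason this matters for requisitions is that node IDs are database identifiers that change if a node is deleted and provisioned again, while a foreign source and foreign ID pair stays the same. The Python sketch below shows the difference, assuming the RRD-style directory layout OpenNMS uses (snmp/<nodeid> versus snmp/fs/<foreign source>/<foreign ID>). Newts stores resource identifiers rather than files, so treat the paths as an illustration of the naming, not of Newts storage itself. The function name and example values are hypothetical.

```python
from pathlib import PurePosixPath


def resource_dir(rrd_root, node_id, foreign_source=None, foreign_id=None,
                 store_by_foreign_source=False):
    """Where a node's collected data would live under the assumed layout."""
    base = PurePosixPath(rrd_root) / "snmp"
    if store_by_foreign_source and foreign_source and foreign_id:
        # Stable across delete/re-import: keyed by requisition identity.
        return base / "fs" / foreign_source / foreign_id
    # Default: keyed by the database node ID.
    return base / str(node_id)
```

A node with no foreign source or foreign ID simply falls back to the node ID layout in this sketch.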

I want to point out two other values that will need to be increased from their defaults: org.opennms.newts.config.ring_buffer_size and org.opennms.newts.config.cache.max_entries. For this system they were both set to 1048576. The ring buffer is especially important, since samples will be discarded if it fills up.
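The “samples will be discarded” behavior is easiest to see in a toy model. The Python class below is a bounded buffer that drops incoming samples once it is full and counts the losses, which is the failure mode described above. It is an illustration, not the OpenNMS implementation; the power-of-two check mirrors the 1048576 (2^20) value used here and is an assumption about how the real ring buffer is sized, since Disruptor-style ring buffers commonly require it.

```python
from collections import deque


class DroppingRingBuffer:
    """Bounded queue between collectors and a writer; drops samples when full."""

    def __init__(self, capacity):
        # ASSUMPTION: capacity must be a power of two, like 1048576 == 2**20.
        if capacity <= 0 or capacity & (capacity - 1):
            raise ValueError("capacity must be a positive power of two")
        self.capacity = capacity
        self._items = deque()
        self.dropped = 0

    def offer(self, sample):
        """Enqueue a sample; return False (and count it) if the buffer is full."""
        if len(self._items) >= self.capacity:
            self.dropped += 1
            return False
        self._items.append(sample)
        return True

    def drain(self, max_items):
        """Remove up to max_items samples in arrival order, as a writer would."""
        batch = []
        while self._items and len(batch) < max_items:
            batch.append(self._items.popleft())
        return batch
```

If collectors offer samples faster than the writer drains them, dropped keeps climbing; a larger capacity only buys time to absorb bursts, so the storage layer still has to keep up on average.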

So, how did it go? Well, after fixing a bug with the ring buffer, everything went well. That bug is one reason that features like this aren’t immediately included in Meridian. Luckily we were working with a client who was willing to let us investigate and correct the issue. By the time it hits Meridian 2016, it will be completely ready for production.

If you enable the OpenNMS-JVM service on your OpenNMS node, the system will automatically collect Newts performance data (assuming Newts is enabled). OpenNMS will also collect performance data from the Cassandra cluster, including both general Cassandra metrics and Newts-specific ones.

This system is connected to a two-node Cassandra cluster and is handling 3.8K inserts/sec.

Newts Samples Inserted

If I’m doing the math correctly, since we collect values once every 300 seconds (5 minutes) by default, that’s about 1.15 million data points per collection cycle, and the system isn’t even working hard.
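The arithmetic behind that figure is just rate times interval. A small Python version, using the 300 second default interval from the text:

```python
DEFAULT_INTERVAL_SECONDS = 300  # default collection interval (5 minutes), per the text


def samples_per_cycle(inserts_per_second, interval_seconds=DEFAULT_INTERVAL_SECONDS):
    """Data points collected per cycle implied by a steady insert rate."""
    return inserts_per_second * interval_seconds


def required_insert_rate(data_points, interval_seconds=DEFAULT_INTERVAL_SECONDS):
    """Sustained inserts/sec needed to store data_points every interval."""
    return data_points / interval_seconds
```

samples_per_cycle(3800) gives 1,140,000, and required_insert_rate(1_150_000) is about 3,833 per second, so the 1.15 million figure corresponds to a rate that rounds to 3.8K. The second function is the more useful one for planning: count the data points you intend to collect and divide by the interval to size the cluster.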

OpenNMS will also collect on ring buffer information, and I took a screen shot to demonstrate Backshift, which displays the data point as you mouse over it.

Newts Ring Buffer

Horizon 17 ships with a load testing program. For this cluster:

[root@nms stress]# java -jar target/newts-stress-jar-with-dependencies.jar INSERT -B 16 -n 32 -r 100 -m 1 -H cluster
-- Meters ----------------------------------------------------------------------
org.opennms.newts.stress.InsertDispatcher.samples
             count = 10512100
         mean rate = 51989.68 events/second
     1-minute rate = 51906.38 events/second
     5-minute rate = 38806.02 events/second
    15-minute rate = 31232.98 events/second

So there is plenty of room to grow. Need something faster? Just add more nodes. Or, you can switch to ScyllaDB, which is a port of Cassandra written in C++. When run against a four-node ScyllaDB cluster the results were:

[root@nms stress]# java -jar target/newts-stress-jar-with-dependencies.jar INSERT -B 16 -n 32 -r 100 -m 1 -H cluster
-- Meters ----------------------------------------------------------------------
org.opennms.newts.stress.InsertDispatcher.samples
             count = 10512100
         mean rate = 89073.32 events/second
     1-minute rate = 88048.48 events/second
     5-minute rate = 85217.92 events/second
    15-minute rate = 84110.52 events/second

Unfortunately I do not have statistics for a four node Cassandra cluster to compare it directly with ScyllaDB.
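When comparing runs like these, it helps to pull the numbers out of the output rather than copying them by hand. Here is a small hypothetical Python helper that parses the meter lines in the format shown above; it makes no assumptions about what the stress tool’s flags mean.

```python
import re

# Matches lines such as "mean rate = 51989.68 events/second" or "count = 10512100".
METER_LINE = re.compile(
    r"^\s*(count|mean rate|1-minute rate|5-minute rate|15-minute rate)\s*=\s*([0-9.]+)"
)


def parse_meters(output):
    """Return {'count': ..., 'mean rate': ..., ...} as floats from stress-tool output."""
    values = {}
    for line in output.splitlines():
        match = METER_LINE.match(line)
        if match:
            values[match.group(1)] = float(match.group(2))
    return values
```

Parsing both runs gives mean rates of 51989.68 and 89073.32, a ratio of about 1.71. As noted above, the clusters had different node counts (two Cassandra nodes versus four ScyllaDB nodes), so that ratio is not a like-for-like comparison.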

Of course the Newts data directly fits in with the OpenNMS Grafana integration.

Grafana Inserts per Second

Which brings me to one downside of this storage strategy: it’s fast, but it isn’t compact. On this system the disk space is growing at about 4GB/day, which works out to roughly 1.5TB/year.

Grafana Disk Space

If you consider that the data is replicated across Cassandra nodes, you would need that amount of space on each one. Since multi-terabyte drives are readily available, this shouldn’t be a problem, but be sure to ask yourself if all the data you are collecting is really necessary. Just because you can collect the data doesn’t mean you should.
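A back-of-the-envelope capacity check in Python, assuming (as the text does) that every Cassandra node ends up holding the full data set, and using decimal units (1TB = 1000GB). The free-space figure in the example is hypothetical.

```python
GB_PER_TB = 1000  # decimal units, as drive vendors use


def yearly_growth_tb(daily_growth_gb, days=365):
    """Disk growth per node over a year at a constant daily rate."""
    return daily_growth_gb * days / GB_PER_TB


def days_until_full(free_tb, daily_growth_gb):
    """How long the free space on one node lasts at the observed growth rate."""
    return free_tb * GB_PER_TB / daily_growth_gb
```

At 4GB/day, yearly_growth_tb(4) is 1.46TB, the “1.5TB/year” above, and a node with 2TB free would last days_until_full(2, 4) = 500 days. Multiply the per-node figure by the number of nodes for the raw disk the whole cluster needs.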

OpenNMS is finally to the point where the storing of performance data is no longer an issue. You are more likely to hit limits with the collector, which in part is going to be driven by the speed of the network. I’ve been in large data centers with hundreds of thousands of interfaces all with sub-millisecond latency. On that network, OpenNMS could collect on hundreds of millions of data points. On a network with lots of remote equipment, however, timeouts and delays will limit how much data OpenNMS can collect.

But with a little creativity, even that goes away. Think about it – with a common, decentralized data storage system like Cassandra, you could have multiple OpenNMS instances all talking to the same data store. If you have them share a common database, you can use collectd filters to spread data collection out over any number of machines. While this would take planning, it is doable today.
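One way to do that split is to give each OpenNMS instance a collectd package whose filter covers a different slice of the address space. The Python sketch below generates such filter rules by dividing the third octet of a hypothetical 10.0.0.0/16 network into contiguous IPLIKE ranges. The address plan and the idea of slicing by octet are assumptions for illustration; check the generated expressions against the filter documentation before pasting them into collectd-configuration.xml.

```python
def iplike_partitions(prefix, instances):
    """Split the third octet (0-255) under a two-octet prefix into IPLIKE rules.

    prefix is something like "10.0"; each rule covers a contiguous block of
    third-octet values, and block sizes differ by at most one.
    """
    if not 1 <= instances <= 256:
        raise ValueError("instances must be between 1 and 256")
    size, extra = divmod(256, instances)
    rules, start = [], 0
    for i in range(instances):
        end = start + size - 1 + (1 if i < extra else 0)
        octet = str(start) if start == end else f"{start}-{end}"
        rules.append(f"IPADDR IPLIKE {prefix}.{octet}.*")
        start = end + 1
    return rules
```

For four instances this yields 10.0.0-63.*, 10.0.64-127.*, 10.0.128-191.* and 10.0.192-255.*, so every address is collected by exactly one instance. The instances would still share a common database, as described above, so they agree on the node list.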

What about tomorrow? Well, Horizon 18 will introduce the OpenNMS Minion code. Minions will allow OpenNMS to scale horizontally and can be managed directly from OpenNMS – no configuration tricks needed. This will truly position OpenNMS for the Internet of Things.

Avoiding the Sad Graph of Software Death

Seth recently pointed me to an interesting article by Gregory Brown discussing a “death spiral” often faced by software projects when issues and feature requests start to outpace the ability to close them.

Sad Graph of Death

Now Seth is pretty much in charge of managing our Jira instance, which is key to managing the progress of OpenNMS software development. He decided to look at our record:

OpenNMS Issues Graph

[UPDATE: Logged into Jira to get a lot more issues on the graph]

Not bad, not bad at all.

A lot of our ability to keep up with issues comes from our project’s investment in using the tool. It is very easy to let things slide, resulting in the first graph above and possibly causing a project to declare “issue bankruptcy”. Since all of this information is public for OpenNMS, it is important to keep it up to date, and while we never have enough time for all the things we need to do, we make time for this.

I think it speaks volumes for Seth and the rest of the team that OpenNMS issues are managed so well. In part it comes naturally from “the open source way”, since projects should be as transparent as possible, and managing issues is a key part of that.

Annual LinuxQuestions Poll

Just a quick note that the annual LinuxQuestions “Member’s Choice” poll is out. While I don’t believe OpenNMS is known to many of the members of that site, if you feel like showing it a little love, please register and vote.

http://www.linuxquestions.org/questions/2015-linuxquestions-org-members-choice-awards-117/network-monitoring-application-of-the-year-4175562720/

Many thanks to Jeremy Garcia for maintaining that site and including OpenNMS.

♫ Don’t Call It a Comeback ♫

Welcome to 2016. My year started out with an invitation to join the AARP. (sigh)

As my three readers know, when it comes to this business of open source we are pretty much making things up as we go along. We are led by our business plan of “spend less money than you earn” and our mission statement of “help customers, have fun, make money”, but the rest is pretty fluid.

In 2013 we mixed things up and tried a more “traditional” startup path by seeking out investment and spending more money than we had. It didn’t work out so well.

Thus 2014 was more of a rebuilding year as we tried to move the focus back to our roots. It paid off, as 2015 was a very good year. We had record gross revenues, and although we didn’t make much money on the bottom line, it was positive once again. At the moment we are still investing in the company and the project so pretty much every extra dollar goes into growth.

And we had a lot of growth. The decision to split OpenNMS into Meridian and Horizon paid off in three major Horizon releases. Horizon 17 was an especially large and important release as it brought in the Newts integration. At the moment we are working with it on a customer site using a ScyllaDB cluster capable of supporting 75K inserts per second. The technologies introduced in 2015 will make it into Meridian 2016, due in the spring, and they should solidify OpenNMS as a platform that can really scale.

In 2015 we also received orders from two of the Fortune 5 companies. I’ll leave it as an exercise to the reader to guess which two, and you have a 1 in 10 shot at getting it right (grin). The fact that companies that can choose literally any technology they want choose OpenNMS speaks volumes.

One of these days we’re going to have to figure out a way to talk about our customers by name, since they are all so cool. We are working on it, but it is surprisingly difficult to get permission to publicly post that information. Above all we respect our clients’ privacy.

I have high expectations for 2016 and the power of the Open Source Way. Thanks to everyone who has supported us over the last decade and more, and we just hope you find our efforts provide some value.

Happy New Year.

Horizon 16.0.4 Security Release

In response to a vulnerability in the Apache Commons library that OpenNMS uses, version 16.0.4 has been released to help secure against a remote exploit.

The exploit involves Java Remote Method Invocation (RMI) which listens on port 1099 by default. In my previous post I pointed out that if that port is inaccessible, then the exploit can’t happen.

What 16.0.4 does is limit RMI to listening only on localhost. While that will prevent remote exploits even if port 1099 is not blocked by a firewall, it doesn’t completely solve the problem. Fixing the root cause will require changes to Apache Commons, and we are ready to upgrade to the fixed version as soon as it is available.
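The difference between binding to localhost and binding to all interfaces is easy to demonstrate outside of Java. In this small Python sketch (generic sockets, not OpenNMS’s RMI configuration), a listener bound to 127.0.0.1 only accepts connections that originate on the same machine, whatever the firewall says; binding to 0.0.0.0 would instead accept connections on every interface, leaving the firewall as the only line of defense.

```python
import ipaddress
import socket


def open_listener(bind_address, port=0):
    """Open a TCP listener; port 0 asks the OS for any free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((bind_address, port))
    sock.listen()
    return sock


def is_loopback_only(sock):
    """True if the listener is bound to a loopback address such as 127.0.0.1."""
    return ipaddress.ip_address(sock.getsockname()[0]).is_loopback
```

On a server, ss -tln (or netstat -tln) shows the same distinction for OpenNMS’s own ports: 127.0.0.1:1099 means local-only, while 0.0.0.0:1099 or *:1099 means reachable from the network.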

We tend to be very internally critical of security issues within OpenNMS, and some people complained that my last post wasn’t technical enough. So I’m hoping to correct that with this one, but if you don’t care about such things you should probably skip it (grin). I have started updating the Security Considerations page on the wiki with details about securing OpenNMS in general, and that will have better information for people interested in security and OpenNMS than this blog post.

While blocking external access to port 1099 will secure OpenNMS against this attack for most people, it doesn’t prevent people who have access to the machine from exploiting the vulnerability. This is called a “privilege escalation” attack rather than a “remote exploit”, since a “normal” user who is logged in locally could gain elevated rights (i.e. root access). Most of our users tend to limit shell access to the server, so this shouldn’t be a problem, but in environments that rely heavily on directory services such as LDAP, the default may be to grant non-privileged access to certain users (say, the “IT Group”) who aren’t involved in maintaining OpenNMS.

And there is also the slim chance that there is a vulnerability in our webUI that could allow a user access to the system. We, of course, don’t know of any and we take great care to prevent it, but simply hoping to limit access to the server as a way to prevent this exploit is insufficient.

So, to prevent it entirely, we are removing RMI. It was introduced in the first iteration of the OpenNMS Remote Poller, but real-world installations found that getting the proper ports open was a real pain. So instead the remote poller now talks over HTTP/HTTPS (with the latter being more secure). Most networks have ports 80 and 443 open, which made things a lot easier.

Until that is introduced (most likely with Horizon 17), it is still a good idea to limit access to the OpenNMS server to only essential people.

Note that Java Management Extensions (JMX) also uses serialized objects and thus could be vulnerable. OpenNMS has a JMX port (18980) but it is bound to localhost by default. In fact, all ports are bound to localhost by default in 16.0.4 except for the webUI, port 8980.

There are a number of other steps you can take to harden your OpenNMS server. I’m planning on detailing them on the wiki, but start with only doing a minimal operating system install. The less software on the system, the smaller the chance one will have a vulnerability.

Also, OpenNMS currently runs as the “root” user. This is due to the fact that it needs access to ICMP traffic as well as port 162 for SNMP traps. Both of these require root by default. With some “stupid kernel tricks” you can run OpenNMS as a non-root user, but it has not been heavily tested. We have a detailed list of issues for running as non-root on our Jira instance.

Sorry to drone on about this, but we take security extremely seriously at OpenNMS. We also have to labor under the misconception that Java is inherently unsafe. It is not true, although people still have nightmares from the early issues with client-side Java applets. The Java in OpenNMS is server-side and we don’t use applets, and the language is used securely in a tremendous amount of software.

For comparison, WordPress, an application I love, is currently estimated to run 25% of the world’s websites. It is written in PHP, a language that has a huge track record of security exploits, and many of the spam e-mails I get link to compromised WordPress sites.

It is possible to secure WordPress (we use it for all of our websites as well) but it takes some diligence. We will remain as diligent as we can concerning the security of OpenNMS, and we will continue to take steps to make it even more secure.

Dublin OpenNMS Meetup

I’m working in Ireland this week, and our UK/Irish Ambassador, Dr. Craig Gallen, used the opportunity to put together an OpenNMS meetup, featuring beer and pizza (grin).

We held it in an office space near Temple Bar thanks to Barry Alistair. Among his many talents, he is also one of the organizers behind IrishDev.com, an on-line community for the Irish Software Developers Network.

Ulf at Dublin Meetup

It was a lot of fun. We socialized for a bit, and Craig had arranged for the pizza to arrive at the end of our talks in order to reward folks for listening to us hold forth on the wonders of OpenNMS (the beer was on offer first, ’natch). Once again I ran long, and the pizza was consumed between my introduction and Craig’s presentation. I did an overview of the history of OpenNMS and why using open source, especially for a network management platform, is a Good Thing™.

Craig at Dublin Meetup

Craig’s presentation was much better, and covered a lot of the new features that have recently been added to the application as well as the direction the product was moving (such as being positioned for SDN/NFV/Internet of Thingies). No one left or fell asleep and there were lots of good questions.

Events such as this are one of my favorite things to do, so I want to thank Barry and Craig for making it possible.

The Many Uses of Grafana

One of the things I love about open source and OpenNMS in particular is watching what people do with it. We knew that we had a great data collector in OpenNMS but sometimes it was hard to display that data in a useful fashion.

OpenNMS is a platform and it is very broad. For example, we do log management, but that is only a small portion of what the application can do, yet there are companies who do nothing but that. So yes, we can display graphs but we don’t necessarily have the resources to focus on making a great data visualization tool.

Enter open source. Torkel Ödegaard has written a great visualization tool in Grafana, so it would be silly for us not to leverage it.

I was at a customer site and I saw this cool graph:

Grafana Graph

I asked Patrick about it, and he said that he wanted to play with the OpenNMS/Grafana integration so he installed it and within a half hour he had it up and running. He created the graph as a version of the “stacky graphs” you can make in OpenNMS, but it was much easier to do and to maintain.

The name “stacky graphs” came from another customer of ours. They asked me if there was a way to put the bandwidth from all of their peer points on one graph. Now, in OpenNMS, it is easy to make a graph of data from a single device, and it is easy to group multiple graphs together, but it was not easy to put disparate data points on a single graph.

However, OpenNMS is a platform so I was able to find a way. When you create a graph definition in OpenNMS, there are two important fields, called “columns” and “type”. The “columns” value defines the file to look for, say ifInOctets.rrd and ifOutOctets.rrd, and the “type” value tells OpenNMS where to look for those files. So what I did was create symbolic links under the OpenNMS node directory named things like LAX-in.rrd, LAX-out.rrd and NYC-in.rrd, NYC-out.rrd that were linked to the interface RRDs of interest. Then I created a report of type “nodeSnmp” with column names like “LAX-in, LAX-out, NYC-in, NYC-out” etc. Then I could use AREA graphs to print out the data.
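For the curious, the symlink trick looks roughly like this as a Python sketch. The node directory, site names and target RRD paths are hypothetical, and only the “columns” and “type” lines of the report are generated; the rest of the graph definition (the DEF and AREA statements) still has to be written by hand, as it was then.

```python
import os
from pathlib import Path


def link_site_rrds(node_dir, sites):
    """Create <SITE>-in.rrd / <SITE>-out.rrd symlinks in node_dir.

    sites maps a label such as "LAX" to (in_rrd_path, out_rrd_path).
    Returns the column names in the order they were linked.
    """
    node_dir = Path(node_dir)
    columns = []
    for site, (in_rrd, out_rrd) in sites.items():
        for direction, target in (("in", in_rrd), ("out", out_rrd)):
            link = node_dir / f"{site}-{direction}.rrd"
            if link.is_symlink():
                link.unlink()  # re-point an existing link instead of failing
            os.symlink(target, link)
            columns.append(f"{site}-{direction}")
    return columns


def report_lines(report_id, columns):
    """The columns/type part of a graph definition using the linked files."""
    return [
        f"report.{report_id}.columns={','.join(columns)}",
        f"report.{report_id}.type=nodeSnmp",
    ]
```

It also shows why the approach was fragile: the links sit alongside real RRD files, so any cleanup job that deletes stale .rrd files has to skip symlinks (Path.is_symlink() is the check), and every new site means another pair of links plus an edit to the columns line.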

This was a pain for a number of reasons. First, you had to do a lot of configuration on the command line. Second, sometimes it is useful to delete .rrd files that haven’t been updated in a while, but if you aren’t careful you’ll delete the symlinks. Finally, it is a lot of work to add new data sources.

Grafana Graph vs. RRDtool

In this picture you can see the Grafana dashboard in the lower left corner and the OpenNMS “stacky graph” in the upper right. Not only does the Grafana version look better, it will be easier to maintain moving forward.

I am eager to see what others are doing with this, so feel free to check out the integration on the wiki and let me know if you come up with anything cool.

OpenNMS RMI Exploit

Recently, my RSS feed on OpenNMS stories turned up an article listing a possible remote code execution exploit in a number of applications, including OpenNMS.

In it, the researcher shows that it is possible to execute code on the OpenNMS server remotely due to a bug in the Apache Commons library, which OpenNMS uses.

We’re a little unhappy that they published this without letting us know first (note that the e-mail address “security at opennms dot org” exists for reporting such things), but it is pretty easy to make sure that your instance of OpenNMS is safe. Simply configure the server’s firewall to block remote access to port 1099 (it must remain accessible from localhost).

I was happy to notice that the example he uses seems to be related to OpenNMS running on Windows. It can be a bit tricky to get OpenNMS to work on Windows, and perhaps the Windows default firewall doesn’t block port 1099, so that is why they noticed it.

It is a good idea to run something like iptables on your OpenNMS server and limit remote access to a minimal set of ports. Technically, the only port you really need access to is 8980, which is the default port for the webUI. I would assume that you would want port 22 for ssh access (unless you want to use the console for all configuration). In addition, port 162 should be open for SNMP trap reception.

That should be it. The application does need other ports (such as 5817 for events), so those must remain accessible from localhost (127.0.0.1 or ::1), but that limits exposure to people who have shell access to the server, whom we assume you limit to people you trust. Remember to include IPv6 firewall rules if you use IPv6.
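As a starting point, here is a Python sketch that generates an iptables-restore ruleset along those lines: loopback traffic is allowed (so 1099, 5817 and the other local ports keep working), replies to OpenNMS’s own outbound polls are allowed through connection tracking, and only ssh, the webUI and SNMP traps are open to the network. It is a hypothetical helper, not something OpenNMS ships; review the output against your environment before loading it, and make sure port 22 stays in the list or you will lock yourself out. For IPv6, the ipv6=True variant can be fed to ip6tables-restore; it also allows ICMPv6, because IPv6 neighbor discovery depends on it.

```python
def iptables_ruleset(tcp_ports=(22, 8980), udp_ports=(162,), ipv6=False):
    """Return a default-deny INPUT ruleset in iptables-restore format."""
    lines = [
        "*filter",
        ":INPUT DROP [0:0]",
        ":FORWARD DROP [0:0]",
        ":OUTPUT ACCEPT [0:0]",
        # Local services (RMI on 1099, events on 5817, ...) stay reachable via loopback.
        "-A INPUT -i lo -j ACCEPT",
        # Replies to outbound polls (ICMP echo, SNMP gets, ...) match existing flows.
        "-A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT",
    ]
    if ipv6:
        # Neighbor discovery and other essential ICMPv6 traffic.
        lines.append("-A INPUT -p ipv6-icmp -j ACCEPT")
    lines += [f"-A INPUT -p tcp --dport {port} -j ACCEPT" for port in tcp_ports]
    lines += [f"-A INPUT -p udp --dport {port} -j ACCEPT" for port in udp_ports]
    lines.append("COMMIT")
    return "\n".join(lines) + "\n"
```

Save the output to a file and load it with iptables-restore (or ip6tables-restore for the IPv6 variant). Port 1099 never appears in the accept rules, so it is only reachable over loopback.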

An easy test to see if that port is remotely accessible would be to run:

telnet [IP or hostname of OpenNMS server] 1099

from a remote system to see if you can access the port. No connection should be made.
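If telnet isn’t installed on the remote system, the same check can be scripted. This Python sketch tries a TCP connection to each port and reports the result; the port list comes from these posts (1099 for RMI, 5817 for events, 18980 for JMX, 8980 for the webUI), and the script name in the usage note is hypothetical. Run it from a machine other than the OpenNMS server.

```python
import socket
import sys

# Ports discussed in these posts; only the webUI should answer from outside.
PORTS = {1099: "RMI", 5817: "events", 18980: "JMX", 8980: "webUI"}


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port is accepted within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def main(host):
    for port, name in PORTS.items():
        state = "OPEN" if port_is_open(host, port) else "closed or filtered"
        print(f"{host}:{port} ({name}): {state}")


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

For example, python check_ports.py nms.example.com (hypothetical name and host) should report only 8980 as OPEN; if 1099 shows OPEN, the firewall rule isn’t doing its job.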

Sorry about this, but as I mentioned this wasn’t revealed to us until after the exploit was public. We are looking into how we can better protect against this issue from a code change standpoint, but until then simply blocking access to the port will prevent most problems. We do plan to have a code fix in place soon.