Nextcloud and OpenNMS

Last weekend, OpenNMS-er extraordinaire Ronny Trommer was at a conference where he met Jos Poortvliet from Nextcloud. I’ve been following Nextcloud pretty intently since I recognized kindred souls in their desire to create a business that was successful and still 100% open source (and not, for example, fauxpensource). Jos mentioned that Nextcloud was getting a new monitoring API and thought it would be cool if OpenNMS could use it.

Since their API returns the monitoring information as XML, Ronny used the XML Collector to gather the data. Once the data is in OpenNMS, you can graph it, set thresholds, configure notifications, etc.

Available metrics include:

  • CPU load and memory usage
  • Number of active users over time
  • Number of shares in various categories
  • Storage statistics
  • Server settings like PHP version, database type and size, memory limits and more
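To give a feel for what collecting from an XML document involves, here is a minimal Python sketch. It is not the XML Collector's configuration or the real Nextcloud payload: the sample document, its element names and the `METRIC_PATHS` mapping are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample payload. The element names are illustrative only
# and are not taken from the Nextcloud API documentation.
SAMPLE_XML = """<monitoring>
  <system>
    <cpuload>0.42</cpuload>
    <mem_free>512000</mem_free>
  </system>
  <storage>
    <num_files>1234</num_files>
    <num_users>7</num_users>
  </storage>
</monitoring>
"""

# Map a metric name to the path of the element that holds its value.
METRIC_PATHS = {
    "cpu_load": "./system/cpuload",
    "mem_free": "./system/mem_free",
    "num_files": "./storage/num_files",
    "num_users": "./storage/num_users",
}


def extract_metrics(xml_text, paths=METRIC_PATHS):
    """Return {metric_name: float} for every path found in the document."""
    root = ET.fromstring(xml_text)
    metrics = {}
    for name, path in paths.items():
        node = root.find(path)
        # Skip metrics that are missing or empty instead of failing.
        if node is not None and node.text:
            metrics[name] = float(node.text)
    return metrics


if __name__ == "__main__":
    for name, value in sorted(extract_metrics(SAMPLE_XML).items()):
        print(f"{name} = {value}")
```

The idea is the same whatever the tool: point at an element, give it a name, and store the numeric value so it can be graphed or checked against a threshold.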

Here’s an example of the number of files from a small demo system:

Files in Nextcloud

Of course, since OpenNMS is a platform, once the data is in the system you can leverage its integrations with applications such as Grafana:

Nextcloud Metrics in Grafana

Some applications will go on and on about how many “plugins” they have. Often, these are little more than scripts that do something simple, like an SNMP GET, but with all the overhead of having to run a shell. To add something like Nextcloud to OpenNMS, it is just a simple matter of configuring a couple of files, but to make that easier a lot of configurations have been added to a git repository. If you want to try out the Nextcloud integration, follow these instructions.

True open source solutions can offer the best features, performance and value for most companies, but unfortunately there are so few pure open source companies providing them. I applaud Nextcloud and look forward to working with them for years to come.

Nagios XI vs. OpenNMS Meridian – the Return of the FUD

It seems like our friends over at Nagios have been watching a little too much election coverage this year, and they’ve updated their “Nagios vs. OpenNMS” document with even more rhetoric and misinformation.

As my three readers may recall, back in 2011 I tore apart the first version of this document. Now they have decided to update it to target our Meridian™ version.

Let’s see how they did (please look at it and follow along as it is quite amusing).

The first misleading bit is the opening paragraph with the phrase “most widely used open-source monitoring project in the world”. Now, granted, they do indicate that means “Nagios Core” but it seems a little disingenuous since what they are selling is Nagios XI, which is much different.

Nagios XI is not open source. It is published under the “Nagios Open Software License” which is about as proprietary as they get. I’m not even sure why the word “open” was added, except to further mislead people into thinking it is open source. The license contains clauses like “The Software may not be Forked” and “The Software may only be used in conjunction with products, projects, and other software distributed by the Company.” Think about it, you can’t even integrate Nagios XI with, say, a home grown trouble ticketing system without violating the license. Doesn’t sound very “open” at all. OpenNMS Meridian is published under the AGPLv3, or a similar proprietary license should your organization have an issue with the AGPL. You don’t have that choice with Nagios XI.

Next, let’s check out the price. The OpenNMS Group has always published its prices on-line. One instance of Meridian, which includes support in the form of access to our “Connect” community, is $6,000. They have it listed as $25,995, which is the price should you choose the much more intensive “Prime” support option. I’m not sure why they didn’t just choose our most expensive product, Ultra Support with the 24×7 option, to make them seem even better.

Nagios XI Node Limitation

Also, note the fine print “Price based on one instance of XI with 220 nodes/devices”. There is no device limit with OpenNMS Meridian. So let’s be clear: for $6,000 you get access to the Meridian software under an open source license, versus $5,000 to monitor 220 nodes with extreme limitations on your rights.

Our smaller customers tend to have around 2000 devices, which means to manage that with Nagios XI you would need roughly ten instances costing nearly $50,000 (using the math presented in this document). And from the experience we’ve heard with customers coming to us from Nagios, the reason it is limited to so few nodes is that you probably can’t run much more on a single instance of Nagios XI. Compare that to OpenNMS where we have customers with over 100,000 devices in a single instance (and they’ve been running it for years).
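For anyone who wants to check the math, here is a small Python sketch of that calculation. It uses only the figures quoted in this post (220 nodes per instance, $5,000 per Nagios XI instance, 2000 devices) and assumes you have to buy whole instances.

```python
import math

# Figures quoted in this post. The per-instance price is the rounded
# number used above, not an official price list.
NODES_PER_INSTANCE = 220
PRICE_PER_INSTANCE = 5000  # USD


def instances_needed(devices, nodes_per_instance=NODES_PER_INSTANCE):
    """Whole instances required, since you cannot buy a fraction of one."""
    return math.ceil(devices / nodes_per_instance)


def total_cost(devices):
    """Total price in USD for enough instances to cover `devices`."""
    return instances_needed(devices) * PRICE_PER_INSTANCE


if __name__ == "__main__":
    devices = 2000
    print(instances_needed(devices))  # 10 (2000 / 220 is about 9.09, rounded up)
    print(total_cost(devices))        # 50000
```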

We also price OpenNMS as a platform. You get everything: trouble-ticketing integration, graphing, reporting, etc. in one application. It looks like Nagios has decided to nickel and dime you for logs, etc. and a thing called “Nagios Fusion” which you’ll need to manage your growing number of Nagios instances since it won’t natively scale. And remember, due to the license you are forbidden from using the software with your own tools.

I especially had to laugh at the “You Speak, We Listen” part. If you need a feature or change and you ask nicely, they might make it for you. With OpenNMS Meridian you are free to make any changes you need since it is 100% open source, and with our open issue tracker we address dozens of user requests each point release.

Finally, there is the feature comparison, which at a minimum is misleading and is often just blatantly false. Almost every feature marked as lacking in Meridian exists, and at a level far beyond what Nagios XI can provide. Seriously, is it really objective to state that OpenNMS doesn’t support Nagvis, a specific tool that even has “Nagios” in the name?


I had to laugh at the hubris. They obviously didn’t Google “opennms nagvis”, because, guess what? There has been an OpenNMS Nagvis integration for some time now, contributed by our community. Just in case you were wondering, we have an integration with Network Weathermap as well.

Nagios is just another proprietary software product that wants to lock you into its ecosystem, and this is just a shameful attempt to monetize an application that is long past its prime. Heck, it was the inability of the Nagios leadership to get along with others that resulted in the very popular Icinga fork, and with it Nagios lost a lot of contribution that helped make up its “Thousands of Free Add-Ons” (and the way Nagios took over the community-led plug-in site was also poorly handled). Plus, many of those add-ons won’t scale in an enterprise environment, which probably led to the 220 device limit.

Compare that to OpenNMS. We not only want to encourage you to integrate with other products, we do a lot of it for you. OpenNMS has great graphing, but we also created the first third party plug-in for Grafana. When it comes to mapping, OpenNMS is on the leading edge, with a focus on various topology views that can ultimately handle millions of devices in a fashion that is actually usable. Need to see a Layer 2 topology? Choose the “enhanced linkd view”. Run VMware and vCenter? It is simple to import all of your machines and see them in a view that shows hosts, guests and network storage. Plus the unique ability to focus on just those devices of interest allows you to use a map with hundreds of thousands if not millions of nodes.

Nagios Map

Compare that to the Nagios map screenshot where it looks like “localhost” is having some issues. Oh no, not localhost! That’s like, all of my machines.

As for “Business Process Intelligence” I’ve been told that the Nagios XI version is like our Business Service Monitor “Except BSM is more featureful, and has a significantly better UI/UX”. Need real Business Intelligence? OpenNMS has Red Hat Drools support, the open source leader, built right into the product.

We also support integration with popular Trouble Ticketing systems such as Request Tracker, Jira, OTRS and Remedy. And the kicker is that you can also run any Nagios check script natively in OpenNMS using the “System Execute Monitor”, but once you get used to the OpenNMS platform, why would you?

I’m not really sure why Nagios goes out of its way to spread fear, uncertainty and doubt about OpenNMS. We rarely compete in the same markets. I’m sure that Sunrise Community Banks get their money’s worth from Nagios, and for companies like NRS Small Business Solutions, Nagios might be a good fit. But if you have enterprise and carrier-level requirements, there is no way Nagios will work for you in the long term.

When a company does something like this to mislead, from wrong information about our product to using terms like “open” when they mean “closed”, it shows you what they think of their competition. What does it say about what they think about their customers?

New Fancy Website for

As some of you may have noticed, a little while ago the OpenNMS Project website got updated to a new, fancy, responsive version.

OpenNMS Platform

This was mainly the work of Ronny Trommer with a big assist from our graphic designer, Jessica.

We are often so busy working on the code that we forget how important it is to tell people about what we are doing. Most people who take the time to learn about the project realize how awesome it is, but it can be hard to get over that first hump in the learning curve.

I hope that the new site will reflect both the benefits of using OpenNMS and the work of the community behind it.

OpenNMS and Elasticsearch

With Horizon 18 we added support for sending OpenNMS events into Elasticsearch. Unfortunately, it only works with Elasticsearch 1.0. Elasticsearch 2.0 and higher require Camel 17, but OpenNMS can’t use it. I wondered why, and if you were wondering too, here is the answer from Seth:

Camel 17 has changed their OSGi metadata to only be compatible with Spring 4.1 and higher. We’re still using Spring 4.0 so that’s one problem. The second issue is that ActiveMQ’s OSGi metadata bans Spring 4.0 and higher. So currently, ActiveMQ and Camel are mutually incompatible with one another inside Karaf at any version higher than the ones that we are currently running.

The biggest issue is the ActiveMQ problem. I’ve opened this bug and it sounds like they’re going to address it in their next major release.

So there you have it.

Choose the Right Thermometer

Okay, so I have a love/hate relationship with Centurylink. Centurylink provides a DSL circuit to my house. I love the fact that I have something resembling broadband with 10Mbps down and about 1Mbps up. Now that doesn’t even qualify as broadband according to the FCC, but it beats the heck out of the alternatives (and I am jealous of my friends with cable who have 100Mbps down or even 300Mbps).

The hate part comes from reliability, which lately has been crap. This post is actually focused on OpenNMS so I won’t go into all of my issues, but I’ve been struggling with long outages in my service.

The latest issue is a new one: packet loss. Usually the circuit is either up or completely down, but for the last three days I’ve been having issues with a large percentage of dropped packets. Of course I monitor my home network from the office OpenNMS instance, and this will usually manifest itself with multiple nodeLostService events around HTTP since I have a personal web server that I monitor.

The default ICMP monitor does not measure packet loss. As long as at least one ping reply makes it, ICMP is considered up, so the node itself remains up. OpenNMS does have a monitor for packet loss called Strafeping. It sends out 20 pings in a short amount of time and then measures how long they take to come back. So I added it to the node for my home and I saw something unusual: a consistent 19 out of 20 lost packets.

Strafeping Graph

Power cycling the DSL modem seems to correct the problem, and the command line ping was reporting no lost packets, so why was I seeing such packet loss from the monitor? Was Strafeping broken?

While it is always a possibility, I didn’t think that Strafeping was broken, but I did check a number of graphs for other circuits and they looked fine. Thus it had to be something else.

This brings up a touchy subject for me: false positives. Is OpenNMS reporting false problems?

It reminds me of an event that happened when I was studying physics back in the late 1980s. I was working with some newly discovered ceramic material that exhibited superconductivity at relatively high temperatures (around 92K). That temperature can be reached using liquid nitrogen, which was relatively easy to source compared to colder liquids like liquid helium.

I needed to measure the temperature of the ceramic, but mercury (used in most common thermometers) is a solid at those temperatures, so I went to my advisor for suggestions. His first question to me was “What does a thermometer measure?”

I thought it was a trick question, so I answered “temperature” (“thermo” meaning “temperature” and “meter” meaning “to measure”). He replied, “Okay, smart guy, the temperature of what?”

That was harder to answer exactly, so I said vague things like the ambient environment, whatever it was next to, etc. He interrupted me and said “No, a thermometer measures one thing: the temperature of the thermometer”.

This was an important lesson, even though it seems obvious. In the case of the ceramic it meant a lot of extra steps to make sure the thermometer we were using (which was based on changes in resistance) was as close to the temperature of the material as possible.

What does that have to do with OpenNMS? Well, OpenNMS is like that thermometer. It is up to us to make sure that the way we decide to use it for monitoring is as close to our criteria as possible. A “false positive” usually indicates a problem with the method rather than the tool – OpenNMS is behaving exactly as it should but we need to match it better to what we expect.

In my case I found out the router I use was limited by default to responding 1 ping per second (to avoid DDoS attacks I assume), so last night when I upped that to allow 20 pings per second Strafeping started to work as expected (as you can see in the graph above).
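If you want to see why a 1 ping per second limit produces exactly that graph, here is a toy Python simulation. It is not how Strafeping or my router is actually implemented. The 50 ms spacing between pings and the fixed one-second window are assumptions I picked just to make the arithmetic visible.

```python
def simulate_burst(num_pings, interval_ms, replies_per_second):
    """Return how many pings get a reply from a rate-limited responder.

    The responder is modeled as a fixed one-second window counter: it
    answers at most `replies_per_second` requests in each window.
    """
    replies = 0
    answered_in_window = {}
    for i in range(num_pings):
        # Which one-second window this ping lands in.
        window = (i * interval_ms) // 1000
        used = answered_in_window.get(window, 0)
        if used < replies_per_second:
            answered_in_window[window] = used + 1
            replies += 1
    return replies


def loss_percent(sent, received):
    """Packet loss as a percentage of the pings sent."""
    return 100.0 * (sent - received) / sent


if __name__ == "__main__":
    for limit in (1, 20):
        got = simulate_burst(20, 50, limit)
        print(f"limit={limit}/s: {got}/20 replies, {loss_percent(20, got):.0f}% loss")
    # limit=1/s: 1/20 replies, 95% loss
    # limit=20/s: 20/20 replies, 0% loss
```

A command line ping sends about one packet per second, so it never trips the limit, which is why it reported no loss while the burst of 20 did.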

This allowed me to detect when my DSL circuit packet loss started again today. A little after 14:00 the system detected high packet loss. When this happened before, power cycling the modem seemed to fix it, so I headed home to do just that.

While I was on the way, around 15:30, the packet loss seemed to improve, but as you can see from the graph the ping times were all over the place (the line is green but there is a lot of extra “smoke” around it indicating a variance in the response times). I proactively power cycled the modem and things settled down. The Centurylink agent agreed to send me a new modem.

The point of this post is to stress that you need to understand how your monitoring tools actually work, and that you can often correct issues that make a monitor unusable and turn it into something useful. Choose the right thermometer.

OpenNMS Horizon 18 “Tardigrade” Is Now Available

I am extremely happy to announce the availability of Horizon 18, codenamed “Tardigrade”. Ben is responsible for naming our releases and he’s decided that the theme for Horizon 18 will be animals. The name “Tardigrade” was suggested in the IRC channel by Uberpenguin, and while they aren’t the prettiest things, Wikipedia describes them as “perhaps the most durable of known organisms” so in the context of OpenNMS that is appropriate.

OpenNMS Horizon 18

I am also happy to see the Horizon program working. When we split OpenNMS into Horizon and Meridian, the main reason was to drive faster development. Now instead of a new stable release every 18 months, we are getting them out every 3 to 4 months. And these are great releases – not just major releases in name only.

The first thing you’ll notice if you log in to Horizon 18 as a user in the admin role is that we’ve added a new “opt-in” feature that lets us know a little bit about how OpenNMS is being used by people. We hope that most of you will choose to send us this information, and in the spirit of the Open Source Way we’ve made all of the statistics available publicly.

OpenNMS Opt-In Screen

One of the key things we are looking for is the list of SNMP Object IDs. This will let us know what devices are being monitored by our users and help us increase their level of support. Of course, this requires that your OpenNMS instance be able to reach the stats server on the Internet, and you can change your choice at any time on the Configuration admin page under “Data Choices”. It will only send this information once every 24 hours, so we don’t expect it to impact network traffic at all.

Once you’ve opted in, the next thing you’ll probably notice is new problem lists on the home page listing “services” and “applications”.

OpenNMS BSM Problem Lists

This relates to the major feature addition in Horizon 18 of the Business Service Monitor (BSM).

OpenNMS BSM OpenDaylight

As people move from treating servers as pets to treating them like cattle, the emphasis has shifted to understanding how well applications and microservices are running as a whole instead of focusing on individual devices. The BSM allows you to configure these services and then leverage all the usual OpenNMS crunchy goodness as you would a legacy service like HTTP running on a particular box. The above screenshot comes from some prototype work Jesse has been doing with integrating OpenNMS with OpenDaylight. As you can see at a glance, while the ICMP service is down on a particular device, the overall Network Fabric is still functioning perfectly.
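To make the “one device down, service still fine” idea concrete, here is a rough Python sketch of two ways a service could roll up the status of the things it depends on. This is only an illustration and not the BSM implementation: the `Status` levels, both function names and the 50% threshold are made up for the example.

```python
from enum import IntEnum


class Status(IntEnum):
    # Illustrative severity levels, ordered from best to worst.
    NORMAL = 0
    WARNING = 1
    MINOR = 2
    MAJOR = 3
    CRITICAL = 4


def reduce_highest(children):
    """Pet-style roll-up: the service is as bad as its worst child."""
    return max(children, default=Status.NORMAL)


def reduce_threshold(children, threshold):
    """Cattle-style roll-up: report the worst severity that at least
    `threshold` (a fraction between 0 and 1) of the children have reached."""
    if not children:
        return Status.NORMAL
    for severity in sorted(Status, reverse=True):
        affected = sum(1 for child in children if child >= severity)
        if affected / len(children) >= threshold:
            return severity
    return Status.NORMAL


if __name__ == "__main__":
    # One of four devices has lost ICMP; the other three are fine.
    fabric = [Status.CRITICAL, Status.NORMAL, Status.NORMAL, Status.NORMAL]
    print(reduce_highest(fabric).name)         # CRITICAL
    print(reduce_threshold(fabric, 0.5).name)  # NORMAL
```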

Another thing I’m extremely proud of is the increase in the quality of documentation. Ronny and the rest of the documentation team are doing a great job, and we’ve made it a requirement that new features aren’t complete without documentation. Please check out the release notes as an example. They contain a pretty comprehensive list of changes in 18.

A few I’d like to point out:

Horizon 17 is one of the most powerful and stable releases of OpenNMS ever, and we hope to continue that tradition with Horizon 18. Hats off to the team for such great work.

Here is a list of all the issues addressed in Horizon 18:

Release Notes – OpenNMS – Version 18.0.0


  • [NMS-3489] – "ADD NODE" produces "too much" config
  • [NMS-4845] – RrdUtils.createRRD log message is unclear
  • [NMS-5788] – should be deprecated and removed
  • [NMS-5839] – Bring WaterfallExecutor logging on par with RunnableConsumerThreadPool
  • [NMS-5915] – The retry handler used with HttpClient is not going to do what we expect
  • [NMS-5970] – No HTML title on Topology Map
  • [NMS-6344] – does not import requisitions with spaces in the name
  • [NMS-6549] – Eventd does not honor reloadDaemonConfig event
  • [NMS-6623] – Update JNA.jar library to support ARM based systems
  • [NMS-7263] – not included in jar
  • [NMS-7471] – SNMP Plugin tests regularly failing
  • [NMS-7525] – ArrayOutOfBounds Exception in Topology Map when selecting bridge-port
  • [NMS-7582] – non RFC conform behaviour of SmtpMonitor
  • [NMS-7731] – Remote poller dies when trying to use the PageSequenceMonitor
  • [NMS-7763] – Bridge Data is not Collected on Cisco Nexus
  • [NMS-7792] – NPE in JmxRrdMigratorOffline
  • [NMS-7846] – Slow LinkdTopologyProvider/EnhancedLinkdTopologyProvider in bigger enviroments
  • [NMS-7871] – Enlinkd bridge discovery creates erroneous entries in the Bridge Forwarding Tables of unrelated switches when host is a kvm virtual host
  • [NMS-7872] – 303 See Other on requisitions response breaks the usage of the Requisitions ReST API
  • [NMS-7880] – Integration tests in org.opennms.core.test-api.karaf have incomplete dependencies
  • [NMS-7918] – Slow BridgeBridgeTopologie discovery with enlinkd.
  • [NMS-7922] – Null pointer exceptions with whitespace in requisition name
  • [NMS-7959] – Bouncycastle JARs break large-key crypto operations
  • [NMS-7967] – XML namespace locations are not set correctly for namespaces cm, and ext
  • [NMS-7975] – Rest API v2 returns http-404 (not found) for http-204 (no content) cases
  • [NMS-8003] – Topology-UI shows LLDP links not correct
  • [NMS-8018] – Vacuumd sends automation events before transaction is closed
  • [NMS-8056] – opennms-setup.karaf shouldn't try to start ActiveMQ
  • [NMS-8057] – Add the .xml and .cfg files to the Minion repo webapp
  • [NMS-8058] – Poll all interface w/o critical service is incorrect
  • [NMS-8072] – NullPointerException for NodeDiscoveryBridge
  • [NMS-8079] – The OnmsDaoContainer does not update its cache correctly, leading to a NumberFormatException
  • [NMS-8080] – VLAN name is not displayed
  • [NMS-8086] – Provisioning Requisitions with spaces in their name.
  • [NMS-8096] – JMX detector connection errors use wrong log level
  • [NMS-8098] – PageSequenceMonitor sometimes gives poor failure reasons
  • [NMS-8104] – init script checkXmlFiles() fails to pick up errors
  • [NMS-8116] – Heat map Alarms/Categories do not show all categories
  • [NMS-8118] – CXF returning 204 on NULL responses, rather than 404
  • [NMS-8125] – Memory leak when using Groovy + BSF
  • [NMS-8128] – NPE if provisioning requisition name has spaces
  • [NMS-8137] – OpenNMS incorrectly discovers VLANs
  • [NMS-8146] – "Show interfaces" link forgets the filters in some circumstances
  • [NMS-8167] – Cannot search by MAC address
  • [NMS-8168] – Vaadin Applications do not show OpenNMS favicon
  • [NMS-8189] – Wrong interface status color on node detail page
  • [NMS-8194] – Return an HTTP 303 for PUT/POST request on a ReST API is a bad practice
  • [NMS-8198] – Provisioning UI indication for changed nodes is too bright
  • [NMS-8208] – Upgrade maven-bundle-plugin to v3.0.1
  • [NMS-8214] – AlarmdIT.testPersistManyAlarmsAtOnce() test ordering issue?
  • [NMS-8215] – Chart servlet reloads Notifd config instead of Charts config
  • [NMS-8216] – Discovery config screen problems in latest code
  • [NMS-8221] – Operation "Refresh Now" and "Automatic Refresh" referesh the UI differently
  • [NMS-8224] – JasperReports measurements data-source step returning null
  • [NMS-8235] – Jaspersoft Studio cannot be used anymore to debug/create new reports
  • [NMS-8240] – Requisition synchronization is failing due to space in requisition name
  • [NMS-8248] – Many Rcsript (RScript) files in OPENNMS_DATA/tmp
  • [NMS-8257] – Test flapping: ForeignSourceRestServiceIT.testForeignSources()
  • [NMS-8272] – snmp4j does not process agent responses
  • [NMS-8273] – %post error when Minion host.key already exists
  • [NMS-8274] – All the defined Statsd's reports are being executed even if they are disabled.
  • [NMS-8277] – %post failure in opennms-minion-features-core: sed not found
  • [NMS-8293] – Config Tester Tool doesn't check some of the core configuration files
  • [NMS-8298] – Label of Vertex is too short in some cases
  • [NMS-8299] – Topology UI recenters even if Manual Layout is selected
  • [NMS-8300] – Center on Selection no longer works in STUI
  • [NMS-8301] – v2 Rest Services are deployed twice to the WEB-INF/lib directory
  • [NMS-8302] – Json deserialization throws "unknown property" exception due to usage of wrong Jax-rs Provider
  • [NMS-8304] – An error on threshd-configuration.xml breaks Collectd when reloading thresholds configuration
  • [NMS-8313] – Pan moving in Topology UI automatically recenters
  • [NMS-8314] – Weird zoom behavior in Topology UI using mouse wheel
  • [NMS-8320] – Ping is available for HTTP services
  • [NMS-8324] – Friendly name of an IP service is never shown in BSM
  • [NMS-8330] – Switching Topology Providers causes Exception
  • [NMS-8335] – Focal points are no longer persisted
  • [NMS-8337] – Non-existing resources or attributes break JasperReports when using the Measurements API
  • [NMS-8353] – Plugin Manager fails to load
  • [NMS-8361] – Incorrect documentation for org.opennms.newts.query.heartbeat
  • [NMS-8371] – The contents of the info panel should refresh when the vertices and edges are refreshed
  • [NMS-8373] – The placeholder {diffTime} is not supported by Backshift.
  • [NMS-8374] – The logic to find event definitions confuses the Event Translator when translating SNMP Traps
  • [NMS-8375] – License / copyright situation in release notes introduction needs simplifying
  • [NMS-8379] – Sluggish performance with Cassandra driver
  • [NMS-8383] – jmxconfiggenerator feature has unnecessary includes
  • [NMS-8386] – Requisitioning UI fails to load in modern browsers if used behind a proxy
  • [NMS-8388] – Document resources ReST service
  • [NMS-8389] – Heatmap is not showing
  • [NMS-8394] – NoSuchElement exception when loading the TopologyUI
  • [NMS-8395] – Logging improvements to Notifd
  • [NMS-8401] – There are errors on the graph definitions for OpenNMS JMX statistics
  • [NMS-8403] – Document styles of identifying nodes in resource IDs


  • [NMS-2504] – Create a better landing page for Configure Discovery aftermath
  • [NMS-4229] – Detect tables with Provisiond SNMP detector
  • [NMS-5077] – Allow other services to work with Path Outages other than ICMP
  • [NMS-5905] – Add ifAlias to bridge Link Interface Info
  • [NMS-5979] – Make the Provisioning Requisitions "Node Quick-Add" look pretty
  • [NMS-7123] – Expose SNMP4J 2.x noGetBulk and allowSnmpV2cInV1 capabilities
  • [NMS-7446] – Enhance Bridge Link Object Model
  • [NMS-7447] – Update BridgeTopology to use the new Object Model
  • [NMS-7448] – Update Bridge Topology Discovery Strategy
  • [NMS-7756] – Change icon for Dell PowerConnector switch
  • [NMS-7798] – Add Sonicwall Firewall Events
  • [NMS-7903] – Elasticsearch event and alarm forwarder
  • [NMS-7950] – Create an overview for the developers guide
  • [NMS-7965] – Add support for setting system properties via user supplied .properties files
  • [NMS-7976] – Merge OSGi Plugin Manager into Admin UI
  • [NMS-7980] – provide HTTPS Quicklaunch into node page
  • [NMS-8015] – Remove Dependencies on RXTX
  • [NMS-8041] – Refactor Enhanced Linkd Topology
  • [NMS-8044] – Provide link for Microsoft RDP connections
  • [NMS-8063] – Update asciidoc dependencies to latest 1.5.3
  • [NMS-8076] – Allow user to access local documentation from OpenNMS Jetty Webapp
  • [NMS-8077] – Add NetGear Prosafe Smart switch SNMP trap events and syslog events
  • [NMS-8092] – Add OpenWrt syslog and related event definitions
  • [NMS-8129] – Disallow restricted characters from foreign source and foreign ID
  • [NMS-8149] – Update asciidoctorj to 1.5.4 and asciidoctorjPdf to 1.5.0-alpha.11
  • [NMS-8152] – Collect and publish anonymous statistics to
  • [NMS-8160] – Remove Quick-Add node to avoid confusions and avoid breaking the ReST API
  • [NMS-8163] – Requisitions UI Enhancements
  • [NMS-8179] – ifIndex >= 2^31
  • [NMS-8182] – Add HTTPS as quick-link on the node page
  • [NMS-8205] – Generate events for alarm lifecycle changes
  • [NMS-8209] – Upgrade junit to v4.12
  • [NMS-8210] – Add support for calculating the derivative with a Measurements API Filter
  • [NMS-8211] – Add support for retrieving nodes with a filter expression via the ReST API
  • [NMS-8218] – External event source tweaks to admin guide
  • [NMS-8219] – Copyright bump on asciidoc docs
  • [NMS-8225] – Integrate the Minion container and packages into the mainline OpenNMS build
  • [NMS-8226] – Upgrade SNMP4J to version 2.4
  • [NMS-8238] – Topology providers should provide a description for display
  • [NMS-8251] – Parameterize product name in asciidoc docs
  • [NMS-8259] – Cleanup testdata in SnmpDetector tests
  • [NMS-8265] – SNMP collection systemDefs for Cisco ASA5525-X, ASA5515-X
  • [NMS-8266] – SNMP collection systemDefs for Juniper SRX210he2, SRX100h
  • [NMS-8267] – Create documentation for SNMP detector
  • [NMS-8271] – Enable correlation engines to register for all events
  • [NMS-8296] – Be able to re-order the policies on a requisition through the UI
  • [NMS-8334] – Implement org.opennms.timeseries.strategy=evaluate to facilitate the sizing process
  • [NMS-8336] – Set the required fields when not specified while adding events through ReST
  • [NMS-8349] – Update screenshots with 18 theme in user documentation
  • [NMS-8365] – Add metric counter for drop counts when the ring buffer is full
  • [NMS-8377] – Applying some organizational changes on the Requisitions UI (Grunt, JSHint, Dist)



  • [NMS-8236] – Move the "vaadin-extender-service" module to opennms code base

Agent Provocateur

I’ve been involved with the monitoring of computer networks for a long time, two decades actually, and I’m seeing an alarming trend. Every new monitoring application seems to be insisting on software agents. Basically, in order to get any value out of the application, you have to go out and install additional software on each server in your network.

Now there was a time when this was necessary. BMC Software made a lot of money with its PATROL series of agents, yet people hated them then as much as they hate agents now. Why? Well, first there was the cost, both in terms of licensing and in continuing to maintain them (upgrades, etc.). Next there was the fact that you had to add software to already overloaded systems. I can remember the first time the company I worked for back then deployed a PATROL agent on an Oracle database. When it was started up it took the database down as it slammed the system with requests. Which leads me to the final point: outside of security issues that arise with an increase in the number of applications running on a system, the moment the system experiences a problem the blame will fall on the agent.

Despite that, agents still seem to proliferate. In part I think it is political. Downloading and installing agents looks like useful work. “Hey, I’m busy monitoring the network with these here agents”. Also in part, it is laziness. I have never met a programmer who liked working on someone else’s code, so why not come up with a proprietary protocol and write agents to implement it?

But what bothers me the most is that it is so unnecessary. The information you need for monitoring, with the possible exception of Windows, is already there. Modern operating systems (again, with the exception of Windows) ship with an SNMP agent, usually based on Net-SNMP. This is a secure, powerful, extensible agent that has been tried and tested for many years, and it is maintained directly on the server itself. You can use SNMPv3 for secure communications, and the “extend” and “pass” directives to make it easy to customize.

Heck, even Windows ships with an extensible SNMP agent, and you can also access data via WMI and PowerShell.

But what about applications? Don’t you need an agent for that?

Not really. Modern applications tend to have an API, usually based on ReST, that can be queried by a management station for important information. Java applications support JMX, databases support ODBC, and when all that fails you can usually use good ol’ HTTP to query the application directly. And the best part is that the application itself can be written to guard against a monitoring query causing undue load on the system.
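As a rough illustration of how little is needed on the monitored side, here is a minimal Python sketch of an agentless HTTP check run from a management station. It is not how OpenNMS implements its HTTP monitor. The function name, the five second timeout and the rule that any 2xx or 3xx status counts as “up” are assumptions made for the example.

```python
import time
import urllib.error
import urllib.request


def check_http(url, timeout=5.0):
    """Return (is_up, status_code, elapsed_seconds) for a single GET.

    Nothing is installed on the target; the check only needs the same
    network access an ordinary user of the application would have.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (4xx or 5xx).
        status = err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout and similar errors.
        return False, None, time.monotonic() - start
    elapsed = time.monotonic() - start
    return 200 <= status < 400, status, elapsed


# Example call against a hypothetical URL:
#   up, status, seconds = check_http("http://app.example.org/status")
```

The elapsed time doubles as a response time metric, so the same query that tells you the application is up also gives you something to graph.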

At OpenNMS we work with a lot of large customers, and they are loath to install new software on all of their servers. Plus, many of our customers have devices that can’t support additional agents, such as routers and switches, and IoT devices such as thermostats and door locks. This is the main reason why the OpenNMS monitoring platform is, by design, agentless.

A critic might point out that OpenNMS does have an agent in the remote poller, as well as in the upcoming Minion feature set. True, but those act as “user agents”, giving OpenNMS a view into networks as if it were a user of those networks. The software is not installed on every server but instead it just needs the same access as a user would have. So, it can be installed on an existing system or on a small system purchased for that purpose, at a minimum just one for each network to be monitored.

While some new IT fields may require agents, most successful solutions try to avoid them. Even in newer fields such as IT automation, the best solutions are agentless. Agents are not necessary, and I strongly suggest that anyone who is asked to install an agent for monitoring question that requirement.

OpenNMS is Sweet Sixteen

It was sixteen years ago today that the first code for OpenNMS was published on SourceForge. While the project was started in the summer of 1999, no one seems to remember the exact date, so we use March 30th to mark the birthday of the OpenNMS project.

OpenNMS Project Details

While I’ve been closely associated with OpenNMS for a very long time, I didn’t start it. It was started by Steve Giles, Luke Rindfuss and Brian Weaver. They were soon joined by Shane O’Donnell, and while none of them are associated with the project today, they are the reason it exists.

Their company was called Oculan, and I joined them in 2001. They built management appliances marketed as “purple boxes” based on OpenNMS and I was brought on to build a business around just the OpenNMS piece of the solution.

As far as I know, this is the only surviving picture of most of the original team, taken at the OpenNMS 1.0 Release party:

OpenNMS 1.0 Release Team

In 2002 Oculan decided to close source all future work on their product, thus ending their involvement with OpenNMS. I saw the potential, so I talked with Steve Giles and soon left the company to become the OpenNMS project maintainer. When it comes to writing code I am very poorly suited to the job, but my one true talent is getting great people to work with me, and judging by the quality of people involved in OpenNMS, it is almost a superpower.

I worked out of my house and helped maintain the community mainly through the #opennms IRC channel on freenode, and surprisingly the project managed not only to survive, but to grow. When I found out that Steve Giles was leaving Oculan, I applied to be their new CEO, which I’ve been told was the source of a lot of humor among the executives. The man they hired had a track record of snuffing out all potential from a number of startups, but he had the proper credentials that VCs seem to like so he got the job. I have to admit to a bit of schadenfreude when Oculan closed its doors in 2004.

But on a good note, if you look at the two guys in the above picture right next to the cake, Seth Leger and Ben Reed, they still work for OpenNMS today. We’re still here. In fact we have the greatest team I’ve ever worked with in my life, and the OpenNMS project has grown tremendously in the last 18 months. This July we’ll have our eleventh (!) annual developers conference, Dev-Jam, which will bring together people dedicated to OpenNMS, both old and new, for a week of hacking and camaraderie.

Our goal is nothing short of making OpenNMS the de facto management platform of choice for everyone, and while we still have a long way to go, we keep getting closer. My heartfelt thanks go out to everyone who made OpenNMS possible, and I look forward to writing many more of these notes in the future.

OpenNMS Horizon 17.1.1 Released

Probably the last Horizon 17 version, 17.1.1, has been released. According to TWiO, the next release will be Horizon 18 at the end of the month, with Horizon 19 following at the end of May.

This release is mainly a maintenance release. It does contain one fix I used (NMS-8199), which allows for the state names in the Jira Trouble Ticketing plugin to be configured. This helps a lot if Jira is not in English.

If you are running Horizon 17, this should help it run a bit smoother.


  • [NMS-7936] – Chart Servlet Outages model exception
  • [NMS-8010] – Groups config rolled back after deleting a user in web UI
  • [NMS-8034] – Adding on opennms.conf is ignored by the opennms script
  • [NMS-8048] – org.hibernate.exception.SQLGrammarException with ACLs on V17
  • [NMS-8075] – vacuumd-configuration.xml — Database error executing statement
  • [NMS-8113] – Overview about major releases in the release notes
  • [NMS-8153] – Can't modify the Foreign ID on the Requisitions UI when adding a new node
  • [NMS-8159] – When altering the SNMP Trap NBI config, the externally referenced mapping groups are persisted into the main file.
  • [NMS-8161] – Tooltips are not working on the new Requisitions UI
  • [NMS-8165] – OutageDao ACL support is broken causing web UI failures
  • [NMS-8177] – Install guide should use postgres admin for schema updates
  • [NMS-8199] – Allows state names to be configured in the JIRA Ticketer Plugin


  • [NMS-6404] – Allow send events through ReST
  • [NMS-8148] – Create pull request and contribution template to GitHub project


  • [NMS-8151] – Remove all jersey artifacts from lib classpath

Speeding Up OpenNMS Requisition Imports

One thing that differentiates OpenNMS from other applications is the strong focus on tools for provisioning the system. If you want to monitor hundreds of thousands of devices, and ultimately millions, the ordinary methods just don’t work.

Users of OpenNMS often create large requisitions from external database sources, and sometimes it can take a while for the import to complete. Delays can happen if the Foreign Source used for the requisition has a large number of service detectors that won’t exist on most devices.

For example, the default Foreign Source for Horizon 17 has about 15 detectors. Of those, only about 4 will exist on networking equipment (ICMP, SSH, HTTP and HTTPS). When scanning, the other 11 or so can add a lot of time per interface. Assuming 2 retries and a 3-second timeout, that would be 9 seconds for each non-existent service (the initial attempt plus 2 retries, at 3 seconds each). With just 1000 interfaces, that’s 99,000 seconds (9 seconds x 11 services x 1000 interfaces) of time just spent waiting, which translates to 27.5 hours.

Now, granted, the importer has multiple threads so the actual wait time will be less, but you can see how this can impact the time needed to import a requisition. This can be reduced significantly by tuning service detection to the bare minimum needed and perhaps adding other services later on a per-device basis without scanning.
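
If you want to play with the numbers for your own requisition, here is a minimal Python sketch of the back-of-the-envelope math above. The function names and the thread counts are my own assumptions for illustration, not anything taken from the OpenNMS code or configuration, and the threaded estimate is idealized: it assumes the waiting divides evenly across threads, which a real import will not do.

```python
def detection_wait_seconds(interfaces, missing_services, retries=2, timeout_s=3.0):
    """Total time spent waiting on detectors for services that do not exist."""
    attempts = retries + 1  # the initial attempt plus each retry
    return interfaces * missing_services * attempts * timeout_s


def wall_clock_hours(total_wait_s, threads):
    """Idealized elapsed time if the waiting is spread evenly over the threads."""
    return total_wait_s / threads / 3600


if __name__ == "__main__":
    # 1000 interfaces, 11 detectors that will never match, 2 retries, 3 s timeout
    total = detection_wait_seconds(interfaces=1000, missing_services=11)
    print(f"total wait: {total:.0f} seconds ({total / 3600:.1f} hours)")

    # Hypothetical thread counts, only to show how the estimate scales
    for threads in (1, 10, 50):
        hours = wall_clock_hours(total, threads)
        print(f"{threads:>3} threads: about {hours:.2f} hours")
```

With a single thread this reproduces the 27.5 hours from the example. Setting missing_services to 0, which is what trimming the Foreign Source down to detectors that actually exist amounts to, takes the wait to zero no matter how many threads you have.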