New Fancy Website for www.opennms.org

As some of you may have noticed, a little while ago the OpenNMS Project website got updated to a new, fancy, responsive version.

OpenNMS Platform

This was mainly the work of Ronny Trommer with a big assist from our graphic designer, Jessica.

We are often so busy working on the code we often forget how important it is to tell people about what we are doing. Most people who take the time to learn about the project realize how awesome it is, but it can be hard to get over that first hump in the learning curve.

I hope that the new site will both reflect the benefits of using OpenNMS as well as the work of the community behind it.

OpenNMS and Elasticsearch

With Horizon 18 we added support for sending OpenNMS events into Elasticsearch. Unfortunately, it only works with Elasticsearch 1.0. Elasticsearch 2.0 and higher requires Camel 17, but OpenNMS can’t use it. I wondered why, and if you were wondering too, here is the answer from Seth:

Camel 17 has changed their OSGi metadata to only be compatible with Spring 4.1 and higher. We’re still using Spring 4.0 so that’s one problem. The second issue is that ActiveMQ’s OSGi metadata bans Spring 4.0 and higher. So currently, ActiveMQ and Camel are mutually incompatible with one another inside Karaf at any version higher than the ones that we are currently running.

The biggest issue is the ActiveMQ problem, I’ve opened this bug and it sounds like they’re going to address it in their next major release

So there you have it.

Choose the Right Thermometer

Okay, so I have a love/hate relationship with Centurylink. Centurylink provides a DSL circuit to my house. I love the fact that I have something resembling broadband with 10Mbps down and about 1Mbps up. Now that doesn’t even qualify as broadband according to the FCC, but it beats the heck out of the alternatives (and I am jealous of my friends with cable who have 100Mbps down or even 300Mbps).

The hate part comes from reliability, which lately has been crap. This post is actually focused on OpenNMS so I won’t go into all of my issues, but I’ve been struggling with long outages in my service.

The latest issue is a new one: packet loss. Usually the circuit is either up or completely down, but for the last three days I’ve been having issues with a large percentage of dropped packets. Of course I monitor my home network from the office OpenNMS instance, and this will usually manifest itself with multiple nodeLostService events around HTTP since I have a personal web server that I monitor.

The default ICMP monitor does not measure packet loss. As long as at least one ping reply makes it, ICMP is considered up, so the node itself remains up. OpenNMS does have a monitor for packet loss called Strafeping. It sends out 20 pings in a short amount of time and then measures how long they take to come back. So I added it to the node for my home and I saw something unusual: a consistent 19 out of 20 lost packets.

Strafeping Graph

Power cycling the DSL modem seems to correct the problem, and the command line ping was reporting no lost packets, so why was I seeing such packet loss from the monitor? Was Strafeping broken?

While it is always a possibility, I didn’t think that Strafeping was broken, but I did check a number of graphs for other circuits and they looked fine. Thus it had to be something else.

This brings up a touchy subject for me: false positives. Is OpenNMS reporting false problems?

It reminds me of an event happened when I was studying physics back in the late 1980s. I was working with some newly discovered ceramic material that exhibited superconductivity at relatively high temperatures (around 92K). That temperature can be reached using liquid nitrogen, which was relatively easy to source compared to cooler liquids like liquid helium.

I needed to measure the temperature of the ceramic, but mercury (used in most common thermometers) is a solid at those temperatures, so I went to my advisor for suggestions. His first question to me was “What does a thermometer measure?”

I thought it was a trick question, so I answered “temperature” (“thermo” meaning temperature and meter meaning “to measure”). He replied, “Okay, smart guy, the temperature of what?”

That was harder to answer exactly, so I said vague things like the ambient environment, whatever it was next to, etc. He interrupted me and said “No, a thermometer measures one thing: the temperature of the thermometer”.

This was an important lesson, even though it seems obvious. In the case of the ceramic it meant a lot of extra steps to make sure the thermometer we were using (which was based on changes in resistance) was as close to the temperature of the material as possible.

What does that have to do with OpenNMS? Well, OpenNMS is like that thermometer. It is up to us to make sure that the way we decide to use it for monitoring is as close to our criteria as possible. A “false positive” usually indicates a problem with the method versus the tool – OpenNMS is behaving exactly as it should but we need to match it better to what we expect.

In my case I found out the router I use was limited by default to responding 1 ping per second (to avoid DDoS attacks I assume), so last night when I upped that to allow 20 pings per second Strafeping started to work as expected (as you can see in the graph above).

This allowed me to detect when my DSL circuit packet loss started again today. A little after 14:00 the system detected high packet loss. When this happened before, power cycling the modem seemed to fix it, so I headed home to do just that.

While I was on the way, around 15:30, the packet loss seemed to improve, but as you can see from the graph the ping times were all over the place (the line is green but there is a lot of extra “smoke” around it indicating a variance in the response times). I proactively power cycled the modem and things settled down. The Centurylink agent agreed to send me a new modem.

The point of this post is to stress that you need to understand how your monitoring tools actually work and you can often correct issues that make a monitor unusable and turn it into to something useful. Choose the right thermometer.

OpenNMS Horizon 18 “Tardigrade” Is Now Available

I am extremely happy to announce the availability of Horizon 18, codenamed “Tardigrade”. Ben is responsible for naming our releases and he’s decided that the theme for Horizon 18 will be animals. The name “Tardigrade” was suggested in the IRC channel by Uberpenguin, and while they aren’t the prettiest things, Wikipedia describes them as “perhaps the most durable of known organisms” so in the context of OpenNMS that is appropriate.

OpenNMS Horizon 18

I am also happy to see the Horizon program working. When we split OpenNMS into Horizon and Meridian, the main reason was to drive faster development. Now instead of a new stable release every 18 months, we are getting them out every 3 to 4 months. And these are great releases – not just major releases in name only.

The first thing you’ll notice if you log in to Horizon 18 as a user in the admin role is that we’ve added a new “opt-in” feature that let’s us know a little bit about how OpenNMS is being used by people. We hope that most of you will choose to send us this information, and in the spirit of the Open Source Way we’ve made all of the statistics available publicly.

OpenNMS Opt-In Screen

One of the key things we are looking for is the list of SNMP Object IDs. This will let us know what devices are being monitored by our users and to increase their level of support. Of course, this requires that your OpenNMS instance be able to reach the stats server on the Internet, and you can change your choice at any time on the Configuration admin page under “Data Choices”. It will only send this information once every 24 hours, so we don’t expect it to impact network traffic at all.

Once you’ve opted in, the next thing you’ll probably notice is new problem lists on the home page listing “services” and “applications”.

OpenNMS BSM Problem Lists

This related to the major feature addition in Horizon 18 of the Business Service Monitor (BSM).

OpenNMS BSM OpenDaylight

As people move from treating servers as pets to treating them like cattle, the emphasis has shifted to understanding how well applications and microservices are running as a whole instead of focusing on individual devices. The BSM allows you to configure these services and then leverage all the usual OpenNMS crunchy goodness as you would a legacy service like HTTP running on a particular box. The above screenshot comes from some prototype work Jesse has been doing with integrating OpenNMS with OpenDaylight. As you can see at a glance, while the ICMP service is down on a particular device, the overall Network Fabric is still functioning perfectly.

Another thing I’m extremely proud of is the increase in the quality of documentation. Ronny and the rest of the documentation team are doing a great job, and we’ve made it a requirement that new features aren’t complete without documentation. Please check out the release notes as an example. It contains a pretty comprehensive lists of changes in 18.

A few I’d like to point out:

Horizon 17 is one of the most powerful and stable releases of OpenNMS ever, and we hope to continue that tradition with Horizon 18. Hats off to the team for such great work.

Here is a list of all the issues addressed in Horizon 18:

Release Notes – OpenNMS – Version 18.0.0

Bug

  • [NMS-3489] – "ADD NODE" produces "too much" config
  • [NMS-4845] – RrdUtils.createRRD log message is unclear
  • [NMS-5788] – model-importer.properties should be deprecated and removed
  • [NMS-5839] – Bring WaterfallExecutor logging on par with RunnableConsumerThreadPool
  • [NMS-5915] – The retry handler used with HttpClient is not going to do what we expect
  • [NMS-5970] – No HTML title on Topology Map
  • [NMS-6344] – provision.pl does not import requisitions with spaces in the name
  • [NMS-6549] – Eventd does not honor reloadDaemonConfig event
  • [NMS-6623] – Update JNA.jar library to support ARM based systems
  • [NMS-7263] – jaxb.properties not included in jar
  • [NMS-7471] – SNMP Plugin tests regularly failing
  • [NMS-7525] – ArrayOutOfBounds Exception in Topology Map when selecting bridge-port
  • [NMS-7582] – non RFC conform behaviour of SmtpMonitor
  • [NMS-7731] – Remote poller dies when trying to use the PageSequenceMonitor
  • [NMS-7763] – Bridge Data is not Collected on Cisco Nexus
  • [NMS-7792] – NPE in JmxRrdMigratorOffline
  • [NMS-7846] – Slow LinkdTopologyProvider/EnhancedLinkdTopologyProvider in bigger enviroments
  • [NMS-7871] – Enlinkd bridge discovery creates erroneous entries in the Bridge Forwarding Tables of unrelated switches when host is a kvm virtual host
  • [NMS-7872] – 303 See Other on requisitions response breaks the usage of the Requisitions ReST API
  • [NMS-7880] – Integration tests in org.opennms.core.test-api.karaf have incomplete dependencies
  • [NMS-7918] – Slow BridgeBridgeTopologie discovery with enlinkd.
  • [NMS-7922] – Null pointer exceptions with whitespace in requisition name
  • [NMS-7959] – Bouncycastle JARs break large-key crypto operations
  • [NMS-7967] – XML namespace locations are not set correctly for namespaces cm, and ext
  • [NMS-7975] – Rest API v2 returns http-404 (not found) for http-204 (no content) cases
  • [NMS-8003] – Topology-UI shows LLDP links not correct
  • [NMS-8018] – Vacuumd sends automation events before transaction is closed
  • [NMS-8056] – opennms-setup.karaf shouldn't try to start ActiveMQ
  • [NMS-8057] – Add the org.opennms.features.activemq.broker .xml and .cfg files to the Minion repo webapp
  • [NMS-8058] – Poll all interface w/o critical service is incorrect
  • [NMS-8072] – NullPointerException for NodeDiscoveryBridge
  • [NMS-8079] – The OnmsDaoContainer does not update its cache correctly, leading to a NumberFormatException
  • [NMS-8080] – VLAN name is not displayed
  • [NMS-8086] – Provisioning Requisitions with spaces in their name.
  • [NMS-8096] – JMX detector connection errors use wrong log level
  • [NMS-8098] – PageSequenceMonitor sometimes gives poor failure reasons
  • [NMS-8104] – init script checkXmlFiles() fails to pick up errors
  • [NMS-8116] – Heat map Alarms/Categories do not show all categories
  • [NMS-8118] – CXF returning 204 on NULL responses, rather than 404
  • [NMS-8125] – Memory leak when using Groovy + BSF
  • [NMS-8128] – NPE if provisioning requisition name has spaces
  • [NMS-8137] – OpenNMS incorrectly discovers VLANs
  • [NMS-8146] – "Show interfaces" link forgets the filters in some circumstances
  • [NMS-8167] – Cannot search by MAC address
  • [NMS-8168] – Vaadin Applications do not show OpenNMS favicon
  • [NMS-8189] – Wrong interface status color on node detail page
  • [NMS-8194] – Return an HTTP 303 for PUT/POST request on a ReST API is a bad practice
  • [NMS-8198] – Provisioning UI indication for changed nodes is too bright
  • [NMS-8208] – Upgrade maven-bundle-plugin to v3.0.1
  • [NMS-8214] – AlarmdIT.testPersistManyAlarmsAtOnce() test ordering issue?
  • [NMS-8215] – Chart servlet reloads Notifd config instead of Charts config
  • [NMS-8216] – Discovery config screen problems in latest code
  • [NMS-8221] – Operation "Refresh Now" and "Automatic Refresh" referesh the UI differently
  • [NMS-8224] – JasperReports measurements data-source step returning null
  • [NMS-8235] – Jaspersoft Studio cannot be used anymore to debug/create new reports
  • [NMS-8240] – Requisition synchronization is failing due to space in requisition name
  • [NMS-8248] – Many Rcsript (RScript) files in OPENNMS_DATA/tmp
  • [NMS-8257] – Test flapping: ForeignSourceRestServiceIT.testForeignSources()
  • [NMS-8272] – snmp4j does not process agent responses
  • [NMS-8273] – %post error when Minion host.key already exists
  • [NMS-8274] – All the defined Statsd's reports are being executed even if they are disabled.
  • [NMS-8277] – %post failure in opennms-minion-features-core: sed not found
  • [NMS-8293] – Config Tester Tool doesn't check some of the core configuration files
  • [NMS-8298] – Label of Vertex is too short in some cases
  • [NMS-8299] – Topology UI recenters even if Manual Layout is selected
  • [NMS-8300] – Center on Selection no longer works in STUI
  • [NMS-8301] – v2 Rest Services are deployed twice to the WEB-INF/lib directory
  • [NMS-8302] – Json deserialization throws "unknown property" exception due to usage of wrong Jax-rs Provider
  • [NMS-8304] – An error on threshd-configuration.xml breaks Collectd when reloading thresholds configuration
  • [NMS-8313] – Pan moving in Topology UI automatically recenters
  • [NMS-8314] – Weird zoom behavior in Topology UI using mouse wheel
  • [NMS-8320] – Ping is available for HTTP services
  • [NMS-8324] – Friendly name of an IP service is never shown in BSM
  • [NMS-8330] – Switching Topology Providers causes Exception
  • [NMS-8335] – Focal points are no longer persisted
  • [NMS-8337] – Non-existing resources or attributes break JasperReports when using the Measurements API
  • [NMS-8353] – Plugin Manager fails to load
  • [NMS-8361] – Incorrect documentation for org.opennms.newts.query.heartbeat
  • [NMS-8371] – The contents of the info panel should refresh when the vertices and edges are refreshed
  • [NMS-8373] – The placeholder {diffTime} is not supported by Backshift.
  • [NMS-8374] – The logic to find event definitions confuses the Event Translator when translating SNMP Traps
  • [NMS-8375] – License / copyright situation in release notes introduction needs simplifying
  • [NMS-8379] – Sluggish performance with Cassandra driver
  • [NMS-8383] – jmxconfiggenerator feature has unnecessary includes
  • [NMS-8386] – Requisitioning UI fails to load in modern browsers if used behind a proxy
  • [NMS-8388] – Document resources ReST service
  • [NMS-8389] – Heatmap is not showing
  • [NMS-8394] – NoSuchElement exception when loading the TopologyUI
  • [NMS-8395] – Logging improvements to Notifd
  • [NMS-8401] – There are errors on the graph definitions for OpenNMS JMX statistics
  • [NMS-8403] – Document styles of identifying nodes in resource IDs

Enhancement

  • [NMS-2504] – Create a better landing page for Configure Discovery aftermath
  • [NMS-4229] – Detect tables with Provisiond SNMP detector
  • [NMS-5077] – Allow other services to work with Path Outages other than ICMP
  • [NMS-5905] – Add ifAlias to bridge Link Interface Info
  • [NMS-5979] – Make the Provisioning Requisitions "Node Quick-Add" look pretty
  • [NMS-7123] – Expose SNMP4J 2.x noGetBulk and allowSnmpV2cInV1 capabilities
  • [NMS-7446] – Enhance Bridge Link Object Model
  • [NMS-7447] – Update BridgeTopology to use the new Object Model
  • [NMS-7448] – Update Bridge Topology Discovery Strategy
  • [NMS-7756] – Change icon for Dell PowerConnector switch
  • [NMS-7798] – Add Sonicwall Firewall Events
  • [NMS-7903] – Elasticsearch event and alarm forwarder
  • [NMS-7950] – Create an overview for the developers guide
  • [NMS-7965] – Add support for setting system properties via user supplied .properties files
  • [NMS-7976] – Merge OSGi Plugin Manager into Admin UI
  • [NMS-7980] – provide HTTPS Quicklaunch into node page
  • [NMS-8015] – Remove Dependencies on RXTX
  • [NMS-8041] – Refactor Enhanced Linkd Topology
  • [NMS-8044] – Provide link for Microsoft RDP connections
  • [NMS-8063] – Update asciidoc dependencies to latest 1.5.3
  • [NMS-8076] – Allow user to access local documentation from OpenNMS Jetty Webapp
  • [NMS-8077] – Add NetGear Prosafe Smart switch SNMP trap events and syslog events
  • [NMS-8092] – Add OpenWrt syslog and related event definitions
  • [NMS-8129] – Disallow restricted characters from foreign source and foreign ID
  • [NMS-8149] – Update asciidoctorj to 1.5.4 and asciidoctorjPdf to 1.5.0-alpha.11
  • [NMS-8152] – Collect and publish anonymous statistics to stats.opennms.org
  • [NMS-8160] – Remove Quick-Add node to avoid confusions and avoid breaking the ReST API
  • [NMS-8163] – Requisitions UI Enhancements
  • [NMS-8179] – ifIndex >= 2^31
  • [NMS-8182] – Add HTTPS as quick-link on the node page
  • [NMS-8205] – Generate events for alarm lifecycle changes
  • [NMS-8209] – Upgrade junit to v4.12
  • [NMS-8210] – Add support for calculating the derivative with a Measurements API Filter
  • [NMS-8211] – Add support for retrieving nodes with a filter expression via the ReST API
  • [NMS-8218] – External event source tweaks to admin guide
  • [NMS-8219] – Copyright bump on asciidoc docs
  • [NMS-8225] – Integrate the Minion container and packages into the mainline OpenNMS build
  • [NMS-8226] – Upgrade SNMP4J to version 2.4
  • [NMS-8238] – Topology providers should provide a description for display
  • [NMS-8251] – Parameterize product name in asciidoc docs
  • [NMS-8259] – Cleanup testdata in SnmpDetector tests
  • [NMS-8265] – SNMP collection systemDefs for Cisco ASA5525-X, ASA5515-X
  • [NMS-8266] – SNMP collection systemDefs for Juniper SRX210he2, SRX100h
  • [NMS-8267] – Create documentation for SNMP detector
  • [NMS-8271] – Enable correlation engines to register for all events
  • [NMS-8296] – Be able to re-order the policies on a requisition through the UI
  • [NMS-8334] – Implement org.opennms.timeseries.strategy=evaluate to facilitate the sizing process
  • [NMS-8336] – Set the required fields when not specified while adding events through ReST
  • [NMS-8349] – Update screenshots with 18 theme in user documentation
  • [NMS-8365] – Add metric counter for drop counts when the ring buffer is full
  • [NMS-8377] – Applying some organizational changes on the Requisitions UI (Grunt, JSHint, Dist)

Story

Task

  • [NMS-8236] – Move the "vaadin-extender-service" module to opennms code base

Agent Provocateur

I’ve been involved with the monitoring of computer networks for a long time, two decades actually, and I’m seeing an alarming trend. Every new monitoring application seems to be insisting on software agents. Basically, in order to get any value out of the application, you have to go out and install additional software on each server in your network.

Now there was a time when this was necessary. BMC Software made a lot of money with its PATROL series of agents, yet people hated them then as much as they hate agents now. Why? Well, first there was the cost, both in terms of licensing and in continuing to maintain them (upgrades, etc.). Next there was the fact that you had to add software to already overloaded systems. I can remember the first time the company I worked for back then deployed a PATROL agent on an Oracle database. When it was started up it took the database down as it slammed the system with requests. Which leads me to the final point, outside of security issues that arise with an increase in the number of applications running on a system, the moment the system experiences a problem the blame will fall on the agent.

Despite that, agents still seem to proliferate. In part I think it is political. Downloading and installing agents looks like useful work. “Hey, I’m busy monitoring the network with these here agents”. Also in part, it is laziness. I have never met a programmer who liked working on someone else’s code, so why not come up with a proprietary protocol and write agents to implement it?

But what bothers me the most is that it is so unnecessary. The information you need for monitoring, with the possible exception of Windows, is already there. Modern operating systems (again, with the exception of Windows) ship with an SNMP agent, usually based on Net-SNMP. This is a secure, powerful extensible agent that has been tried and tested for many years, and it is maintained directly on server itself. You can use SNMPv3 for secure communications, and the “extend” and “pass” directives to make it easy to customize.

Heck, even Windows ships with an extensible SNMP agent, and you can also access data via WMI and PowerShell.

But what about applications? Don’t you need an agent for that?

Not really. Modern applications tend to have an API, usually based on ReST, that can be queried by a management station for important information. Java applications support JMX, databases support ODBC, and when all that fails you can usually use good ol’ HTTP to query the application directly. And the best part is that the application itself can be written to guard against a monitoring query causing undue load on the system.

At OpenNMS we work with a lot of large customers, and they are loathe to install new software on all of their servers. Plus, many of our customers have devices that can’t support additional agents, such as routers and switches, and IoT devices such as thermostats and door locks. This is the main reason why the OpenNMS monitoring platform is, by design, agentless.

A critic might point out that OpenNMS does have an agent in the remote poller, as well as in the upcoming Minion feature set. True, but those act as “user agents”, giving OpenNMS a view into networks as if it was a user of those networks. The software is not installed on every server but instead it just needs the same access as a user would have. So, it can be installed on an existing system or on a small system purchased for that purpose, at a minimum just one for each network to be monitored.

While some new IT fields may require agents, most successful solutions try to avoid them. Even in newer fields such as IT automation, the best solutions are agentless. They are not necessary, and I strongly suggest that anyone who is asked to install an agent for monitoring question that requirement.

OpenNMS is Sweet Sixteen

It was sixteen years ago today that the first code for OpenNMS was published on Sourceforge. While the project was started in the summer of 1999, no one seems to remember the exact date, so we use March 30th to mark the birthday of the OpenNMS project.

OpenNMS Project Details

While I’ve been closely associated with OpenNMS for a very long time, I didn’t start it. It was started by Steve Giles, Luke Rindfuss and Brian Weaver. They were soon joined by Shane O’Donnell, and while none of them are associated with the project today, they are the reason it exists.

Their company was called Oculan, and I joined them in 2001. They built management appliances marketed as “purple boxes” based on OpenNMS and I was brought on to build a business around just the OpenNMS piece of the solution.

As far as I know, this is the only surviving picture of most of the original team, taken at the OpenNMS 1.0 Release party:

OpenNMS 1.0 Release Team

In 2002 Oculan decided to close source all future work on their product, thus ending their involvement with OpenNMS. I saw the potential, so I talked with Steve Giles and soon left the company to become the OpenNMS project maintainer. When it comes to writing code I am very poorly suited to the job, but my one true talent is getting great people to work with me, and judging by the quality of people involved in OpenNMS, it is almost a superpower.

I worked out of my house and helped maintain the community mainly through the #opennms IRC channel on freenode, and surprisingly the project managed not only to survive, but to grow. When I found out that Steve Giles was leaving Oculan, I applied to be their new CEO, which I’ve been told was the source of a lot of humor among the executives. The man they hired had a track record of snuffing out all potential from a number of startups, but he had the proper credentials that VCs seem to like so he got the job. I have to admit to a bit of schadenfreude when Oculan closed its doors in 2004.

But on a good note, if you look at the two guys in the above picture right next to the cake, Seth Leger and Ben Reed, they still work for OpenNMS today. We’re still here. In fact we have the greatest team I’ve every worked with in my life, and the OpenNMS project has grown tremendously in the last 18 months. This July we’ll have our eleventh (!) annual developers conference, Dev-Jam, which will bring together people dedicated to OpenNMS, both old and new, for a week of hacking and camaraderie.

Our goal is nothing short of making OpenNMS the de facto management platform of choice for everyone, and while we still have a long way to go, we keep getting closer. My heartfelt thanks go out to everyone who made OpenNMS possible, and I look forward to writing many more of these notes in the future.

OpenNMS Horizon 17.1.1 Released

Probably the last Horizon 17 version, 17.1.1, has been released. According to TWiO, the next release will be Horizon 18 at the end of the month, with Horizon 19 following at the end of May.

This release is mainly a maintenance release. It does contain one fix I used (NMS-8199), which allows for the state names in the Jira Trouble Ticketing plugin to be configured. This helps a lot if Jira is not in English.

If you are running Horizon 17, this should help it run a bit smoother.

Bug

  • [NMS-7936] – Chart Servlet Outages model exception
  • [NMS-8010] – Groups config rolled back after deleting a user in web UI
  • [NMS-8034] – Adding com.sun.management.jmxremote.authenticate=true on opennms.conf is ignored by the opennms script
  • [NMS-8048] – org.hibernate.exception.SQLGrammarException with ACLs on V17
  • [NMS-8075] – vacuumd-configuration.xml — Database error executing statement
  • [NMS-8113] – Overview about major releases in the release notes
  • [NMS-8153] – Can't modify the Foreign ID on the Requisitions UI when adding a new node
  • [NMS-8159] – When altering the SNMP Trap NBI config, the externally referenced mapping groups are persisted into the main file.
  • [NMS-8161] – Tooltips are not working on the new Requisitions UI
  • [NMS-8165] – OutageDao ACL support is broken causing web UI failures
  • [NMS-8177] – Install guide should use postgres admin for schema updates
  • [NMS-8199] – Allows state names to be configured in the JIRA Ticketer Plugin

Enhancement

  • [NMS-6404] – Allow send events through ReST
  • [NMS-8148] – Create pull request and contribution template to GitHub project

Task

  • [NMS-8151] – Remove all jersey artifacts from lib classpath

Speeding Up OpenNMS Requisition Imports

One thing that differentiates OpenNMS from other applications is the strong focus on tools for provisioning the system. If you want to monitor hundred of thousands of devices, to ultimately millions, the ordinary methods just don’t work.

Users of OpenNMS often create large requisitions from external database sources, and sometimes it can take awhile for the import to complete. Delays can happen if the Foreign Source used for the requisition has a large number of service detectors that won’t exist on most devices.

For example, the default Foreign Source for Horizon 17 has about 15 detectors. Of those, only about 4 will exist on networking equipment (ICMP, SSH, HTTP and HTTPS). When scanning, this can add a lot of time per interface. Assuming 2 retries and a 3 second timeout, that would be 9 seconds for each non-existent service. With just 1000 interfaces, that’s 99000 seconds (9 seconds x 11 services x 1000 interfaces) of time just spent waiting, which translates to 27.5 hours.

Now, granted, the importer has multiple threads so the actual wait time will be less, but you can see how this can impact the time needed to import a requisition. This can be reduced significantly by tuning service detection to the bare minimum needed and perhaps adding other services later on a per device basis without scanning.

OpenNMS Horizon 17.1 Released

As some of you may have noticed, Horizon 17.1 has been released. As Horizon 17 will form the basis for Meridian 2016, I’m extremely happy to see how much progress has been made on fixing issues.

Be sure to check out the Release Notes.

Horizon is our rapid release version, and its goal is to get all the cool new features out as soon as they are ready. In this case, right about release we discovered an annoying but easy to fix bug with provisioning, so if you plan to run Horizon 17.1 you should also apply this patch.

Have fun, and we hope you find OpenNMS useful.

Bug

  • [NMS-4108] – Bad suggestions in install guide
  • [NMS-7152] – Enlinkd Topology Plugin fails to create LLDP links for mismatched link port descriptions.
  • [NMS-7820] – snmp-graph.properties.d files with –vertical-label="verticalLabel" in config
  • [NMS-7866] – Incorrect host in Location header when creating resources via ReST
  • [NMS-7910] – config-tester is broken
  • [NMS-7953] – Opsboard and Opspanel use wrong logo
  • [NMS-7966] – Unable to generate eventconf if a MIB (improperly?) uses a TC to define a TC
  • [NMS-7988] – container features.xml still references jersey 1.18 when it should reference 1.19.
  • [NMS-8000] – Topology-UI shows CDP links not correct
  • [NMS-8014] – Backshift graphs show dates in UTC instead of the browser's timezone
  • [NMS-8017] – StrafePing: Unexpected exception while polling PollableService
  • [NMS-8023] – Grafana Box did not work anymore
  • [NMS-8026] – Constant Thread Locking on Enlinkd
  • [NMS-8029] – Wrong use of opennms.web.base-url
  • [NMS-8038] – NRTG with newts – get StringIndexOutOfBoundsException
  • [NMS-8051] – The newts script only works if cassandra runs on localhost
  • [NMS-8054] – AlarmPersisterIT test is empty
  • [NMS-8064] – The 'newts init' script does work when authentication is enabled in Cassandra
  • [NMS-8065] – ReST Regression in Alarms/Events
  • [NMS-8066] – Newts only uses a single thread when writing to Cassandra
  • [NMS-8073] – User Restriction Filters: mapping class for roles to groups does not work
  • [NMS-8074] – The "Remove From Focus" button intermittently fails
  • [NMS-8079] – The OnmsDaoContainer does not update its cache correctly, leading to a NumberFormatException
  • [NMS-8084] – File not found exception for interfaceSTP-box.jsp on SNMP interface page
  • [NMS-8097] – Installation Guide Debian Bug Version 17.0.0
  • [NMS-8100] – Unable to complete creation of scheduled reports
  • [NMS-8103] – NPE when persisting data with Newts
  • [NMS-8104] – init script checkXmlFiles() fails to pick up errors
  • [NMS-8106] – INFO-severity syslog-derived events end up unmatched
  • [NMS-8109] – Memory leak when using the BSFDetector
  • [NMS-8112] – init script "configtest" exit value is always 1
  • [NMS-8116] – Heat map Alarms/Categories do not show all categories
  • [NMS-8119] – WS-MAN has broken ForeignSourceConfigRestService and the requisitions UI doesn't work.
  • [NMS-8123] – Removing ops boards via the configuration UI does not update the table
  • [NMS-8126] – JNA ping code reuses buffer causing inconsistent reads of packet contents
  • [NMS-8133] – Synchronizing a requisition fails
  • [NMS-8147] – Add all the services declared on Collectd and Pollerd configuration as available services on /opennms/rest/foreignSourceConfig/services

Enhancement

  • [NMS-7123] – Expose SNMP4J 2.x noGetBulk and allowSnmpV2cInV1 capabilities
  • [NMS-7978] – Add threshold comments and whitespace changes to match how the OpenNMS web GUI generates XML files
  • [NMS-8005] – Add support for using NRTG via Ajax calls
  • [NMS-8024] – Add support for OSGi-based Ticketing plugins
  • [NMS-8028] – Add event definition for postfix syslog message TLS disabled
  • [NMS-8030] – Improve the SNMP data collection config parsing to give more flexibility to the users
  • [NMS-8042] – set Up severities for RADLAN-MIB.events.xml
  • [NMS-8068] – Add support for marshalling NorthboundAlarms to XML
  • [NMS-8071] – Event definition file for JUNIPER-IVE-MIB
  • [NMS-8120] – Fixed a paragraph in the "Automatic Discovery" provisioning chapter
  • [NMS-8156] – Upgrade Angular Backend for the Requisitions UI

Add a Weather Widget to OpenNMS Home Screen

I was recently at a client site where I met a man named Jeremy Ford. He’s sharp as a knife and even though, at the time, he was new to OpenNMS, he had already hacked a few neat things into the system (open source FTW).

Weathermap on OpenNMS Home Page

One of those was the addition of a weathermap to the OpenNMS home page. He has graciously put the code up on Github.

The code is a script that will generate a JSP file in the OpenNMS “includes” directory. All you have to do then is to add a reference to it in the main index.jsp file.

For those of you who don’t know or who have never poked around, under the $OPENNMS_HOME directory should be a directory called jetty-webapps. That is the web root directory for the Jetty servlet container that ships with OpenNMS.

Under that directory you’ll find a subdirectory for opennms. When you surf to http://[my OpenNMS Server]:8980/opennms that is the directory you are visiting. In it is an index.jsp file that serves as the main page.

If you are familiar with HTML, the JSP file is very similar. It can contain references to Java code, but a lot of it is straight HTML. The file is kept simple on purpose, with each of the three columns on the main page indicated by comments. The part you will need to change is the third column:

<!-- Right Column -->
        <div class="col-md-3" id="index-contentright">
                <!-- weather box -->
                <jsp:include page="/includes/weather.jsp" flush="false" />

Feel free to look around. If you ever wanted to rearrange the OpenNMS Home page, this is a good place to start.

Now, I used to like poking around with these files since they would update automatically, but later versions of OpenNMS (which contain later versions of Jetty) seem to require a restart. If you get an error, restart OpenNMS and see if it goes away.

Now the weather.jsp file gets generated by Jeremy’s python script. In order to get that to work you’ll need to do two things. The most important is to get an API key from Weather Underground. It is a pretty easy process, but be aware that you can only do 500 queries a day without paying. The second thing you’ll need to do is edit the three URLs in the script and change the location. It is currently set to “CA/San_Francisco” but I was able to change it to “NC/Pittsboro” and it “just worked”.

Finally, you’ll need to set the script up to run via cron. I’m not sure how frequently Weather Underground updates the data, but a 10 minute interval seems to work well. That’s only 144 queries a day, so you could easily double it and still be within your limit.

[IMPORTANT UPDATE: Jeremy pointed out that the script actually does three queries, not just one, so instead of doing 144 queries a day, it’s 432. Still leaves some room with 10 minute queries but you don’t want to increase the frequency too much.]

Thanks to Jeremy for taking the time to share this. Remember, once you get it working, if you upgrade OpenNMS you’ll need to edit index.jsp and add it back, but that should be the only change needed.