OpenNMS, Grafana and the Internet of Things

Things have been as busy and crazy as usual here in OpenNMS-land, so I often can’t find the time to talk about all the cool new shiny that is available. As my truck wouldn’t start this morning (it’s on the charger now) I thought I’d take some time to talk about a cool new plugin available since Horizon 16 was released.

One of the things I think will be needed in the next few years is a management platform that can scale to Internet of Things (IoT) levels. I also think that the only way to overcome the “Internet of Silos” effect will be to make that platform open source. I’d like OpenNMS to fill that role.

To that end we’re working on our “minion” project. These are lightweight remote processes that do data collection and monitoring and report up to a master OpenNMS instance, or even a cluster of OpenNMS management stations. In order to scale to the massive amounts of data generated by the IoT, we’ve created the Newts project to store time series data on top of Cassandra. Both of those projects are well under way and available for testing in various OpenNMS code branches.

Then we were faced with how to display all of this information. Jesse decided to do an integration with the Grafana project, and now this functionality is available as a plug-in:


OpenNMS Graphs in Grafana

It’s pretty cool – Jesse translates the syntax used for RRDTool reports into a form that Grafana can use, and since this is hosted on the Grafana server you can integrate data points from multiple OpenNMS instances, or data from pretty much any source that Grafana can access. Details are available on the Wiki.

Hats off to the Grafana project for making such a cool application, and as usual we hope you find this new addition to OpenNMS useful.

Early/Often on the Horizon

Lots of stuff, and I mean lots of cool stuff, is going on, and to paraphrase Hamlet, I have not enough thoughts to put them in, imagination to give them shape, or time to act them in. I spent this week in the UK but I should be home for a while and I hope to catch up.

But I wanted to put down at least one thought. When we made the very difficult decision to split OpenNMS into two products, Horizon and Meridian, we had some doubts that it was the right thing to do. Well, at least for me, those doubts have been removed.

It used to take us 18 or more months to get a major release out. Due to the support business we were both hesitant to remove code we no longer needed or to try the newest things. Since we moved to the Horizon model we’ve released 3 major versions in six months and not only have we added a number of great features, we are finally getting around to removing stuff we no longer need and finishing projects that have languished in the past.

In the meantime we’re delivering Meridian to customers who value stability over features with the knowledge that the version they are running is supported for three years. Seriously, we have some customers upgrading from OpenNMS 1.8 (six major releases back) who obviously want longer release cycles, and even if you don’t need support you can get Meridian software for a rather modest fee coupled with OpenNMS Connect for those times when you really just need to ask a question.

Anything OpenNMS does well is a reflection on our great team and community, but I take personally any shortcomings. At least now I can see the path to minimize them if not remove them completely.

It’s a good feeling.

OpenNMS 16 Released

In keeping with our new Horizon release policy of a new major release every three to four months, we are happy to announce the availability of OpenNMS 16, codenamed Daredevil.

Most of the changes in OpenNMS 16 are under the covers. We are trying to streamline the code and thus have removed both capsd (which was deprecated) and linkd (which was replaced by enhanced linkd). This version also requires Java 8.
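One practical consequence of the Java 8 requirement, reflected in NMS-7604 in the list below, is that PermGen no longer exists in the JVM, so PermGen sizing flags need Metaspace counterparts. Here is a minimal sketch in Python of that renaming, assuming the usual pairing of `-XX:PermSize` with `-XX:MetaspaceSize` and `-XX:MaxPermSize` with `-XX:MaxMetaspaceSize`. The helper name `migrate_jvm_args` and the sample arguments are hypothetical and are not taken from OpenNMS itself.

```python
# Closest Java 8 Metaspace counterparts for the Java 7 PermGen flags.
# Assumption: a straight rename is acceptable; the two memory areas do not
# behave identically, so the sizes may still deserve a second look.
PERMGEN_TO_METASPACE = {
    "-XX:PermSize": "-XX:MetaspaceSize",
    "-XX:MaxPermSize": "-XX:MaxMetaspaceSize",
}


def migrate_jvm_args(args):
    """Return a copy of args with PermGen flags renamed to Metaspace flags."""
    migrated = []
    for arg in args:
        # Split "-XX:MaxPermSize=256m" into the flag name and its value.
        name, sep, value = arg.partition("=")
        new_name = PERMGEN_TO_METASPACE.get(name, name)
        migrated.append(new_name + sep + value)
    return migrated


if __name__ == "__main__":
    old_args = ["-Xmx1024m", "-XX:MaxPermSize=256m", "-XX:PermSize=128m"]
    print(" ".join(migrate_jvm_args(old_args)))
    # -Xmx1024m -XX:MaxMetaspaceSize=256m -XX:MetaspaceSize=128m
```

Arguments that are not PermGen flags pass through untouched, so the same function can be run over a whole startup option string.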

The main visible feature is that the Dashboard has been rewritten and should be a considerable improvement to those who use it.

A nearly complete list of changes is as follows:

Bug

  • [NMS-863] – "24hr Avail" went negative
  • [NMS-2213] – SLM categories totals are not being updated during runtime
  • [NMS-5631] – Deadlock inside RTC's DataManager during shutdown
  • [NMS-6100] – The Stp interface box page throws an exception
  • [NMS-6158] – When displaying Linkd link info on node, ifAlias data in interface columns missing opening quote
  • [NMS-6536] – NRTG is throwing ConcurrentModificationException
  • [NMS-6567] – IfIndex not updated in ipinterface table on change
  • [NMS-6568] – Requisition UI has inconsistent field labels for building the provisioning requisition
  • [NMS-6583] – linkd can't make use of learned MAC addresses on ports to determine path mapping
  • [NMS-6593] – sort order interfaces on node page
  • [NMS-6802] – EnLinkD IS-IS Link discovery fails on Cisco routers
  • [NMS-6902] – Geomaps are quite slow
  • [NMS-6905] – Remove Link Status Menu Item
  • [NMS-6912] – lldpchassisid not properly decoded for DragonWave in Enhanced Linkd Lldp node discovery
  • [NMS-6972] – test failure: org.opennms.netmgt.provision.detector.SmtpDetectorTest
  • [NMS-6974] – Link Status Provider is still an option for older Linkd Topology Provider
  • [NMS-7029] – Java 8 build fails some tests
  • [NMS-7089] – MAC 00:00:00:00:00:00 should be treated as null
  • [NMS-7090] – IpNetToMedia Table: Manage duplicated ip address
  • [NMS-7096] – Toggle icons on Node List Page are too small on resolutions greater than Full HD
  • [NMS-7148] – Geo-Maps running on a server without internet connection breaks the UI for valid nodes.
  • [NMS-7175] – Alarms dashlet: "ago" and node label columns can overlap when tiled
  • [NMS-7183] – LLdp link discovery: lldpRemLocalPortNum value 0
  • [NMS-7184] – LldpHelper decode exception
  • [NMS-7192] – Remove the logging directories from the DEB package
  • [NMS-7207] – Switch direction to zoom in and out in the topology
  • [NMS-7251] – Change filterfavorites.filter to 'text' SQL data type
  • [NMS-7294] – Enhanced Linkd inserts wrong Local Port bridge number
  • [NMS-7320] – Java environment in Debian has to be configured twice
  • [NMS-7337] – Database Report "Response time by node" Not Working.
  • [NMS-7358] – IllegalArgumentException on ipnettomediatable
  • [NMS-7362] – No CDP neighbors on a topological map
  • [NMS-7372] – ACLs ineffective in geographic map
  • [NMS-7379] – Unable to display performance data from Host Resource processor table
  • [NMS-7400] – KSC Reports with non-existing resources generate exceptions on the WebUI
  • [NMS-7410] – Title information on the node detail page are confusing
  • [NMS-7412] – Double footer in resource graph page
  • [NMS-7432] – Normalize the HTTP Host Header with the new HttpClientWrapper
  • [NMS-7434] – Disabling Notifd crashes webUI
  • [NMS-7456] – JRB to RRD converter no longer compiles
  • [NMS-7466] – Reload Collectd and Pollerd Configuration without restart OpenNMS
  • [NMS-7467] – Path Outage severity is not indicated in Web UI
  • [NMS-7481] – DrayTek Vigor2820 Series agent bug: zero-length IpAddress instance ID
  • [NMS-7485] – queued creates its own category for loggings
  • [NMS-7518] – SNMP version syntax inconsistent across components
  • [NMS-7531] – Surveillance View configuration is no longer dynamic
  • [NMS-7533] – EventconfFactoryTest fails with no events eventconf.xml
  • [NMS-7537] – Vaadin SV on index page not fitting to view
  • [NMS-7543] – Vaadin:Dashboard SV dashlet no longer indicate context of other dashlets
  • [NMS-7549] – NPE on admin/notification/noticeWizard/chooseUeis.jsp
  • [NMS-7554] – Smoke test is failing with the new dashboard
  • [NMS-7563] – gui and maps does not display lldp and cdp links
  • [NMS-7570] – Dashboard Auto-Refresh runs JVM out of memory (Full-GC)
  • [NMS-7576] – The XSD for the SNMP Hardware Inventory Provisioning Adapter is not included on the RPM/DEB packages.
  • [NMS-7577] – Search by foreignSource or severityLabel doesn't work on Geo Maps
  • [NMS-7590] – List of service names in the requisition editor should be pulled from the poller conifguration instead of capsd
  • [NMS-7597] – Tog depth for VmwareMonitor and VmwareCimMonitor is wront
  • [NMS-7598] – Varbinddecodes are being ignored on Notifications
  • [NMS-7603] – Some parameters logged out of order since slf4j conversion
  • [NMS-7604] – Replace PermGen VM arguments with Metaspace equivalents
  • [NMS-7610] – Remote Poller throws ClassNotFound Exception when loading config
  • [NMS-7615] – RPM dependency for JDK 8 is wrong
  • [NMS-7616] – Compass can't make a POST request from FILE URLs in some cases
  • [NMS-7617] – Test failure: org.opennms.netmgt.provision.service.Nms5414Test
  • [NMS-7620] – Scrolling issue
  • [NMS-7622] – Memory leak in RTC
  • [NMS-7626] – The PSM doesn't work with IPv6 addresses if the ${ipaddr} placeholder is used on host or virtual-host
  • [NMS-7629] – Timeline image links are not working with services containing spaces
  • [NMS-7630] – Database reports don't run in 16
  • [NMS-7631] – Match event params for auto-ack of Notification
  • [NMS-7633] – include-url doesn't work on poller packages
  • [NMS-7634] – ClassCastException in BSFNotificationStrategy
  • [NMS-7636] – Node resources are deleted when provisiond aborts a scan
  • [NMS-7637] – Default date width in Database Reports is too small
  • [NMS-7640] – Test failure: testImportAddrThenChangeAddr
  • [NMS-7641] – The IP Interface page is blank.
  • [NMS-7642] – The global variable org.opennms.rrd.queuing.category is set to OpenNMS.Queued and should be queued
  • [NMS-7643] – Test failure: testSerialFailover
  • [NMS-7644] – Fixing Logging Prefix/Category on several classes
  • [NMS-7645] – Test failure: tryStatus
  • [NMS-7650] – XML data collection with HTTP POST requests is not working
  • [NMS-7651] – Improving exception handling on the XML Collector
  • [NMS-7657] – Vaadin surveillance view configuration doesn't work with Firefox
  • [NMS-7658] – Error in Debian/Ubuntu init script

Enhancement

  • [NMS-1504] – Add option to turn off snmp v3 passphrase clear text in log files
  • [NMS-2995] – Trapd is not able to process SNMPv3 INFORMs
  • [NMS-4619] – XMPP: Make SASL mechanism configurable
  • [NMS-6442] – Set vertex to focal point
  • [NMS-6581] – Drools Update to 6.0.1 Final
  • [NMS-6963] – PATCH — Bridgewave Wireless Bridge
  • [NMS-7146] – Move RTC over to Spring and Hibernate
  • [NMS-7229] – Be able to set the rescanExisting flag when defining a scheduler task on provisiond-configuration.xml
  • [NMS-7310] – add Siemens HiPath 3000 event files
  • [NMS-7311] – add Siemens HiPath 3000 HG1500 event files
  • [NMS-7312] – add Siemens HiPath 8000 / OpenScapeVoice event files
  • [NMS-7318] – Move notification status indicator to header
  • [NMS-7424] – Add pathOutageEnabled="false" to poller-configuration.xml by default
  • [NMS-7441] – Change varchar to text for CDP and LLDP tables
  • [NMS-7453] – Update Smack API
  • [NMS-7461] – Update asciidoctor maven plugin from 1.5.0 to 1.5.2
  • [NMS-7473] – Remove Capsd from OpenNMS
  • [NMS-7474] – Modify WebDetector/Monitor/Plugin/Client to expose ability to enable/disable certificate validation
  • [NMS-7476] – Add support for gzip compression on REST APIs
  • [NMS-7479] – Allow RRD data to be retrieved via REST
  • [NMS-7480] – Make resource data accessible through ReST
  • [NMS-7505] – The DefaultResourceDao loads all child resources when retrieving a specific resource by id
  • [NMS-7528] – Use the default threshold definition as a template when adding TriggeredUEI/RearmedUEI on thresholds through the WebUI
  • [NMS-7579] – Remove unnecessary output from opennms-doc module
  • [NMS-7593] – BSFMonitor creates a new BSFManager every poll which makes caching script engines ineffective
  • [NMS-7595] – SNMP interface RRD migrator should create and clean up backups interface-wise
  • [NMS-7609] – Create a ReST API to expose the available detectors/policies/categories/assets/services required to manipulate foreign sources
  • [NMS-7612] – Need upgrade task for collection strategy classes
  • [NMS-7619] – Create opennms.properties option to choose between new and old dashboard
  • [NMS-7632] – Deprecation of LinkD

Story

  • [NMS-7299] – Allow user to create and modify surveillance views
  • [NMS-7303] – Migrate Surveillance view GWT UI component to Vaadin
  • [NMS-7304] – Migrate Alarms GWT UI component to Vaadin
  • [NMS-7305] – Migrate Notifications GWT UI component to Vaadin
  • [NMS-7306] – Migrate Node Status component from GWT to Vaadin
  • [NMS-7307] – Migrate Resource Graph Viewer component from GWT to Vaadin
  • [NMS-7323] – Update user documentation
  • [NMS-7325] – Allow user to select surveillance view in the Dashboard
  • [NMS-7326] – Remove the GWT dashboard from the code base
  • [NMS-7429] – Remove "report-category" attribute
  • [NMS-7430] – Add surveillance view's name in the left header cell
  • [NMS-7431] – Add an option to disable "refreshing"
  • [NMS-7469] – Add preview window in config UI
  • [NMS-7489] – Icons for alarms and notifications
  • [NMS-7490] – Modal window to show node, alarm and notification details
  • [NMS-7491] – Admin configuration panel shows dashboard instead of surveillance view
  • [NMS-7492] – Allow to configure refresh time per surveillance view
  • [NMS-7530] – Rename the surveillance config panel link in Admin menu
  • [NMS-7540] – Dashboard Dashlet: Refresh indicator
  • [NMS-7542] – Vaadin Dashboard: Alarm Dashlet should have severity sorting by default

Review: System 76 Sable

As you might guess, I am a big fan of all things open, and I tend to vote with my wallet. When the need arose to replace some iMacs in the office, I decided to check out the Sable systems offered by Linux-friendly vendor System 76.

System 76 was a sponsor at SCaLE this year (like OpenNMS) and they also sponsored the Bad Voltage Live event where they gave away a laptop and a server, so they already had my goodwill.

Back in 2008 I needed some machines for our training courses, so being an Apple fanboy at the time I bought iMacs. Outfitting training rooms can be problematic if you don’t do training full time because you usually end up with nice systems that you don’t use very often. Seems wasteful, so we decided to use them to run Bamboo and our unit tests for OpenNMS when they weren’t being used for training.

Seth noticed that it was taking those machines around 240 minutes to run the suite of tests versus 160 minutes for the newer iMacs we were using, and this was having a negative impact on development (almost everything we do relies on test driven development). Since we were running Ubuntu on the boxes anyway, I decided on a Linux alternative and chose System 76 for the first six replacement systems.

I like all-in-one systems for training since they tend to move around (we use the training room as a conference room when there are no classes). The all-in-one form factor makes them easy to carry. The Sables I ordered came with a 23.6 inch touch screen at 1080p, 3.1 GHz i7 processor, 16GB of RAM and a 500GB SSD for a total price of US$1731.

The ordering process went smoothly (there was one glitch when the original quote was for seven instead of six but it was quickly corrected). I placed the order on March 18th and they shipped a week later on the 25th.

They arrived in six boxes marked AIO PC:

System 76 boxes

I think AIO must be the manufacturer in China, but I couldn’t find a similar system on the web. One box had a smashed-in corner, so I opened it first, but it was packed well enough that the unit wasn’t damaged:

System 76 open box

I removed the packing and pulled the unit out. It was wrapped to protect the screen.

System 76 screen wrap

and the whole unit was covered in plastic wrap to prevent scratches.

System 76 plastic wrap

These units come with a power brick that is external to the system and I ordered them with a Logitech keyboard and mouse. These came in a separate box along with extra cables, etc., for expansion (unlike Apple products, you can actually work on these systems).

System 76 keyboard box

The hardest part about the whole process was figuring out how to turn the darn thing on. I finally found the switch on the back of the system on the lower right side (as you face it). I felt kind of stupid and yes, I even read the little pamphlet that came with it. Perhaps they should add an IKEA-like drawing with the little dude pointing to the switch.

It booted right up into Ubuntu 14.10, and all I had to do was create an account and set the IP address. Ben was then able to get in and deploy our Bamboo image and we were up and running in no time.

System 76 screen

While we still have some iMacs being used, the Sables have, so far, proven to be a solid replacement. I haven’t really used them as a desktop, yet, but they can run our test suite in a little over an hour, which is almost a four-fold speedup over the old iMacs.
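For anyone who wants to check the arithmetic, here is a quick sketch in Python. The 240 and 160 minute figures come from the review; the 65 minutes used for "a little over an hour" is an assumption made purely for illustration.

```python
def speedup(old_minutes, new_minutes):
    """How many times faster the new run is compared with the old one."""
    return old_minutes / new_minutes


OLD_IMAC_MINUTES = 240    # 2008 iMacs, from the review
NEWER_IMAC_MINUTES = 160  # newer iMacs, from the review
SABLE_MINUTES = 65        # assumption: stands in for "a little over an hour"

print(round(speedup(OLD_IMAC_MINUTES, SABLE_MINUTES), 1))    # 3.7
print(round(speedup(NEWER_IMAC_MINUTES, SABLE_MINUTES), 1))  # 2.5
```

With that assumed run time, the Sables come out a bit under four times faster than the old iMacs and roughly two and a half times faster than the newer ones.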

System 76 in a line

While Apple doesn’t offer a 24-inch iMac anymore, the 21-inch version with similar processor, RAM and SSD is US$2399, or quite a premium. The Sable is not nearly as thin or stylish as the iMac, but it is a nice looking machine and after struggling this week to correctly replace the hard drive in a late 2009 iMac I appreciate the fact that I can work on these if I need to, and the extra cables shipped with it even encourage me to do so.

And that’s what open is all about.

♫ To Be Thick as a Brick ♫

In keeping with the musical theme this week, I thought it would be cool to post about a little bit of OpenNMS “bling” now featured at the Chatham County Public Library in Pittsboro, NC.

OpenNMS Brick

We like to both talk about OpenNMS as well as support the local community, so when I found out that the library was raising money by selling personalized bricks, I thought it would be cool to get one.

OpenNMS Brick

We also have one to be installed at the Tesla Museum. I’m going to have to take a road trip to get a picture of that one, or see if Jeremy Garcia will drive over when it is open and take one for us.

OpenNMS at Fifteen

It was fifteen years ago today that the OpenNMS Project was registered on Sourceforge.

OpenNMS Sourceforge Summary

The project itself was started sometime in 1999, but I wasn’t around then as I didn’t get involved until 2001. I’ve been told that it started in July of 1999, but since an open source project really doesn’t exist until something gets shared, it seems that March 30, 2000, is as good a day as any to mark the birth of OpenNMS.

I went poking around on the site and wasn’t able to find the very first thing posted there. I believe it was a mockup of an administration console using the Java Swing toolkit that never actually made it into the product. While I believe the code is still in there somewhere, in switching from CVS to SVN to git, dates do get a little corrupted and I couldn’t find it.

Anniversaries don’t really mean that much in practical terms. In moving from Sunday, March 29th, to Monday, March 30th there was no substantial change in OpenNMS at all. But it does lend itself to a bit of reflection, and fifteen years is a lot of time on which to reflect.

While I have been working on OpenNMS most of my professional career, I didn’t start it. People much smarter than me did, and that has pretty much been the story of my life. My only true talent is getting intelligent and creative people to work with me, and the rest of my career is just basking in their reflected glory. In 2002, the original founders decided to stop working on the project, but I saw its potential and was able to become its maintainer.

My original plan was to simply remain a company of one and provide consulting services around OpenNMS. That didn’t work out so well, as I soon realized that it could be much bigger than one person. In September of 2004, The OpenNMS Group was born in part to ensure that the OpenNMS platform would always be around. We wanted to build something amazing, and this was reflected in our goal “to make OpenNMS the de facto management platform of choice.”

Being pretty much a group of technical people, we didn’t know we were doing things wrong. For a business plan we chose “Spend less money than you earn.” For a mission statement we liked “Help Customers – Have Fun – Make Money”. I put forth my two desires that OpenNMS should never suck and that OpenNMS should always be free software. We just took it from there.

This is not to say that we haven’t met with frustration. Gartner likes to diagram companies on two axes: “Vision” and “Ability to Execute that Vision”. We have a lot of vision, but our business model doesn’t give us a lot of resources to execute that vision quickly.

In order to change this, I spent a lot of time in Silicon Valley looking for an investor. Silicon Valley is pretty much the center of the technology industry, and one would assume that they would know the best way to run a technology based business. But I was pretty much told that you can’t be anyone unless you work in the Valley, you’re too old, and most importantly, you are doing it wrong.

There seems to be a formula they like out there. You raise a bunch of money. You hire as many people as fast as you can. You get as many users as possible and you hope that some larger company will buy you out. They call this an “exit strategy”, and this is supposed to be the focus of the business. Once you “exit” you can do it all again.

The problem, as I see it, is that a lot of companies have to exit before they get bought out. They run out of money, the investors run out of interest or patience, and then they just shutter the endeavor. Sure, you have your prominent billion dollar acquisitions, but in the scheme of things they are a very, very small percentage.

Plus, I’m already doing what I love to do. I really don’t want to do anything else. My chosen field, network management, is huge and I can always find something interesting in it, such as figuring out the best way to deal with the Internet of Things.

Sure, I believe that there are companies out there that would complement what we do. Ones that have the capital to help OpenNMS grow in a way that doesn’t go against our corporate culture. And while our involvement with such a company would probably be through an acquisition, I don’t see that so much as an “exit” as an evolution. I wouldn’t do the deal if I didn’t think I’d want to continue to work on the project, so I wouldn’t be going anywhere.

I see this post has become more about the business side of OpenNMS than the project itself, but I felt it was important to think about how our business philosophy permeates the project. Thus I thought it was serendipitous that Ben sent me a link to an article about an alternative to the “exit strategy” called the “exist strategy”.

The Nishiyama Onsen Keiunkan is the world’s oldest business. It is a hot springs hotel in Japan that was founded in 705 and has been run by fifty-two generations of the same family. They have survived and even thrived for 1300+ years by having a relentless focus on their customers. Even though they have only 40 rooms, by any measure you have to call their undertaking a success.

I think there is a huge problem with the tech industry’s focus on the exit. It’s such a short term goal. I expect the goal we set for OpenNMS to take the rest of my life and maybe some time after that. By focusing on an exit the people who usually end up paying for it are your customers, and that just doesn’t strike me as a way to run a business. I’m certain that if the Nishiyama Onsen Keiunkan had focused on growth over service they would have died out a long time ago. Heck, even the company that started OpenNMS closed its doors in 2004. When they weren’t moving fast enough toward their goal for the investors, they did what today we would call “a pivot” and it didn’t work out, even though that’s what anyone in the Valley would have said was the right decision.

Look, I don’t want to come across as some sort of holier than thou “money is evil” kind of person. I run a business, not a charity. But as a businessman, and not a gambler, I truly believe that our best chance at financial success is to find a way for us to deliver the best value we can to our customers. Period. That’s our focus, and any type of “exit” is way down on the list. Heck, the current management team at The OpenNMS Group is ten years older than the rest of the guys, and we’ve even thought of selling the business to them when we wish to retire. Not sure we can do it 52 times, but that is one form of exit that is still in line with an “exist strategy”.

And that’s the thought I want to take into the next fifteen years of OpenNMS. We have a covenant with our users and they have paid us back in kind with their support. This has resulted in a number of other impressive numbers. The OpenNMS Group has prospered for more than a decade. We are getting ready for our tenth OpenNMS Developers Conference, Dev-Jam. We’ve had almost the same number of OpenNMS User Conferences; the next one is in September and is hosted by the independent OpenNMS Foundation.

We still have quite a few years to go to match the numbers of the Nishiyama Onsen Keiunkan, but I think that focusing on an “exist strategy” is the way to go. We still have the greatest team of people ever assembled to work on a software project, and while the faces and names have changed over the years, I still feel like I’m standing on the shoulders of giants.

And the view is great from up here.

OpenNMS Horizon 15.0.1 Released

Just a quick note to let everyone know that OpenNMS 15.0.1 has been released. This is the first bug fix release for OpenNMS 15, and if you are running it I strongly suggest you upgrade.

As we are working to complete our transition to Hibernate (which will allow OpenNMS to use any database backend, not just PostgreSQL) we discovered an old issue where, under certain circumstances, duplicate outage records could be created. When this happened under the new code, it would cause an exception and the outages would never be cleared. This has been corrected.

The complete list of changes is as follows:

Bug

  • [NMS-7331] – Outage timeline does not show all outages in timeframe
  • [NMS-7392] – Side-menu layout issues in node resources
  • [NMS-7394] – Outage records are not getting written to the database
  • [NMS-7395] – Overlapping input label in login screen
  • [NMS-7396] – Notifications with asset fields on the message are not working
  • [NMS-7399] – Surveillance box on start page doesn't work
  • [NMS-7403] – Data Collection Logs in wrong file
  • [NMS-7406] – Incorrect Availability information and Outage information
  • [NMS-7409] – Visual issues on the start page
  • [NMS-7423] – Duplicate copies of bootstrap.js are included in our pages
  • [NMS-7425] – Poller: start: Failed to schedule existing interfaces
  • [NMS-7426] – Not monitored services are shown as 100% available on the WebUI
  • [NMS-7427] – The PageSequenceMonitor is broken in OpenNMS 15
  • [NMS-7432] – Normalize the HTTP Host Header with the new HttpClientWrapper
  • [NMS-7433] – Topology UI takes a long to load after login
  • [NMS-7434] – Disabling Notifd crashes webUI
  • [NMS-7435] – The Quick Add Node menu item shouldn't be under the Admin menu
  • [NMS-7437] – The default log level is DEBUG instead of WARN on log4j2.xml
  • [NMS-7452] – CORS filter not working
  • [NMS-7454] – Netscaler systemDef will never match a real Netscaler

Enhancement

  • [NMS-7419] – Read port and authentication user from XMP config
  • [NMS-7438] – Apply the auto-resize feature for the timeline charts

Welcome to OpenNMS 15

Today OpenNMS 15 was released. It was a year and a half between the release of OpenNMS 1.12 and OpenNMS 14, but only three months between OpenNMS 14 and OpenNMS 15.

As we move forward this year we are trying to adhere more to the open source mantra of “release early, release often”, and thus the new major release. There have been 1177 new commits since 14.0.3.

You’ll also notice that this version of OpenNMS has a new name – Horizon. We’ve always thought that OpenNMS represents the best network management platform available and the name is meant to reflect that. We hope to make as many improvements as we can, as fast as we can, without sacrificing quality, thus keeping OpenNMS out on the “horizon” from the competition.

The main improvement for the 15 release is in the webUI. Although you might not notice it at first, we’ve spent months migrating the whole interface to a technology called Bootstrap. The Bootstrap framework allows us to create a responsive UI that should look fine on a computer, a tablet or a phone. This should allow us a lot more freedom to modify the style sheet and we hope to be able to add “skinable” theme options soon.

A cool feature that can be found in this new UI is the ability to automatically resize resource graphs. If you have a particular set of resource graphs displayed:

and then you shrink the window, you’ll note that the menu turns into a dropdown and the graphs themselves now fit the more narrow window:

There are a number of bug fixes and other new features, and a complete list can be found at the bottom of this post or in our Jira instance (but for some reason you have to be logged in to see it). I am happy to say that there was no need for major security fixes in this release. (grin)

Sub-task

  • [NMS-6642] – CiscoPingMibMonitor
  • [NMS-6674] – NetScalerGroupHealthMonitor
  • [NMS-7060] – merge DocuMerge branch into develop branch
  • [NMS-7086] – alter documentation deploy step in bamboo to match the new structure
  • [NMS-7164] – Fix fortinet event typos (fortinet vs fortimail)
  • [NMS-7238] – Fix UEI names for CitrixNetScaler trap events
  • [NMS-7264] – Document CORS Support

Bug

  • [NMS-1956] – Missing localised time in web pages
  • [NMS-2358] – Time to load Path Outages page grows with each entry added
  • [NMS-2580] – Null/blank sysName value causes null/blank node label
  • [NMS-3033] – Create a HibernateEventWriter to replace JdbcEventWriter
  • [NMS-3207] – Able to get to non authorised devices via path outages link.
  • [NMS-3615] – Custom Resource Performance Reports not available
  • [NMS-3847] – jdbcEventWriter: Failed to convert time to Timestamp
  • [NMS-4009] – wrong content type in rss.jsp
  • [NMS-4246] – Paging arrows invisible with firefox on mac
  • [NMS-4493] – Notification WebUI has issues
  • [NMS-4528] – Time format on Event webpage is different that on Notices webpage
  • [NMS-5057] – Installer database upgrade script (install -d) scans every RRD directory, bombs with "too many open files"
  • [NMS-5427] – RSS feeds are not valid
  • [NMS-5618] – notifications list breadcrumbs differs from notifications index page
  • [NMS-5858] – Resource Graphs No Longer Centered
  • [NMS-6022] – Vaadin Header not consistent with JSP Header
  • [NMS-6042] – Empty Notification search bug
  • [NMS-6472] – Map Menu is not listing all maps
  • [NMS-6529] – Web UI shows not the correct Java version
  • [NMS-6613] – Problems installing "Testing" on Ubuntu 14.04
  • [NMS-6826] – Queued Ops Pending default graph needs rename
  • [NMS-6827] – Many graph definitions in snmp-graph.properties have line continuation slashes
  • [NMS-6894] – New Focal Point Topology UI (STUI-2) very slow
  • [NMS-6917] – Node page availability graph isn't "(last 24 hours)"
  • [NMS-6924] – WMI collector does not support persistence selectors
  • [NMS-6956] – test failure: org.opennms.mock.snmp.LLDPMibTest
  • [NMS-6958] – Requisition list very slow to display
  • [NMS-6967] – GeoMap polygons activation doesn't accurately reflect cursor location
  • [NMS-7015] – Navbar in Distributed Map is missing
  • [NMS-7059] – Local interface not displayed correctly in "Cdp Cache Table Links"
  • [NMS-7075] – xss in device snmp settings
  • [NMS-7112] – provision.pl just works if the admin user credentials are used
  • [NMS-7115] – Message Error in DnsMonitor
  • [NMS-7120] – Unable to add graph to KSC report
  • [NMS-7126] – ReST call for outages ends up with 500 status
  • [NMS-7144] – OpenNMS logo doesn't point to the same file
  • [NMS-7149] – footer rendering is weird in opennms docs
  • [NMS-7170] – Add a unit test for NodeLabel.computeLabel()
  • [NMS-7176] – ie9 does not display any 'interfaces' on a switch node – the tabs are blank
  • [NMS-7185] – NullPointerException When Querying offset in ReST Events Endpoint
  • [NMS-7246] – OpenNMS does not eat yellow runts
  • [NMS-7270] – HTTP 500 errors in WebUI after upgrade to 14.0.2
  • [NMS-7277] – WMI changed naming format for wmiLogicalDisk and wmiPhysicalDisk device
  • [NMS-7279] – Enable WMI Opennms Cent OS box
  • [NMS-7287] – Non provisioned switches with multiple VLANs generate an error
  • [NMS-7322] – SNMP configuration shows v1 as default and v2c is set.
  • [NMS-7330] – Include parts of a configuration doesn't work
  • [NMS-7331] – Outage timeline does not show all outages in timeframe
  • [NMS-7332] – Unnecessary and confusing DEBUG entry on poller.log
  • [NMS-7333] – Switches values retrieved incorrectly in the BSF notification strategy
  • [NMS-7335] – QueryManagerDaoImpl crashes in getNodeServices()
  • [NMS-7359] – Acknowledging alarms from the geo-map is not working
  • [NMS-7360] – Add/Edit notifications takes too much time
  • [NMS-7363] – Update Java in OpenNMS yum repos
  • [NMS-7367] – Octectstring not well stored in strings.properties file
  • [NMS-7368] – RrdDao.getLastFetchValue() throws an exception when using RRDtool
  • [NMS-7381] – Authentication defined in XML collector URLs cannot contain some reserved characters, even if escaped.
  • [NMS-7387] – The hardware inventory scanner doesn't recognize PhysicalClass::cpu(12) for entPhysicalClass
  • [NMS-7391] – Crash on path outage JSP after DAO upgrade

Enhancement

  • [NMS-1595] – header should always contain links for all sections
  • [NMS-2233] – No link back to node after manually unmanaging services
  • [NMS-2359] – Group path outages by critical node
  • [NMS-2582] – Search for nodes by sysObjectID in web UI
  • [NMS-2694] – Modify results JSP to render multiple columns
  • [NMS-5079] – Sort the Path Outages by Critical Path Node
  • [NMS-5085] – Default hrStorageUsed disk space relativeChange threshold only alerts on a sudden _increase of free space_, not a decrease of free space
  • [NMS-5133] – Add ability to search for nodes by SNMP values like Location and Contact
  • [NMS-5182] – Upgrade JasperReports 3.7.6 to most recent version
  • [NMS-5448] – Add link to a node's upstream critical path node in the dependent node's web page
  • [NMS-6508] – Event definitions: Fortinet
  • [NMS-6736] – ImapMonitor does not work with nginx
  • [NMS-7123] – Expose SNMP4J 2.x noGetBulk and allowSnmpV2cInV1 capabilities
  • [NMS-7157] – showNodes.jsp should show nodes in alphabetical order
  • [NMS-7166] – Backup Exec UEI contain "http://" in uei
  • [NMS-7205] – Rename link to configure the Ops Board in the Admin section.
  • [NMS-7206] – Remove "JMX Config Generator Web UI ALPHA" from stable
  • [NMS-7228] – Document that user must be in 'rest', 'provision' or 'admin' role for provision.pl to work
  • [NMS-7247] – Add collection of SNMP MIB2 UDP scalar stats
  • [NMS-7261] – CORS Support
  • [NMS-7278] – Improve the speed of the ReST API and Service Layer for the requisitions' repositories.
  • [NMS-7308] – Enforce selecting a single resource for Custom Resource Performance Reports
  • [NMS-7317] – Rearrange Node/Event/Alarm/Outage links on bootstrap UI
  • [NMS-7384] – Add configuration property for protobuf queue size
  • [NMS-7388] – IpInterfaceScan shouldDetect() method should check for empty string in addition to null string

Important Security Issue with OpenNMS

It is said that “given enough eyeballs, all bugs are shallow”, which is true, but the tricky part is finding enough eyeballs, especially useful ones and not the ones in that jar in Blade Runner.

Recently, an end user reported a rather severe security issue with OpenNMS.

The process that serves up the “Categories” section on the front page of the web interface is called RTC (for Real Time Console). The database queries that create the availability numbers on that page can be expensive in terms of resources, so the RTC daemon was created to periodically query the database and then cache the results so that lots of users wouldn’t create an undue load on the system.

We use a tool called Castor to process XML data within OpenNMS. Due to a bug in Castor, if Castor discovers an error when processing an XML file, it can throw an exception that includes the contents of the file.

This is very useful when the files relate to OpenNMS and you are trying to debug them, but you don’t exactly want the contents of /etc/shadow or /etc/passwd displayed indiscriminately. That’s exactly what this exploit allows.

Since the default username and password for the RTC user are both “rtc”, and that user exists on every system, a malicious person could use those credentials to obtain the contents of any file on the system. Note that as far as the OpenNMS application is concerned, the RTC user has very limited permissions, but since this is caused by an issue with Castor, it has just enough permissions to trigger it.

This has been reported as our first ever CVE: CVE-2015-0975

The best fix is to upgrade to OpenNMS 14.0.3. If, however, you are unable to upgrade soon, you can edit the Spring security file to limit requests from RTC to just the localhost, which should mitigate most of the issue. Full instructions and files can be found on the wiki.
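
To make the idea behind that workaround concrete, here is a minimal sketch in Python of what “limit requests from RTC to just the localhost” amounts to. This is not the actual Spring security configuration (that lives on the wiki); the function names and the "/rtc/" path prefix are assumptions made up for illustration.

```python
import ipaddress


def is_local_request(remote_addr: str) -> bool:
    """Return True only when the caller's address is a loopback address."""
    try:
        return ipaddress.ip_address(remote_addr).is_loopback
    except ValueError:
        # Anything that does not parse as an IP address is treated as remote.
        return False


def allow_rtc_request(path: str, remote_addr: str) -> bool:
    """Allow RTC traffic only from localhost; leave other paths alone.

    The "/rtc/" prefix is a hypothetical stand-in for whatever URLs the
    RTC user is permitted to reach in a real installation.
    """
    if path.startswith("/rtc/"):
        return is_local_request(remote_addr)
    return True


if __name__ == "__main__":
    print(allow_rtc_request("/rtc/post/example", "127.0.0.1"))
    print(allow_rtc_request("/rtc/post/example", "10.1.2.3"))
```

Since the RTC daemon runs on the same machine as the web UI, a check like this would keep it working while turning away remote callers that try the default credentials. It narrows the exposure rather than removing the underlying Castor problem, which is why upgrading is still the better fix.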

To summarize, all versions of OpenNMS prior to 14.0.3 contain a bug where *anyone* with access to the webUI (port 8980 on the OpenNMS server) can retrieve any file that is on the system. While this isn’t the end of the world, it definitely could be considered bad and should be addressed.

OpenNMS 14.0.2 Released

Today we released version 14.0.2 of OpenNMS. It is a recommended upgrade for all OpenNMS 14 users because it addresses a memory leak caused by the version of Vaadin we were using.

Here is a list of all the changes.

Sub-task

  • [NMS-7238] – Citrix Netscaler trap events

Bug

  • [NMS-6551] – Syslog Northbounder throws exceptions on certain alarms
  • [NMS-7073] – ICMP availability with custom packet size doesn't work with JNI
  • [NMS-7092] – Node page for a switch or router is unusable with Enhanced Linkd enabled
  • [NMS-7130] – Vaadin applications show Page Not Found error
  • [NMS-7186] – The XML Collector is not storing the proper data for node-level resources
  • [NMS-7187] – The XML Collection Handler is caching the resourceTypes
  • [NMS-7190] – Edit an existing scheduled outage from node's page doesn't work
  • [NMS-7193] – The report "Total Bytes Transferred By Interface" is not working with RRDtool
  • [NMS-7195] – When the DNS name of a discovered node changes, Provisiond doesn't update the node label.
  • [NMS-7218] – Null pointer exception removing services from node
  • [NMS-7227] – Some GWT pages are not working on IE
  • [NMS-7231] – The downtime model never removes the nodes when it is instructed to do it
  • [NMS-7243] – XML collector in JSON mode assumes all element content is String
  • [NMS-7245] – NPE on "manage and unmanage services and interfaces"
  • [NMS-7250] – Clicking On View Node Link Detailed Info Give java.lang.IllegalArgumentException

Enhancement

  • [NMS-7194] – Move the "Add new outage" to the top of the page.
  • [NMS-7230] – The Wallboard app makes OpenNMS unusable after a few days even if it is not used.
  • [NMS-7237] – Mikrotik RouterOS trap definitions