OpenNMS Horizon 15.0.1 Released

Just a quick note to let everyone know that OpenNMS 15.0.1 has been released. This is the first bug fix release for OpenNMS 15, and if you are running it I strongly suggest you upgrade.

As we are working to complete our transition to Hibernate (which will allow OpenNMS to use any database backend, not just PostgreSQL) we discovered an old issue where, under certain circumstances, duplicate outage records could be created. When this happened under the new code, it would cause an exception and the outages would never be cleared. This has been corrected.

The complete list of changes is as follows:

Bug

  • [NMS-7331] – Outage timeline does not show all outages in timeframe
  • [NMS-7392] – Side-menu layout issues in node resources
  • [NMS-7394] – Outage records are not getting written to the database
  • [NMS-7395] – Overlapping input label in login screen
  • [NMS-7396] – Notifications with asset fields on the message are not working
  • [NMS-7399] – Surveillance box on start page doesn't work
  • [NMS-7403] – Data Collection Logs in wrong file
  • [NMS-7406] – Incorrect Availability information and Outage information
  • [NMS-7409] – Visual issues on the start page
  • [NMS-7423] – Duplicate copies of bootstrap.js are included in our pages
  • [NMS-7425] – Poller: start: Failed to schedule existing interfaces
  • [NMS-7426] – Not monitored services are shown as 100% available on the WebUI
  • [NMS-7427] – The PageSequenceMonitor is broken in OpenNMS 15
  • [NMS-7432] – Normalize the HTTP Host Header with the new HttpClientWrapper
  • [NMS-7433] – Topology UI takes a long to load after login
  • [NMS-7434] – Disabling Notifd crashes webUI
  • [NMS-7435] – The Quick Add Node menu item shouldn't be under the Admin menu
  • [NMS-7437] – The default log level is DEBUG instead of WARN on log4j2.xml
  • [NMS-7452] – CORS filter not working
  • [NMS-7454] – Netscaler systemDef will never match a real Netscaler

Enhancement

  • [NMS-7419] – Read port and authentication user from XMP config
  • [NMS-7438] – Apply the auto-resize feature for the timeline charts

Welcome to OpenNMS 15

Today OpenNMS 15 was released. It was a year and a half between the release of OpenNMS 1.12 and OpenNMS 14, but only three months between OpenNMS 14 and OpenNMS 15.

As we move forward this year we are trying to adhere more to the open source mantra of “release early, release often”, and thus the new major release. There have been 1177 new commits since 14.0.3

You’ll also notice that this version of OpenNMS has a new name – Horizon. We’ve always thought that OpenNMS represents the best network management platform available and the name is meant to reflect that. We hope to make as many improvements we can, as fast as we can, without sacrificing quality, thus keeping OpenNMS out on the “horizon” from the competition.

The main improvement for the 15 release is in the webUI. Although you might not notice it at first, we’ve spent months migrating the whole interface to a technology called Bootstrap. The Bootstrap framework allows us to create a responsive UI that should look fine on a computer, a tablet or a phone. This should allow us a lot more freedom to modify the style sheet and we hope to be able to add “skinable” theme options soon.

A cool feature that can be found in this new UI is the ability to automatically resize resource graphs. If you have a particular set of resource graphs displayed:

and then you shrink the window, you’ll note that the menu turns into a dropdown and the graphs themselves now fit the more narrow window:

There are a number of bug fixes and other new features, and a complete list can be found at the bottom of this post or in our Jira instance (but for some reason you have to be logged in to see it). I am happy to say that there was no need for major security fixes in this release. (grin)

Sub-task

  • [NMS-6642] – CiscoPingMibMonitor
  • [NMS-6674] – NetScalerGroupHealthMonitor
  • [NMS-7060] – merge DocuMerge branch into develop branch
  • [NMS-7086] – alter documentation deploy step in bamboo to match the new structure
  • [NMS-7164] – Fix fortinet event typos (fortinet vs fortimail)
  • [NMS-7238] – Fix UEI names for CitrixNetScaler trap events
  • [NMS-7264] – Document CORS Support

Bug

  • [NMS-1956] – Missing localised time in web pages
  • [NMS-2358] – Time to load Path Outages page grows with each entry added
  • [NMS-2580] – Null/blank sysName value causes null/blank node label
  • [NMS-3033] – Create a HibernateEventWriter to replace JdbcEventWriter
  • [NMS-3207] – Able to get to non authorised devices via path outages link.
  • [NMS-3615] – Custom Resource Performance Reports not available
  • [NMS-3847] – jdbcEventWriter: Failed to convert time to Timestamp
  • [NMS-4009] – wrong content type in rss.jsp
  • [NMS-4246] – Paging arrows invisible with firefox on mac
  • [NMS-4493] – Notification WebUI has issues
  • [NMS-4528] – Time format on Event webpage is different that on Notices webpage
  • [NMS-5057] – Installer database upgrade script (install -d) scans every RRD directory, bombs with "too many open files"
  • [NMS-5427] – RSS feeds are not valid
  • [NMS-5618] – notifications list breadcrumbs differs from notifications index page
  • [NMS-5858] – Resource Graphs No Longer Centered
  • [NMS-6022] – Vaadin Header not consistent with JSP Header
  • [NMS-6042] – Empty Notification search bug
  • [NMS-6472] – Map Menu is not listing all maps
  • [NMS-6529] – Web UI shows not the correct Java version
  • [NMS-6613] – Problems installing "Testing" on Ubuntu 14.04
  • [NMS-6826] – Queued Ops Pending default graph needs rename
  • [NMS-6827] – Many graph definitions in snmp-graph.properties have line continuation slashes
  • [NMS-6894] – New Focal Point Topology UI (STUI-2) very slow
  • [NMS-6917] – Node page availability graph isn't "(last 24 hours)"
  • [NMS-6924] – WMI collector does not support persistence selectors
  • [NMS-6956] – test failure: org.opennms.mock.snmp.LLDPMibTest
  • [NMS-6958] – Requisition list very slow to display
  • [NMS-6967] – GeoMap polygons activation doesn't accurately reflect cursor location
  • [NMS-7015] – Navbar in Distributed Map is missing
  • [NMS-7059] – Local interface not displayed correctly in "Cdp Cache Table Links"
  • [NMS-7075] – xss in device snmp settings
  • [NMS-7112] – provision.pl just works if the admin user credentials are used
  • [NMS-7115] – Message Error in DnsMonitor
  • [NMS-7120] – Unable to add graph to KSC report
  • [NMS-7126] – ReST call for outages ends up with 500 status
  • [NMS-7144] – OpenNMS logo doesn't point to the same file
  • [NMS-7149] – footer rendering is weird in opennms docs
  • [NMS-7170] – Add a unit test for NodeLabel.computeLabel()
  • [NMS-7176] – ie9 does not display any 'interfaces' on a switch node – the tabs are blank
  • [NMS-7185] – NullPointerException When Querying offset in ReST Events Endpoint
  • [NMS-7246] – OpenNMS does not eat yellow runts
  • [NMS-7270] – HTTP 500 errors in WebUI after upgrade to 14.0.2
  • [NMS-7277] – WMI changed naming format for wmiLogicalDisk and wmiPhysicalDisk device
  • [NMS-7279] – Enable WMI Opennms Cent OS box
  • [NMS-7287] – Non provisioned switches with multiple VLANs generate an error
  • [NMS-7322] – SNMP configuration shows v1 as default and v2c is set.
  • [NMS-7330] – Include parts of a configuration doesn't work
  • [NMS-7331] – Outage timeline does not show all outages in timeframe
  • [NMS-7332] – Unnecessary and confusing DEBUG entry on poller.log
  • [NMS-7333] – Switches values retrieved incorrectly in the BSF notification strategy
  • [NMS-7335] – QueryManagerDaoImpl crashes in getNodeServices()
  • [NMS-7359] – Acknowledging alarms from the geo-map is not working
  • [NMS-7360] – Add/Edit notifications takes too much time
  • [NMS-7363] – Update Java in OpenNMS yum repos
  • [NMS-7367] – Octectstring not well stored in strings.properties file
  • [NMS-7368] – RrdDao.getLastFetchValue() throws an exception when using RRDtool
  • [NMS-7381] – Authentication defined in XML collector URLs cannot contain some reserved characters, even if escaped.
  • [NMS-7387] – The hardware inventory scanner doesn't recognize PhysicalClass::cpu(12) for entPhysicalClass
  • [NMS-7391] – Crash on path outage JSP after DAO upgrade

Enhancement

  • [NMS-1595] – header should always contain links for all sections
  • [NMS-2233] – No link back to node after manually unmanaging services
  • [NMS-2359] – Group path outages by critical node
  • [NMS-2582] – Search for nodes by sysObjectID in web UI
  • [NMS-2694] – Modify results JSP to render multiple columns
  • [NMS-5079] – Sort the Path Outages by Critical Path Node
  • [NMS-5085] – Default hrStorageUsed disk space relativeChange threshold only alerts on a sudden _increase of free space_, not a decrease of free space
  • [NMS-5133] – Add ability to search for nodes by SNMP values like Location and Contact
  • [NMS-5182] – Upgrade JasperReports 3.7.6 to most recent version
  • [NMS-5448] – Add link to a node's upstream critical path node in the dependent node's web page
  • [NMS-6508] – Event definitions: Fortinet
  • [NMS-6736] – ImapMonitor does not work with nginx
  • [NMS-7123] – Expose SNMP4J 2.x noGetBulk and allowSnmpV2cInV1 capabilities
  • [NMS-7157] – showNodes.jsp should show nodes in alphabetical order
  • [NMS-7166] – Backup Exec UEI contain "http://" in uei
  • [NMS-7205] – Rename link to configure the Ops Board in the Admin section.
  • [NMS-7206] – Remove "JMX Config Generator Web UI ALPHA" from stable
  • [NMS-7228] – Document that user must be in 'rest', 'provision' or 'admin' role for provision.pl to work
  • [NMS-7247] – Add collection of SNMP MIB2 UDP scalar stats
  • [NMS-7261] – CORS Support
  • [NMS-7278] – Improve the speed of the ReST API and Service Layer for the requisitions' repositories.
  • [NMS-7308] – Enforce selecting a single resource for Custom Resource Performance Reports
  • [NMS-7317] – Rearrange Node/Event/Alarm/Outage links on bootstrap UI
  • [NMS-7384] – Add configuration property for protobuf queue size
  • [NMS-7388] – IpInterfaceScan shouldDetect() method should check for empty string in addition to null string

Important Security Issue with OpenNMS

It is said that “given enough eyeballs, all bugs are shallow”, which is true, but the tricky part is finding enough eyeballs, especially useful ones and not the ones in that jar in Blade Runner.

Recently, an end user reported a rather severe security issue with OpenNMS.

The process that serves up the “Categories” section on the front page of the web interface is called RTC (for Real Time Console). The database queries that create the availability numbers on that page can be expensive in terms of resources, so the RTC daemon was created to periodically query the database and then cache the results so that lots of users wouldn’t create an undo load on the system.

We use a tool called Castor to process XML data within OpenNMS. Due to a bug in Castor, if Castor discovers an error when processing an XML file, it can throw an exception that includes the contents of the file.

This is very useful when the files relate to OpenNMS and you are trying to debug them, but you don’t exactly want the contents of /etc/shadow or /etc/passwd displayed indiscriminately. That’s exactly what this exploit allows.

Since the default username and password for the RTC user is “rtc” and exists on every system, a malicious person could use that information to obtain the contents of any file on the system. Note that as far as the OpenNMS application is concerned, the RTC user has very limited permissions, but this is caused by an issue with Castor and it has just
enough permissions to trigger it.

This has been reported as our first ever CVE: CVE-2015-0975

The best fix is to upgrade to OpenNMS 14.0.3. If, however, you are unable to upgrade soon, you can edit the Spring security file to limit requests from RTC to just the localhost, which should mitigate most of the issue. Full instructions and files can be found on the wiki.

To summarize, all versions of OpenNMS prior to 14.0.3 contain a bug where *anyone* with access to the webUI (port 8980 on the OpenNMS server) can retrieve any file that is on the system. While this isn’t the end of the world, it definitely could be considered bad and should be addressed.

OpenNMS 14.0.2 Released

Today we released version 14.0.2 of OpenNMS. It is a recommended upgrade for all OpenNMS 14 users because is addresses a memory leak caused by the version of Vaadin we were using.

Here is a list of all the changes.

Sub-task

  • [NMS-7238] – Citrix Netscaler trap events

Bug

  • [NMS-6551] – Syslog Northbounder throws exceptions on certain alarms
  • [NMS-7073] – ICMP availability with custom packet size doesn't work with JNI
  • [NMS-7092] – Node page for a switch or router is unusable with Enhanced Linkd enabled
  • [NMS-7130] – Vaadin applications show Page Not Found error
  • [NMS-7186] – The XML Collector is not storing the proper data for node-level resources
  • [NMS-7187] – The XML Collection Handler is caching the resourceTypes
  • [NMS-7190] – Edit an existing scheduled outage from node's page doesn't work
  • [NMS-7193] – The report "Total Bytes Transferred By Interface" is not working with RRDtool
  • [NMS-7195] – When the DNS name of a discovered node changes, Provisiond doesn't update the node label.
  • [NMS-7218] – Null pointer exception removing services from node
  • [NMS-7227] – Some GWT pages are not working on IE
  • [NMS-7231] – The downtime model never removes the nodes when it is instructed to do it
  • [NMS-7243] – XML collector in JSON mode assumes all element content is String
  • [NMS-7245] – NPE on "manage and unmanage services and interfaces"
  • [NMS-7250] – Clicking On View Node Link Detailed Info Give java.lang.IllegalArgumentException

Enhancement

  • [NMS-7194] – Move the "Add new outage" to the top of the page.
  • [NMS-7230] – The Wallboard app makes OpenNMS unusable after a few days even if it is not used.
  • [NMS-7237] – Mikrotik RouterOS trap definitions

Test Driven Development

One of the things that bothers me a lot about the software industry is this idea that proprietary software is somehow safer and better written than open source software. Perhaps it is because a lot of people still view software as “magic” and since you can’t see the code, is must be more “magical”. Or perhaps is it because people assume that something you have to pay for must be better than something that is free.

I’ve worked for and with a number of proprietary software companies, so I’ve seen how the sausage is made, and in some cases you don’t want to know. Don’t get me wrong, I’ve seen well managed commercial software companies that produce solid code because in the long run solid code is better and costs less, but I’ve also seen the opposite done simply to get a product to market quickly.

With open source, at least if you expect contribution, you have to produce code that is readable. It also helps if it is well written since good programmers respect and like working with other good programmers. It’s out there for everyone to see, and that puts extra demands on its quality.

In the interest of making great code, many years ago we switched to the Spring framework which had the benefit that we could start writing software tests. This test driven development is one reason OpenNMS is able to stay so stable with lots of code changes and a small test team.

What’s funny is that we’ve talked to at least two other companies who started implementing test driven development but then dropped it because it was too hard. It wasn’t easy for us, either, but as of this writing we run 5496 tests every time something changes in the main OpenNMS application, and that doesn’t include all of the other branches and projects such as Newts. We use the Bamboo product from Atlassian to manage the tests so I want to take this opportunity to thank them for supporting us.

OpenNMS 14 contained some of the biggest code changes in the platform’s history but so far it has been one of the smoothest releases yet. While most of that was due to to the great team of developers we have, part of it was due to the transparency that the open source process encourages.

Commercial software could learn a thing or two from it.