OpenNMS and Elasticsearch

With Horizon 18 we added support for sending OpenNMS events into Elasticsearch. Unfortunately, it only works with Elasticsearch 1.0. Elasticsearch 2.0 and higher requires Camel 17, but OpenNMS can’t use it. I wondered why, and if you were wondering too, here is the answer from Seth:

Camel 17 has changed their OSGi metadata to only be compatible with Spring 4.1 and higher. We’re still using Spring 4.0 so that’s one problem. The second issue is that ActiveMQ’s OSGi metadata bans Spring 4.0 and higher. So currently, ActiveMQ and Camel are mutually incompatible with one another inside Karaf at any version higher than the ones that we are currently running.

The biggest issue is the ActiveMQ problem, I’ve opened this bug and it sounds like they’re going to address it in their next major release

So there you have it.

The Inverter: Episode 69 – Bill and Ted and Jeremy and Bryan and Jono and Stuart’s Excellent Adventure

So the Gang of Four decided to actually produce a regular episode of Bad Voltage for the first time in, like, a month, so I decided to resurrect this little column making fun of them.

I am actually supposed to be on vacation this week, but for me vacation means working around the farm. I was working outside when the heat index hit 108.5F so while I was recovering from heat stroke I decided to give this week’s show a listen.

Clocking in at a healthy 75 minutes, give or take, it was an okay show, although the last fifteen minutes kind of wandered (much like most of this review).

The first segment concerned the creation of NextCloud as a fork of OwnCloud. I’ve already presented my thoughts on it from Bryan’s Youtube interview with the founders of NextCloud, and not much new was covered here. But it was a chance for all four of them to discuss it. One of the touted benefits of the new project is the lack of a contributor agreement. I don’t find this a good thing. Note that while I whole heartedly agree that many contributor agreements are evil, that doesn’t make them all evil. Take the OpenNMS contributor agreement. It’s pretty simple, and it protects both the contributor and the project. The most important feature, to me, is that the contributor states that they have a right to contribute the code to the project. I think that’s important, although if it were lacking or the contributor lied, the results would be the same (the infringing code would be removed from the application). It at least makes people think just a bit before sending in code.

Bryan made an offhand mention about trademarks in the same discussion, and I wasn’t sure what he meant by it. Does it mean NextCloud won’t enforce trademarks, or that there is an easy process that allows people to freely use them? I think enforcing trademarks is extremely important for open source companies. Otherwise, someone could take your code, crap all over it, and then ship it out under the same name. At OpenNMS we had issues with this back in 2005 but luckily since then it has been pretty quiet.

While there was even more speculation, no one really knows why the NextCloud fork happened. Some say it was that Frank Karlitschek was friends with Niels Mache of Spreed.me and wanted a partnership, but OwnCloud was against it. I think we’ll never know. Another suggestion that was been made is that it had to deal with the community of OwnCloud vs. the investors. Jono made the statement that VCs don’t take an active role in the community, but I have to disagree. My interactions with 90% of VCs have been an episode of Silicon Valley, and while they may not take an active role, you can expect them to say things like “These features over here will be part of our ‘enterprise’ version and not open, and make sure to hobble the ‘community’ version to drive sales, but other than that, run your community the way you want.”

One new point that was brought up was the business perception of the company. I think everyone who self identifies as an open source fan who is using OwnCloud will most likely switch to NextCloud since that is where the developers went, but will businesses be cautious about investing in NextCloud? The argument can be made that “who knows what will set Frank off next?” and the threat of NextNextCloud might worry some. I am not expecting this to happen (once bitten, twice shy, I bet Frank has learned a lot about what he wants out of his project) but it is a concern.

It is similar to Libreoffice. I don’t know anyone in the open source world using OpenOffice, but it is still huge outside of that world (I did a ride along with a friend who is police and was pleasantly surprised to see him bring up OpenOffice on his patrol car’s laptop).

It kind of reminds me when Google killed Reader and then announced Keep – seemed a bit ironic at the time. If a company can radically change or even remove a service you have come to rely on, will you trust them in the future?

The segment ended with a discussion of the early days of Ubuntu. Bryan made the claim that Ubuntu was made as an easier to use version Debian which Jono vehemently denied. He claimed the goal was to create a free, powerful desktop operating system. All I remember from those days were those kids from the United Colors of Bennetton ads on the covers of the free CDs.

The next piece was Bryan reviewing the latest Dell XPS 13 laptop. My last two laptops have been XPS 13 models and I love them. They ship with Linux (which I want to encourage) and I find they provide a great Linux desktop experience.

I got my newest one last year, and the main issue I’ve had is with the trackpad. Later kernels seem to have addressed most of my problems. I also dumped the Ubuntu 14.04 that shipped with it in exchange for Linux Mint, but I’m still running mainline kernels (4.6 at the moment). I’m eager for Mint 18 to release to see if the (rumoured) 4.4 kernel will work well (they keep backporting device driver changes) but outside of that I’ve had few problems.

Battery life is great, and the HiDPI screen is a big improvement over my old XPS 13. The main weirdness, for my model, is the location of the camera. In order to make the InfinityEdge display, they moved it to the bottom left of the screen so that the top bezel could be as thin as possible. It means people end up looking at the flabby underside of my chin instead of my face at times, but I use it so little that it doesn’t bother me much.

The third segment was about funding open source projects. It’s an eternal question: how do you pay for developers to work on free software? The guys didn’t really address it, focusing for the most part on programs that would provide some compensation for, say, travel to a conference, versus paying someone enough to make their mortgage. Stuart finally brought up that point but no real answers were offered.

The last fifteen minutes was the gang just shooting the breeze. Bryan used the term “duck fart” which apparently is a cocktail (sounds nasty, so don’t expect it on the cocktail blog). There is also, apparently, a science fiction novel called Bad Voltage that is not supposed to be that great, and the suggestion was made that the four of them should write their own version, but in the form of an “exquisite corpse” (my term, not theirs) where each would right their section independently and see what happens when it gets combined.

All in all, not a horrible show but not great, either. It is nice to have them all back together.

I’m eager to see how Bryan manages the next one, since he is spending 30 days solely in the Linux shell. How will Google Hangouts (which is what they use to make the show) work?

Curious minds want to know.

Choose the Right Thermometer

Okay, so I have a love/hate relationship with Centurylink. Centurylink provides a DSL circuit to my house. I love the fact that I have something resembling broadband with 10Mbps down and about 1Mbps up. Now that doesn’t even qualify as broadband according to the FCC, but it beats the heck out of the alternatives (and I am jealous of my friends with cable who have 100Mbps down or even 300Mbps).

The hate part comes from reliability, which lately has been crap. This post is actually focused on OpenNMS so I won’t go into all of my issues, but I’ve been struggling with long outages in my service.

The latest issue is a new one: packet loss. Usually the circuit is either up or completely down, but for the last three days I’ve been having issues with a large percentage of dropped packets. Of course I monitor my home network from the office OpenNMS instance, and this will usually manifest itself with multiple nodeLostService events around HTTP since I have a personal web server that I monitor.

The default ICMP monitor does not measure packet loss. As long as at least one ping reply makes it, ICMP is considered up, so the node itself remains up. OpenNMS does have a monitor for packet loss called Strafeping. It sends out 20 pings in a short amount of time and then measures how long they take to come back. So I added it to the node for my home and I saw something unusual: a consistent 19 out of 20 lost packets.

Strafeping Graph

Power cycling the DSL modem seems to correct the problem, and the command line ping was reporting no lost packets, so why was I seeing such packet loss from the monitor? Was Strafeping broken?

While it is always a possibility, I didn’t think that Strafeping was broken, but I did check a number of graphs for other circuits and they looked fine. Thus it had to be something else.

This brings up a touchy subject for me: false positives. Is OpenNMS reporting false problems?

It reminds me of an event happened when I was studying physics back in the late 1980s. I was working with some newly discovered ceramic material that exhibited superconductivity at relatively high temperatures (around 92K). That temperature can be reached using liquid nitrogen, which was relatively easy to source compared to cooler liquids like liquid helium.

I needed to measure the temperature of the ceramic, but mercury (used in most common thermometers) is a solid at those temperatures, so I went to my advisor for suggestions. His first question to me was “What does a thermometer measure?”

I thought it was a trick question, so I answered “temperature” (“thermo” meaning temperature and meter meaning “to measure”). He replied, “Okay, smart guy, the temperature of what?”

That was harder to answer exactly, so I said vague things like the ambient environment, whatever it was next to, etc. He interrupted me and said “No, a thermometer measures one thing: the temperature of the thermometer”.

This was an important lesson, even though it seems obvious. In the case of the ceramic it meant a lot of extra steps to make sure the thermometer we were using (which was based on changes in resistance) was as close to the temperature of the material as possible.

What does that have to do with OpenNMS? Well, OpenNMS is like that thermometer. It is up to us to make sure that the way we decide to use it for monitoring is as close to our criteria as possible. A “false positive” usually indicates a problem with the method versus the tool – OpenNMS is behaving exactly as it should but we need to match it better to what we expect.

In my case I found out the router I use was limited by default to responding 1 ping per second (to avoid DDoS attacks I assume), so last night when I upped that to allow 20 pings per second Strafeping started to work as expected (as you can see in the graph above).

This allowed me to detect when my DSL circuit packet loss started again today. A little after 14:00 the system detected high packet loss. When this happened before, power cycling the modem seemed to fix it, so I headed home to do just that.

While I was on the way, around 15:30, the packet loss seemed to improve, but as you can see from the graph the ping times were all over the place (the line is green but there is a lot of extra “smoke” around it indicating a variance in the response times). I proactively power cycled the modem and things settled down. The Centurylink agent agreed to send me a new modem.

The point of this post is to stress that you need to understand how your monitoring tools actually work and you can often correct issues that make a monitor unusable and turn it into to something useful. Choose the right thermometer.

Nextcloud, Never Stop Nexting!

It’s been awhile since I’ve posted a long, navel-gazing rant about the business of open source software. I’ve been trying to focus more on our business than spending time talking about it, but yesterday an announcement was made that brought all of it back to the fore.

TL;DR; Yesterday the Nextcloud project was announced as a fork of the popular ownCloud project. It was founded by many of the core developers of ownCloud. On the same day, the US corporation behind ownCloud shut it doors, citing Nextcloud as the reason. Is this a good thing? Only time will tell, but it represents the (still) ongoing friction between open source software and traditional software business models.

I was looking over my Google+ stream yesterday when I saw a post by Bryan Lunduke announcing a special “secret” broadcast coming at 1pm (10am Pacific). As I am a Lundookie, I made a point to watch it. I missed the start of it but when I joined it turned out to be an interview with the technical team behind a new project called Nextcloud, which was for the most part the same team behind ownCloud.

Nextcloud is a fork, and in the open source world a “fork” is the nuclear option. When a project’s community becomes so divided that they can’t work things out, or they don’t want to work things out for whatever reasons, there is the option to take the code and start a new project. It always represents a failure but sometimes it can’t be helped. The two forks I can think of off hand, Joomla from Mambo and Icinga from Nagios, both resulted in stronger projects and better software, so maybe this will happen here.

In part I blame the VC model for financing software companies for the fork. In the traditional software model, a bunch of money is poured into a company to create software, but once that software is created the cost of reproducing it is near zero, so the business model is to sell licenses to the software to the end users in order to generate revenue in the future. This model breaks when it comes to free and open source software, since once the software is created there is no way to force the end users to pay for it.

That still doesn’t keep companies from trying. This resulted in a trend (which is dying out) called “open core” – the idea that some software is available under an open source license but certain features are kept proprietary. As Brian Prentice at Gartner pointed out, there is little difference between this and just plain old proprietary software. You end up with the same lack of freedom and same vendor lock in.

Those of us who support free software tend to be bothered by this. Few things get me angrier than to be at a conference and have someone go “Oh, this OpenNMS looks nice – how much is the enterprise version?”. We only have the enterprise version and every bit of code we produce is available under an open source license.

Perhaps this happened at ownCloud. When one of the founders was on Bad Voltage awhile back, I had this to say about the interview:

The only thing that wasn’t clear to me was the business model. The founder Frank Karlitschek states that ownCloud is not “open core” (or as we like to call it “fauxpensource“) but I’m not clear on their “enterprise” vs. “community” features. My gut tells me that they are on the side of good.

Frank seemed really to be on the side of freedom, and I could see this being a problem if the rest of the ownCloud team wasn’t so dedicated.

On the interview yesterday I asked if Nextcloud was going to have a proprietary (or “enterprise”) version. As you can imagine I am pretty strongly against that.

The reason I asked was from this article on the new company that stated:

There will be two editions of Nextcloud: the free of cost community edition and the paid enterprise edition. The enterprise edition will have some additional features suited for enterprise customers, but unlike ownCloud, the community and enterprise editions for Nextcloud will borrow features from each other more freely.

Frank wouldn’t commit to making all of Nextcloud open, but he does seem genuinely determined to make as much of it open as possible.

Which leads me to wonder, what’s stopping him?

It’s got to be the money guys, right? Look, nothing says that open source companies can’t make money, it’s just you have to do it differently than you would with proprietary software. I can’t stress this enough – if your “open source” business model involves selling proprietary software you are not an open source company.

This is one of the reasons my blood pressure goes up whenever I visit Silicon Valley. Seriously, when I watch the HBO show to me it isn’t a comedy, it’s a documentary (and the fact that I most closely identify with the character of Erlich doesn’t make me feel all that better about myself).

I want to make things. I want to make things that last. I can remember the first true vacation I took, several years after taking over the OpenNMS project when it had grown it to the point that it didn’t need me all the time. I was so happy that it had reached that point. I want OpenNMS to be around well after I’m gone.

It seems, however, that Silicon Valley is more interested in making money rather than making things. They hunt “unicorns” – startups with more than a $1 billion valuation – and frequently no one can really determine how they arrive at that valuation. They are so consumed with jargon that quite often you can’t even figure out what some of these companies do, and many of them fade in value after the IPO.

I can remember a keynote at OSCON by Martin Mickos about Eucalyptus, and how it was “open source” but of course would have proprietary code because “well, we need to make money”. He is one of those Silicon Valley darlings who just doesn’t get open source, and it’s why we now have OpenStack.

The biggest challenge to making money in open source is educating the consumer that free software doesn’t mean free solution. Free software can be very powerful but it comes with a certain level of complexity, and to get the most out of it you have to invest in it. The companies focused on free and open source software make money by providing products that address this complexity.

Traditionally, this has been service and support. I like to say at OpenNMS we don’t sell software, we sell time. Since we do little marketing, all of our users are self selecting (which makes them incredibly intelligent and usually quite physically beautiful) and most of them have the ability to figure out their own issues. But by working with us we can greatly shorten the time to deploy as well as make them aware of options they may not know exist.

In more recent times, there is also the option to offer open source software as a service. Take WordPress, one of my favorite examples. While I find it incredibly easy to install an instance of WordPress, if you don’t want to or if you find it difficult, you can always pay them to host it for you. Change your mind later? You can export it to an instance you control.

The market is always changing and with it there is opportunity. As OpenNMS is a network monitoring platform and the network keeps getting larger, we are focusing on moving it to OpenStack for ultimate scalability, and then coupled with our Minions we’ll have the ability to handle an “Internet of Things” amount of devices. At each point there are revenue opportunities as we can help our clients get it set up in their private cloud, or help them by letting them outsource some or all of it, such as Newts storage. The beauty is that the end user gets to own their solution and they always have the option of bringing it back in house.

None of these models involves requiring a license purchase as part of the business plan. In fact, I can foresee a time in the near future where purchasing a proprietary software product without fully exploring open source alternatives will be considered a breach of fiduciary responsibility.

And these consumers will be savvy enough to demand pure open source solutions. That is why I think Nextcloud, if they are able to focus their revenue efforts on things such as an appliance, has a better chance of success than a company like ownCloud that relies on revenue from software licensing sales. The fact that most of the creators have left doesn’t help them, either.

The lack of revenue from licenses sales makes most VCs panic, and it looks like that’s exactly what happened with the US division of ownCloud:

Unfortunately, the announcement has consequences for ownCloud, Inc. based in Lexington, MA. Our main lenders in the US have cancelled our credit. Following American law, we are forced to close the doors of ownCloud, Inc. with immediate effect and terminate the contracts of 8 employees. The ownCloud GmbH is not directly affected by this and the growth of the ownCloud Foundation will remain a key priority.

I look forward to the time in the not too distant future when the open core model is seen as quaint as selling software on floppy disks at the local electronics store, and I eagerly await the first release of Nextcloud.

Emley Moor, Kirklees, West Yorkshire

I spent last week back in the United Kingdom. I always find it odd to travel to the UK. When I’m in, say, Germany or Spain, I know I’m in a different country. With the UK I sometimes forget and hijinks ensue. As Shaw may have once said, we are two countries separated by a common language.

Usually I spend time in the South, mainly Hampshire, but this trip was in Yorkshire, specifically West Yorkshire. I was looking forward to this for a number of reasons. For example, I love Yorkshire Pudding, and the Four Yorkshiremen is my favorite Monty Python routine.

Also, it meant that I could fly into Manchester Airport and miss Heathrow. Well, I didn’t exactly miss it.

I was visiting a big client that most people have never heard of, even though they are probably an integral part of your life if you live in the UK. Arqiva provides the broadcast infrastructure for much of the television and mobile phone industry in the country, as well as being involved in deploying networks for projects such as smart metering and the Internet of Things.

We were working at the Emley Moor location, which is home to the Emley Moor Mast. This is the largest freestanding structure in Britain (and third in the European Union). With a total height of 1084 feet, it is higher than the Eiffel Tower and almost twice as high as the Washington Monument.

Emily Moor Mast View

The mast was built in 1971 to replace a metal lattice tower that fell, due to a combination of ice and wind, in 1969. I love the excerpt from the log book mentioned in the Wikipedia article:

  • Day: Lee, Caffell, Vander Byl
  • Ice hazard – Packed ice beginning to fall from mast & stays. Roads close to station temporarily closed by Councils. Please notify councils when roads are safe (!)
  • Pye monitor – no frame lock – V10 replaced (low ins). Monitor overheating due to fan choked up with dust- cleaned out, motor lubricated and fan blades reset.
  • Evening :- Glendenning, Bottom, Redgrove
  • 1,265 ft (386 m) Mast :- Fell down across Jagger Lane (corner of Common Lane) at 17:01:45. Police, I.T.A. HQ, R.O., etc., all notified.
  • Mast Power Isolator :- Fuses removed & isolator locked in the “OFF” position. All isolators in basement feeding mast stump also switched off. Dehydrators & TXs switched off.

They still have that log book, open to that page.

Emily Moor Log Book

If you have 20 minutes, there is a great old documentary on the fall of the old tower and the construction of the new mast.

On my last day there we got to go up into the structure. It’s pretty impressive:

Emily Moor Mast Up Close

and the inside looks like something from a 1970s sci-fi movie:

Emily Moor Mast Inside

The article stated that it takes seven minutes to ride the lift to the top. I timed it at six minutes, fifty-seven seconds, so that’s about right (it’s fifteen seconds quicker going down). I was working with Dr. Craig Gallen who remembers going up in the open lift carriage, but we were in an enclosed car. It’s very small and with five of us in it I will admit to a small amount of claustrophobia on the way up.

But getting to the top is worth it. The view is amazing:

View from Emily Moor Mast

It was a calm day but you could still feel the tower sway a bit. They have a plumb bob set up to measure the drift, and it was barely moving while we were up there. Toby, our host, told of a time he had to spend seven hours installing equipment when the bob was moving four to five inches side to side. They had to move around on their hands and knees to avoid falling over.

Plumb Bob

I’m glad I wasn’t there on that day, but our day was fantastic. Here is a shot of the parking lot where the first picture (above) was taken.

View of Emily Moor Parking Lot

I had a really great time on this trip. The client was amazing, and I really like the area. It reminds me a bit of the North Carolina mountains. I did get my Yorkshire Pudding in Yorkshire (bucket list item):

Yorkshire Pudding in Yorkshire

and one evening Craig and I got to meet up with Keith Spragg.

Keith Spragg and Craig Gallen

Keith is a regular on the OpenNMS IRC channel (#opennms on freenode.net), and he works for Southway Housing Trust. They are a non-profit that manages several thousand homes, and part of that involves providing certain IT services to their tenants. They are mainly a Windows/Citrix shop but OpenNMS is running on one of the two Linux machines in their environment. He tried out a number of solutions before finding that OpenNMS met his needs, and he pays it forward by helping people via IRC. It always warms my heart to see OpenNMS being used in such places.

I hope to return to the area, although I was glad I was there in May. It’s around 53 degrees north latitude, which puts it level with the southern Alaskan islands. It would get light around 4am, and in the winter ice has been known to fall in sheets from the Mast (the walkways are covered to help protect the people who work there).

I bet Yorkshire Pudding really hits the spot on a cold winter’s day.

OpenNMS Horizon 18 “Tardigrade” Is Now Available

I am extremely happy to announce the availability of Horizon 18, codenamed “Tardigrade”. Ben is responsible for naming our releases and he’s decided that the theme for Horizon 18 will be animals. The name “Tardigrade” was suggested in the IRC channel by Uberpenguin, and while they aren’t the prettiest things, Wikipedia describes them as “perhaps the most durable of known organisms” so in the context of OpenNMS that is appropriate.

OpenNMS Horizon 18

I am also happy to see the Horizon program working. When we split OpenNMS into Horizon and Meridian, the main reason was to drive faster development. Now instead of a new stable release every 18 months, we are getting them out every 3 to 4 months. And these are great releases – not just major releases in name only.

The first thing you’ll notice if you log in to Horizon 18 as a user in the admin role is that we’ve added a new “opt-in” feature that let’s us know a little bit about how OpenNMS is being used by people. We hope that most of you will choose to send us this information, and in the spirit of the Open Source Way we’ve made all of the statistics available publicly.

OpenNMS Opt-In Screen

One of the key things we are looking for is the list of SNMP Object IDs. This will let us know what devices are being monitored by our users and to increase their level of support. Of course, this requires that your OpenNMS instance be able to reach the stats server on the Internet, and you can change your choice at any time on the Configuration admin page under “Data Choices”. It will only send this information once every 24 hours, so we don’t expect it to impact network traffic at all.

Once you’ve opted in, the next thing you’ll probably notice is new problem lists on the home page listing “services” and “applications”.

OpenNMS BSM Problem Lists

This related to the major feature addition in Horizon 18 of the Business Service Monitor (BSM).

OpenNMS BSM OpenDaylight

As people move from treating servers as pets to treating them like cattle, the emphasis has shifted to understanding how well applications and microservices are running as a whole instead of focusing on individual devices. The BSM allows you to configure these services and then leverage all the usual OpenNMS crunchy goodness as you would a legacy service like HTTP running on a particular box. The above screenshot comes from some prototype work Jesse has been doing with integrating OpenNMS with OpenDaylight. As you can see at a glance, while the ICMP service is down on a particular device, the overall Network Fabric is still functioning perfectly.

Another thing I’m extremely proud of is the increase in the quality of documentation. Ronny and the rest of the documentation team are doing a great job, and we’ve made it a requirement that new features aren’t complete without documentation. Please check out the release notes as an example. It contains a pretty comprehensive lists of changes in 18.

A few I’d like to point out:

Horizon 17 is one of the most powerful and stable releases of OpenNMS ever, and we hope to continue that tradition with Horizon 18. Hats off to the team for such great work.

Here is a list of all the issues addressed in Horizon 18:

Release Notes – OpenNMS – Version 18.0.0

Bug

  • [NMS-3489] – "ADD NODE" produces "too much" config
  • [NMS-4845] – RrdUtils.createRRD log message is unclear
  • [NMS-5788] – model-importer.properties should be deprecated and removed
  • [NMS-5839] – Bring WaterfallExecutor logging on par with RunnableConsumerThreadPool
  • [NMS-5915] – The retry handler used with HttpClient is not going to do what we expect
  • [NMS-5970] – No HTML title on Topology Map
  • [NMS-6344] – provision.pl does not import requisitions with spaces in the name
  • [NMS-6549] – Eventd does not honor reloadDaemonConfig event
  • [NMS-6623] – Update JNA.jar library to support ARM based systems
  • [NMS-7263] – jaxb.properties not included in jar
  • [NMS-7471] – SNMP Plugin tests regularly failing
  • [NMS-7525] – ArrayOutOfBounds Exception in Topology Map when selecting bridge-port
  • [NMS-7582] – non RFC conform behaviour of SmtpMonitor
  • [NMS-7731] – Remote poller dies when trying to use the PageSequenceMonitor
  • [NMS-7763] – Bridge Data is not Collected on Cisco Nexus
  • [NMS-7792] – NPE in JmxRrdMigratorOffline
  • [NMS-7846] – Slow LinkdTopologyProvider/EnhancedLinkdTopologyProvider in bigger enviroments
  • [NMS-7871] – Enlinkd bridge discovery creates erroneous entries in the Bridge Forwarding Tables of unrelated switches when host is a kvm virtual host
  • [NMS-7872] – 303 See Other on requisitions response breaks the usage of the Requisitions ReST API
  • [NMS-7880] – Integration tests in org.opennms.core.test-api.karaf have incomplete dependencies
  • [NMS-7918] – Slow BridgeBridgeTopologie discovery with enlinkd.
  • [NMS-7922] – Null pointer exceptions with whitespace in requisition name
  • [NMS-7959] – Bouncycastle JARs break large-key crypto operations
  • [NMS-7967] – XML namespace locations are not set correctly for namespaces cm, and ext
  • [NMS-7975] – Rest API v2 returns http-404 (not found) for http-204 (no content) cases
  • [NMS-8003] – Topology-UI shows LLDP links not correct
  • [NMS-8018] – Vacuumd sends automation events before transaction is closed
  • [NMS-8056] – opennms-setup.karaf shouldn't try to start ActiveMQ
  • [NMS-8057] – Add the org.opennms.features.activemq.broker .xml and .cfg files to the Minion repo webapp
  • [NMS-8058] – Poll all interface w/o critical service is incorrect
  • [NMS-8072] – NullPointerException for NodeDiscoveryBridge
  • [NMS-8079] – The OnmsDaoContainer does not update its cache correctly, leading to a NumberFormatException
  • [NMS-8080] – VLAN name is not displayed
  • [NMS-8086] – Provisioning Requisitions with spaces in their name.
  • [NMS-8096] – JMX detector connection errors use wrong log level
  • [NMS-8098] – PageSequenceMonitor sometimes gives poor failure reasons
  • [NMS-8104] – init script checkXmlFiles() fails to pick up errors
  • [NMS-8116] – Heat map Alarms/Categories do not show all categories
  • [NMS-8118] – CXF returning 204 on NULL responses, rather than 404
  • [NMS-8125] – Memory leak when using Groovy + BSF
  • [NMS-8128] – NPE if provisioning requisition name has spaces
  • [NMS-8137] – OpenNMS incorrectly discovers VLANs
  • [NMS-8146] – "Show interfaces" link forgets the filters in some circumstances
  • [NMS-8167] – Cannot search by MAC address
  • [NMS-8168] – Vaadin Applications do not show OpenNMS favicon
  • [NMS-8189] – Wrong interface status color on node detail page
  • [NMS-8194] – Return an HTTP 303 for PUT/POST request on a ReST API is a bad practice
  • [NMS-8198] – Provisioning UI indication for changed nodes is too bright
  • [NMS-8208] – Upgrade maven-bundle-plugin to v3.0.1
  • [NMS-8214] – AlarmdIT.testPersistManyAlarmsAtOnce() test ordering issue?
  • [NMS-8215] – Chart servlet reloads Notifd config instead of Charts config
  • [NMS-8216] – Discovery config screen problems in latest code
  • [NMS-8221] – Operation "Refresh Now" and "Automatic Refresh" referesh the UI differently
  • [NMS-8224] – JasperReports measurements data-source step returning null
  • [NMS-8235] – Jaspersoft Studio cannot be used anymore to debug/create new reports
  • [NMS-8240] – Requisition synchronization is failing due to space in requisition name
  • [NMS-8248] – Many Rcsript (RScript) files in OPENNMS_DATA/tmp
  • [NMS-8257] – Test flapping: ForeignSourceRestServiceIT.testForeignSources()
  • [NMS-8272] – snmp4j does not process agent responses
  • [NMS-8273] – %post error when Minion host.key already exists
  • [NMS-8274] – All the defined Statsd's reports are being executed even if they are disabled.
  • [NMS-8277] – %post failure in opennms-minion-features-core: sed not found
  • [NMS-8293] – Config Tester Tool doesn't check some of the core configuration files
  • [NMS-8298] – Label of Vertex is too short in some cases
  • [NMS-8299] – Topology UI recenters even if Manual Layout is selected
  • [NMS-8300] – Center on Selection no longer works in STUI
  • [NMS-8301] – v2 Rest Services are deployed twice to the WEB-INF/lib directory
  • [NMS-8302] – Json deserialization throws "unknown property" exception due to usage of wrong Jax-rs Provider
  • [NMS-8304] – An error on threshd-configuration.xml breaks Collectd when reloading thresholds configuration
  • [NMS-8313] – Pan moving in Topology UI automatically recenters
  • [NMS-8314] – Weird zoom behavior in Topology UI using mouse wheel
  • [NMS-8320] – Ping is available for HTTP services
  • [NMS-8324] – Friendly name of an IP service is never shown in BSM
  • [NMS-8330] – Switching Topology Providers causes Exception
  • [NMS-8335] – Focal points are no longer persisted
  • [NMS-8337] – Non-existing resources or attributes break JasperReports when using the Measurements API
  • [NMS-8353] – Plugin Manager fails to load
  • [NMS-8361] – Incorrect documentation for org.opennms.newts.query.heartbeat
  • [NMS-8371] – The contents of the info panel should refresh when the vertices and edges are refreshed
  • [NMS-8373] – The placeholder {diffTime} is not supported by Backshift.
  • [NMS-8374] – The logic to find event definitions confuses the Event Translator when translating SNMP Traps
  • [NMS-8375] – License / copyright situation in release notes introduction needs simplifying
  • [NMS-8379] – Sluggish performance with Cassandra driver
  • [NMS-8383] – jmxconfiggenerator feature has unnecessary includes
  • [NMS-8386] – Requisitioning UI fails to load in modern browsers if used behind a proxy
  • [NMS-8388] – Document resources ReST service
  • [NMS-8389] – Heatmap is not showing
  • [NMS-8394] – NoSuchElement exception when loading the TopologyUI
  • [NMS-8395] – Logging improvements to Notifd
  • [NMS-8401] – There are errors on the graph definitions for OpenNMS JMX statistics
  • [NMS-8403] – Document styles of identifying nodes in resource IDs

Enhancement

  • [NMS-2504] – Create a better landing page for Configure Discovery aftermath
  • [NMS-4229] – Detect tables with Provisiond SNMP detector
  • [NMS-5077] – Allow other services to work with Path Outages other than ICMP
  • [NMS-5905] – Add ifAlias to bridge Link Interface Info
  • [NMS-5979] – Make the Provisioning Requisitions "Node Quick-Add" look pretty
  • [NMS-7123] – Expose SNMP4J 2.x noGetBulk and allowSnmpV2cInV1 capabilities
  • [NMS-7446] – Enhance Bridge Link Object Model
  • [NMS-7447] – Update BridgeTopology to use the new Object Model
  • [NMS-7448] – Update Bridge Topology Discovery Strategy
  • [NMS-7756] – Change icon for Dell PowerConnector switch
  • [NMS-7798] – Add Sonicwall Firewall Events
  • [NMS-7903] – Elasticsearch event and alarm forwarder
  • [NMS-7950] – Create an overview for the developers guide
  • [NMS-7965] – Add support for setting system properties via user supplied .properties files
  • [NMS-7976] – Merge OSGi Plugin Manager into Admin UI
  • [NMS-7980] – provide HTTPS Quicklaunch into node page
  • [NMS-8015] – Remove Dependencies on RXTX
  • [NMS-8041] – Refactor Enhanced Linkd Topology
  • [NMS-8044] – Provide link for Microsoft RDP connections
  • [NMS-8063] – Update asciidoc dependencies to latest 1.5.3
  • [NMS-8076] – Allow user to access local documentation from OpenNMS Jetty Webapp
  • [NMS-8077] – Add NetGear Prosafe Smart switch SNMP trap events and syslog events
  • [NMS-8092] – Add OpenWrt syslog and related event definitions
  • [NMS-8129] – Disallow restricted characters from foreign source and foreign ID
  • [NMS-8149] – Update asciidoctorj to 1.5.4 and asciidoctorjPdf to 1.5.0-alpha.11
  • [NMS-8152] – Collect and publish anonymous statistics to stats.opennms.org
  • [NMS-8160] – Remove Quick-Add node to avoid confusions and avoid breaking the ReST API
  • [NMS-8163] – Requisitions UI Enhancements
  • [NMS-8179] – ifIndex >= 2^31
  • [NMS-8182] – Add HTTPS as quick-link on the node page
  • [NMS-8205] – Generate events for alarm lifecycle changes
  • [NMS-8209] – Upgrade junit to v4.12
  • [NMS-8210] – Add support for calculating the derivative with a Measurements API Filter
  • [NMS-8211] – Add support for retrieving nodes with a filter expression via the ReST API
  • [NMS-8218] – External event source tweaks to admin guide
  • [NMS-8219] – Copyright bump on asciidoc docs
  • [NMS-8225] – Integrate the Minion container and packages into the mainline OpenNMS build
  • [NMS-8226] – Upgrade SNMP4J to version 2.4
  • [NMS-8238] – Topology providers should provide a description for display
  • [NMS-8251] – Parameterize product name in asciidoc docs
  • [NMS-8259] – Cleanup testdata in SnmpDetector tests
  • [NMS-8265] – SNMP collection systemDefs for Cisco ASA5525-X, ASA5515-X
  • [NMS-8266] – SNMP collection systemDefs for Juniper SRX210he2, SRX100h
  • [NMS-8267] – Create documentation for SNMP detector
  • [NMS-8271] – Enable correlation engines to register for all events
  • [NMS-8296] – Be able to re-order the policies on a requisition through the UI
  • [NMS-8334] – Implement org.opennms.timeseries.strategy=evaluate to facilitate the sizing process
  • [NMS-8336] – Set the required fields when not specified while adding events through ReST
  • [NMS-8349] – Update screenshots with 18 theme in user documentation
  • [NMS-8365] – Add metric counter for drop counts when the ring buffer is full
  • [NMS-8377] – Applying some organizational changes on the Requisitions UI (Grunt, JSHint, Dist)

Story

Task

  • [NMS-8236] – Move the "vaadin-extender-service" module to opennms code base

Welcome Ecuador (Country 29)

It is with mixed emotions that I get to announce that we now have a customer in Ecuador, our 29th country.

My emotions are mixed as my excitement at having a new customer in a new country is offset by the tragedy that country suffered recently. Everyone at OpenNMS is sending out our best thoughts and we hope things settle down (quite literally) soon.

They join the following countries:

Australia, Canada, Chile, China, Costa Rica, Denmark, Egypt, Finland, France, Germany, Honduras, India, Ireland, Israel, Italy, Japan, Malta, Mexico, The Netherlands, Portugal, Singapore, Spain, Sweden, Switzerland, Trinidad, the UAE, the UK and the US.

Agent Provocateur

I’ve been involved with the monitoring of computer networks for a long time, two decades actually, and I’m seeing an alarming trend. Every new monitoring application seems to be insisting on software agents. Basically, in order to get any value out of the application, you have to go out and install additional software on each server in your network.

Now there was a time when this was necessary. BMC Software made a lot of money with its PATROL series of agents, yet people hated them then as much as they hate agents now. Why? Well, first there was the cost, both in terms of licensing and in continuing to maintain them (upgrades, etc.). Next there was the fact that you had to add software to already overloaded systems. I can remember the first time the company I worked for back then deployed a PATROL agent on an Oracle database. When it was started up it took the database down as it slammed the system with requests. Which leads me to the final point, outside of security issues that arise with an increase in the number of applications running on a system, the moment the system experiences a problem the blame will fall on the agent.

Despite that, agents still seem to proliferate. In part I think it is political. Downloading and installing agents looks like useful work. “Hey, I’m busy monitoring the network with these here agents”. Also in part, it is laziness. I have never met a programmer who liked working on someone else’s code, so why not come up with a proprietary protocol and write agents to implement it?

But what bothers me the most is that it is so unnecessary. The information you need for monitoring, with the possible exception of Windows, is already there. Modern operating systems (again, with the exception of Windows) ship with an SNMP agent, usually based on Net-SNMP. This is a secure, powerful extensible agent that has been tried and tested for many years, and it is maintained directly on server itself. You can use SNMPv3 for secure communications, and the “extend” and “pass” directives to make it easy to customize.

Heck, even Windows ships with an extensible SNMP agent, and you can also access data via WMI and PowerShell.

But what about applications? Don’t you need an agent for that?

Not really. Modern applications tend to have an API, usually based on ReST, that can be queried by a management station for important information. Java applications support JMX, databases support ODBC, and when all that fails you can usually use good ol’ HTTP to query the application directly. And the best part is that the application itself can be written to guard against a monitoring query causing undue load on the system.

At OpenNMS we work with a lot of large customers, and they are loathe to install new software on all of their servers. Plus, many of our customers have devices that can’t support additional agents, such as routers and switches, and IoT devices such as thermostats and door locks. This is the main reason why the OpenNMS monitoring platform is, by design, agentless.

A critic might point out that OpenNMS does have an agent in the remote poller, as well as in the upcoming Minion feature set. True, but those act as “user agents”, giving OpenNMS a view into networks as if it was a user of those networks. The software is not installed on every server but instead it just needs the same access as a user would have. So, it can be installed on an existing system or on a small system purchased for that purpose, at a minimum just one for each network to be monitored.

While some new IT fields may require agents, most successful solutions try to avoid them. Even in newer fields such as IT automation, the best solutions are agentless. They are not necessary, and I strongly suggest that anyone who is asked to install an agent for monitoring question that requirement.

OpenNMS is Sweet Sixteen

It was sixteen years ago today that the first code for OpenNMS was published on Sourceforge. While the project was started in the summer of 1999, no one seems to remember the exact date, so we use March 30th to mark the birthday of the OpenNMS project.

OpenNMS Project Details

While I’ve been closely associated with OpenNMS for a very long time, I didn’t start it. It was started by Steve Giles, Luke Rindfuss and Brian Weaver. They were soon joined by Shane O’Donnell, and while none of them are associated with the project today, they are the reason it exists.

Their company was called Oculan, and I joined them in 2001. They built management appliances marketed as “purple boxes” based on OpenNMS and I was brought on to build a business around just the OpenNMS piece of the solution.

As far as I know, this is the only surviving picture of most of the original team, taken at the OpenNMS 1.0 Release party:

OpenNMS 1.0 Release Team

In 2002 Oculan decided to close source all future work on their product, thus ending their involvement with OpenNMS. I saw the potential, so I talked with Steve Giles and soon left the company to become the OpenNMS project maintainer. When it comes to writing code I am very poorly suited to the job, but my one true talent is getting great people to work with me, and judging by the quality of people involved in OpenNMS, it is almost a superpower.

I worked out of my house and helped maintain the community mainly through the #opennms IRC channel on freenode, and surprisingly the project managed not only to survive, but to grow. When I found out that Steve Giles was leaving Oculan, I applied to be their new CEO, which I’ve been told was the source of a lot of humor among the executives. The man they hired had a track record of snuffing out all potential from a number of startups, but he had the proper credentials that VCs seem to like so he got the job. I have to admit to a bit of schadenfreude when Oculan closed its doors in 2004.

But on a good note, if you look at the two guys in the above picture right next to the cake, Seth Leger and Ben Reed, they still work for OpenNMS today. We’re still here. In fact we have the greatest team I’ve every worked with in my life, and the OpenNMS project has grown tremendously in the last 18 months. This July we’ll have our eleventh (!) annual developers conference, Dev-Jam, which will bring together people dedicated to OpenNMS, both old and new, for a week of hacking and camaraderie.

Our goal is nothing short of making OpenNMS the de facto management platform of choice for everyone, and while we still have a long way to go, we keep getting closer. My heartfelt thanks go out to everyone who made OpenNMS possible, and I look forward to writing many more of these notes in the future.

OpenNMS Horizon 17.1.1 Released

Probably the last Horizon 17 version, 17.1.1, has been released. According to TWiO, the next release will be Horizon 18 at the end of the month, with Horizon 19 following at the end of May.

This release is mainly a maintenance release. It does contain one fix I used (NMS-8199), which allows for the state names in the Jira Trouble Ticketing plugin to be configured. This helps a lot if Jira is not in English.

If you are running Horizon 17, this should help it run a bit smoother.

Bug

  • [NMS-7936] – Chart Servlet Outages model exception
  • [NMS-8010] – Groups config rolled back after deleting a user in web UI
  • [NMS-8034] – Adding com.sun.management.jmxremote.authenticate=true on opennms.conf is ignored by the opennms script
  • [NMS-8048] – org.hibernate.exception.SQLGrammarException with ACLs on V17
  • [NMS-8075] – vacuumd-configuration.xml — Database error executing statement
  • [NMS-8113] – Overview about major releases in the release notes
  • [NMS-8153] – Can't modify the Foreign ID on the Requisitions UI when adding a new node
  • [NMS-8159] – When altering the SNMP Trap NBI config, the externally referenced mapping groups are persisted into the main file.
  • [NMS-8161] – Tooltips are not working on the new Requisitions UI
  • [NMS-8165] – OutageDao ACL support is broken causing web UI failures
  • [NMS-8177] – Install guide should use postgres admin for schema updates
  • [NMS-8199] – Allows state names to be configured in the JIRA Ticketer Plugin

Enhancement

  • [NMS-6404] – Allow send events through ReST
  • [NMS-8148] – Create pull request and contribution template to GitHub project

Task

  • [NMS-8151] – Remove all jersey artifacts from lib classpath