Announcing the OpenVND Project

OpenNMS has many uses, from ensuring that customers of a billion-dollar pizza business get their food on time to maintaining the machines that guard nuclear fuel, but we all know what we really need.

A way to manage our soda machines.

Nothing says “ugly” like a bunch of geeks, and nothing is uglier than when those same geeks are deprived of caffeine.

Thus today, the OpenNMS Project is happy to announce the Open VeNDing Project (OpenVND), leveraging the power of OpenNMS to address this need for the greater good.

Visit www.openvnd.org today for the full details.

Does Monitoring Really Suck?

I’ve been seeing the phrase “monitoring sucks” lately. Recently, Kris Buytaert organized a “monitoring sucks” hackathon after FOSDEM, and in a similar vein Cliff Moon, the CTO of Boundary (a monitoring service provider), also posted a “Why monitoring sucks – for now” article.

Working with OpenNMS as I have for the last decade, I really can’t share the sentiment that things suck. Having spent the decade before that as a consultant working with products like HP’s OpenView, Micromuse NetCool, Concord Network Health and BMC’s PATROL, we set out with OpenNMS to build the best tool for consultants like me – something that combines the functions of all of these products under one umbrella, with the ability to quickly and easily expand that functionality as needed. That’s why you’ll hear me refer to OpenNMS as a network management application platform instead of just an application.

OpenNMS has been addressing a lot of the concerns raised in Mr. Moon’s article for years now. Unlike point products that focus on data collection or service monitoring or trending, OpenNMS does all of them in one package. It also includes functions, such as inventory, that aren’t usually addressed in a monitoring solution. With easy, API-level integration with trouble ticketing systems (Request Tracker, OTRS, Jira, etc.) and configuration tools like RANCID, OpenNMS can be easily expanded as a given network environment grows.

We realized a long time ago that traditional alerting mechanisms were broken, so in addition to such staples as “high” and “low” thresholding, we added “relative” and “absolute” options as well to better detect anomalies. The built-in alarms subsystem allows for complex automations to be created, and the event translator does a great job of enriching basic events with information such as customer impact. Finally, with 1.10 we’ve resurrected and improved the OpenNMS integration with Drools, where extremely complex analysis can be built into the system to streamline alerting. This is a key feature that led Juniper to license OpenNMS as part of their JunOS Space management product.
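To make the difference between those four threshold styles concrete, here is a minimal sketch in Python. It is not the OpenNMS implementation: the `Threshold` class, the `exceeded` function and the exact semantics (in particular, treating the "relative" value as a fraction of the previous sample) are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Threshold:
    kind: str     # "high", "low", "relative" or "absolute"
    value: float  # meaning depends on kind (see exceeded())


def exceeded(t: Threshold, current: float, previous: Optional[float] = None) -> bool:
    """Return True if the sample trips the threshold."""
    if t.kind == "high":
        return current >= t.value
    if t.kind == "low":
        return current <= t.value
    if previous is None:
        return False  # change-based checks need two samples
    if t.kind == "absolute":
        # Assumed: fire when the value moves by at least t.value units.
        return abs(current - previous) >= t.value
    if t.kind == "relative":
        # Assumed: t.value is a fraction, so 0.5 means a 50% change.
        if previous == 0:
            return False  # no baseline to compare against
        return abs(current - previous) / abs(previous) >= t.value
    raise ValueError(f"unknown threshold kind: {t.kind}")
```

With these assumptions, a link that normally carries 100 units and suddenly carries 400 never comes near a "high" threshold of 1000, but it does trip a "relative" threshold of 0.5, which is the kind of anomaly the plain high/low staples miss.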

But I have to ask myself, if OpenNMS is so cool at solving management problems, why do people still think things suck? I can think of two reasons, although I’m sure that there are many more.

The first is that OpenNMS is written in Java, and a lot of those in the “devops” world either have no Java experience or they are prejudiced against it. The second is that OpenNMS is a seriously complex platform, and unlike some of the point products mentioned it really does take an investment of time to get the most out of it.

I can’t do much about the former issue, and history seems to have demonstrated that if people are prejudiced enough against a better solution they will eventually get left behind. I’m not saying that Java is great or even that Java is better than other options, but in many cases OpenNMS is better than the alternatives, and if Java is what’s keeping you away from it, then that’s a shame.

But the second issue I can address, and we hope to do so this year in a number of ways. The best way to help people climb the learning curve with OpenNMS is in education, and we even delayed the release of OpenNMS 1.10 in order to get the documentation to a much higher level than it has been in the past. Also this year we are having a couple of users conferences focusing on addressing real world and real time solutions, as well as increasing the number of our training courses. Finally, I hope to put together some videos to jumpstart those interested in coming up to speed with the platform.

So if you think monitoring sucks, please check out OpenNMS. Perhaps we can change your mind.

A Little Microsoft and VMWare Rant

I’m out at a customer site this week, and while the customer is awesome, a couple of things have made me very frustrated.

The first concerns Windows Management Instrumentation (WMI). OpenNMS now supports native WMI (thanks mainly to Matt Raykowski) and this is the first time I got to play with it. Works like a charm and how you would expect with OpenNMS – simply. I edited wmi-config.xml, put in a valid username and password, edited capsd-configuration.xml to discover WMI, and turned it on in collectd-configuration.xml. Restart, and now I’m collecting a ton of WMI stats out of the box.

So far, so good.

One of their concerns is monitoring Exchange 2007. So I think, great, I’ll just configure some WMI classes and objects dealing with Exchange, make some graphs, and we’re done.

Not so fast.

First, there doesn’t seem to be a good place to get a list of all the available WMI classes easily. I did find some rather thick TechNet docs, but for the most part it is a lot of digging. It would be nice if there was a MIB-like document that described them.

Second, it turns out that Exchange 2007 doesn’t support WMI. You have to use PowerShell “cmdlets” and script it from there.

What?

Okay, so Microsoft decides that SNMP isn’t good enough to use for exchanging data between a manager and an agent, so they invent their own management protocol called WMI, and a few years later decide it isn’t worth supporting.

(sigh)

My second source of frustration deals with VMWare. The client currently uses ESX, so I’m like – hey, just go in, enable the Net-SNMP agent, enable the “dlmod” for the ESX MIB and we’re set.

That is all well and good, but they are migrating everything to ESXi which, wait for it, doesn’t support SNMP. Well, at least GETs.

From the VMWare documentation (PDF), you first get:

… hardware monitoring through SNMP continues to be supported by ESXi, and any third-party management application that supports SNMP can be used to monitor it. For example, Dell OpenManage IT Assistant (version 8.1 or later) has ESXi MIBs pre-compiled and integrated, allowing basic inventory of the server and making it possible to monitor hardware alerts such as a failed power supply. SNMP also lets you monitor aspects of the state of the VMkernel, such as resource usage, as well as the state of virtual machines.

Okay, good, but the next paragraph reads

ESXi ships with an SNMP management agent different from the one that runs in the service console of ESX 3. Currently, the ESXi SNMP agent supports only SNMP traps, not gets.

Again, what?

I mean, okay, traps are great, but how am I supposed to monitor “resource usage” if I can’t do a GET?

In both cases there does exist a non-standard, proprietary API that can be used to mine this data, and if the demand is high enough we’ll definitely put it into OpenNMS. Thank goodness the architecture is abstracted so that it is easy to add such plugins without having to re-write everything.

But, c’mon people, we have standards for a reason. Can’t we all just get along?

OpenNMS in the Cloud

One of the things I hate is the buzzword du jour, be it virtualization, “devops” or “the cloud”. It’s not that there isn’t some nugget of truth in all of the press surrounding such things, but one of the reasons I got into open source in the first place was its focus on results and not fluff.

With a commercial software product it is very difficult to determine if it is the right solution to a particular problem without buying it. With open source software, there is no licensing cost and thus it is possible to easily try it out before making a commitment to use it. Thus the focus is on usefulness and not a flyer saying “we’re the best”.

This isn’t to say that the open source world is completely free of fluff and posturing. With the prevalence of venture-backed open core companies, their ultimate goal is not the proliferation of robust open source code but to be purchased for a large multiplier. The best way for them to create perceived value is to latch on to the latest buzzword, as if to say “hey – you need a piece of this – better hurry up and buy us,” and it is a strategy that has worked well in a number of cases. I just don’t like calling it open source.

So I have been pretty quiet on the use of OpenNMS in “the cloud”. This isn’t to say that we don’t manage cloud resources, but the management challenges of cloud-based services aren’t much different than “normal” ones. The power and flexibility of OpenNMS make it as useful in the cloud as elsewhere.

In fact, one of the major players in cloud computing, Rackspace, uses OpenNMS to manage its Cloud Files system.

We are happy to announce that we are working with another major company, BT (British Telecom Group), in developing a trusted cloud management platform called the Cloud Service Broker. In the words of John Gillam, Programme Director, BT Global Services:

The Cloud Service Broker TM Forum Catalyst provides an excellent opportunity to address the barriers to cloud adoption for enterprise customers. Whilst enterprises wish to lever value from the cloud, they are apprehensive over losing control, citing areas of concern such as IT Governance, application performance, runaway costs, inadequate security and technology lock-in. The CSB addresses this by matching cloud services to each enterprise’s needs, enforcing the right policies, and then showing how this can be backed up by an ongoing service level agreement. We believe developments of this nature will be of primary importance in future cloud services.

We will be presenting our work at the TMForum’s Management World conference in Nice, France, this May. In addition to BT’s offering, we will be demonstrating integration with products from Comptel, Square Hoop and Infonova in order to deliver a complete cloud services platform.

Interop 2009

In my commercial software days I used to go to the Interop show in Las Vegas, back when it was held at the main convention center. It was a huge show and pretty much the premier event for networking gear. I think the last time I went was 2000.

I had the opportunity to return this year. The show has changed: it is now in the Mandalay Bay Convention Center and it is smaller than I remember. The NOC staff, however, is still pretty much the same.

As you can imagine, running a NOC at a show like this is no minor undertaking, but believe it or not the entire NOC is staffed by volunteers. Getting through an ordeal like an Interop show seems to bring people together, as many volunteers have been coming for years (I met one guy who had been coming here since 1996). The only downside was that this Interop marked the first since the passing of Jim “Haggis” Brown, a longtime NOC member. They had a place set out for him, along with a bottle of scotch.

Speaking of bringing people together, this trip has been pretty serendipitous. For example, my plane from RDU to DFW had mechanical problems, so they routed me through Miami. As I was leaving the Admirals Club to walk to my gate, I ended up sharing an elevator with Chris McGugan. Chris is something of a superstar in networking circles. He was at Cisco for many years (based out of North Carolina), and now he is working at Avaya out in California. We used to share a townhouse about 20 years ago, and it had been about that long since I’d seen him. The odds of us running into each other the way we did were pretty long.

Even stranger, Chris used to work in the NOC at Interop, and he knew many of the people I had come to meet.

Another example of serendipity: on our first day at the show, Jeff and I were at a table utilizing some wireless bandwidth when John Willis walked by. He didn’t know we were going to be there, so it was nice to see he had decided to wear his OpenNMS shirt anyway.


Jeff Gehlbach, High Mobley and John Willis

Things have changed a bit in Las Vegas since I was last here. There is no smoking near food (which pretty much leaves the casinos) and coins no longer work in the slot machines. Payouts are given on little slips of paper, and the machines will only accept bills or those little slips. I really miss the sound of the coins clanking around, and it makes the casinos seem quieter.

According to the cab driver, 40% of the usual conventions have cancelled this year, so the area is surviving on tourism. We stayed at the Luxor for $69 a night, and although it was a tower room, it was a deal.

The Luxor is my favorite hotel on the strip. It is not the nicest or the most luxurious, but think about it – it had to have been built by a geek. If I was given a boatload of money and told to build something impressive in the desert, it would be a pyramid. Plus at night its blackness contrasts well with the brightness of the other hotels, even with the sides having been given over to advertising.

However, one of the Luxor’s main acts is Carrot Top, and the dude is just scary looking. His face is everywhere you go in the hotel, even on the keys and the “do not disturb” signs, and it gets creepy after a while.

Back to Interop: the show had most of the people you would expect. We stopped by the HP booth to look at the latest OpenView. HP must be doing well, because they had some seriously thick padding under the booth carpet, which was awesome (if you have ever worked a show on a concrete floor for a couple of days, you know what I am talking about). I decided to talk a little smack to their folks in the booth. I thanked them for raising their prices so drastically since it helped us out, which caused them to ask about OpenNMS. When I told them it was an open source network management platform, the reply was “yes, but OpenView is for the enterprise.”

I took that as my cue to bring up that we have customers monitoring over 55,000 devices with OpenNMS (them: “with a single instance?”, me: “yup”) and that we were replacing OpenView at a client in Italy because their devices, which have more than 32,000 interfaces each, break OpenView but work with us. Things got quiet and a little awkward after that, so we left (but the lady kept my card).

Microsoft was a no-show (or at least I didn’t see their booth), but I did get introduced to a company called Xirrus. Xirrus builds wireless arrays that have a high level of built-in switching, and their marketing pitch was a face-off between their wireless “switches” and wired ones. They had a boxing ring in the middle of the booth and several times a day held actual bouts. When it wasn’t being used by humans, one corner held your traditional network switch (with lots of cables of course), and the other corner held a Xirrus array.

The arrays looked like big Roombas with RJ-45 connections, and they had really cool lights (Jeff took a video).

All in all it was a fun time, mainly because we got to hang “backstage” with people who really seemed to both love networking and know a lot about it. What did surprise me was the number of people who were using OpenNMS. When we’d get introduced we were often met with “Oh, we use OpenNMS. It’s great.”

It’s nice to hear. While we have things like the Order of the Blue Polo and the Wall of Cards, we rarely hear from people who use the tool outside of our clients. And while we love our clients, usually when we hear from them it is to ask a question or report a problem. We work hard to make OpenNMS great while remaining 100% open source so it definitely motivates us to meet people who find it useful.

It was a little sad when the show ended and the equipment started coming down. Perhaps we can return next year.

Twitter Outage

There is currently a Twitter outage going on:

However, Jeff thought it would be cool to monitor Twitter, so we all got notified.

Cool, huh? And we’ll know pretty soon after it comes back up.

NOTE: It actually came back up as I was typing this and I got the RESOLVED message. So much fun with network management.

Why People Need Support

I like to think that the people who use our services get value for their money, but I’m sure many more ask the question “why do I even need support?”

At OpenNMS, we don’t sell software (all our software is free). I like to say we sell time. At the moment, anyone who has found out about OpenNMS, installed it and decided to use it obviously possesses well above average intelligence, impeccable taste and is most likely devilishly attractive. They are capable of figuring out issues without a support contract, either by experimentation, using the free resources such as the mailing lists, or both. But do they have the time?

Normally, most of the trouble tickets we get concern configuration, a few involve actual bugs with OpenNMS itself, and more than you would think are the result of vendors not honoring standards. We spend a lot of time figuring out issues with things like poorly written SNMP agents and even operating system problems.

And then there are the bad MIBs.

Recently I got an e-mail from a person who uses the Anevia Flamingo product. They wanted some help using mib2opennms to convert Flamingo SNMP traps into a format they could use.

Usually I have to politely decline helping people who contact me privately about OpenNMS issues. It wouldn’t be fair to our paying clients if I spent time helping people one-on-one for free, so I point them to free resources like the mailing lists. When I have time I try to help out there, as that gets archived publicly and might help others. The catch is that you may or may not get a timely answer to your question on the list, whereas you can always pester us about support tickets.

But this question involved mib2opennms. I’ve been using that tool for six years and my mib2opennms-fu is strong, so I took the Anevia MIB I was sent, cranked it through the tool and sent back the output.

I received a reply that it wasn’t working and the user was still getting unformatted trap errors like:

Received unformatted enterprise event (enterprise:.1.3.6.1.4.1.20967.1.12.1.30 generic:6 specific:2). 3 args:
.1.3.6.1.4.1.20967.1.12.1.30=""
.1.3.6.1.4.1.20967.1.12.1.30.1="1"
.1.3.6.1.4.1.20967.1.12.1.30.2="10.180.1.232"

I went into the file I had created and noticed that the enterprise id was missing the last “.30”, which is why it wasn’t matching, so it was off to look at the MIB.

It started off normally enough, with some object definitions:

anevia OBJECT IDENTIFIER ::= { enterprises 20967 }
anevia1 OBJECT IDENTIFIER ::= { anevia 1 }
tsnmp OBJECT IDENTIFIER ::= { anevia1 1 }
manager OBJECT IDENTIFIER ::= { anevia1 12 }
aneviaManager1 OBJECT IDENTIFIER ::= { manager 1 }
aneviaManagerTraps1 OBJECT IDENTIFIER ::= { aneviaManager1 30 }

and then later in the MIB came the trap:

inputDownTrap TRAP-TYPE
  ENTERPRISE aneviaManager1
  VARIABLES { streamerInputIndex, streamerAddress }
  DESCRIPTION
    "This trap is sent when an input on a streamer becomes unavailable,
     and can no longer provide any useful data, the provided index is the
     index of this input."
  ::= 2

At least the mystery of the missing “.30” was solved. The “ENTERPRISE” value for this trap should be “aneviaManagerTraps1” instead of “aneviaManager1”. Easy enough to fix. But then I noticed that instead of the two varbinds listed in the MIB, the agent was sending three (see above) where the first one was blank (as well as being just the enterprise OID).

Grrrr.

The second varbind value of “1” could easily be the streamerInputIndex and “10.180.1.232” could be the streamerAddress but these won’t be correctly reflected in the events file since they’re off by one due to the mystery blank initial varbind.
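Both problems can be reproduced with a few lines of Python. This is only an illustrative sketch: the `MIB` table is hand-copied from the definitions quoted above (it is not produced by mib2opennms), and the `resolve` helper and the regular expressions are my own assumptions about how to pick the log line apart.

```python
import re

# Standard numeric prefix for iso.org.dod.internet.private.enterprises.
ENTERPRISES = ".1.3.6.1.4.1"

# (parent, sub-identifier) pairs copied by hand from the MIB excerpt.
MIB = {
    "anevia": ("enterprises", 20967),
    "anevia1": ("anevia", 1),
    "manager": ("anevia1", 12),
    "aneviaManager1": ("manager", 1),
    "aneviaManagerTraps1": ("aneviaManager1", 30),
}


def resolve(name: str) -> str:
    """Walk up the parent chain to build the numeric OID for a MIB name."""
    if name == "enterprises":
        return ENTERPRISES
    parent, sub_id = MIB[name]
    return f"{resolve(parent)}.{sub_id}"


# The unformatted event from the log, joined onto one line.
LOG = (
    "Received unformatted enterprise event "
    "(enterprise:.1.3.6.1.4.1.20967.1.12.1.30 generic:6 specific:2). 3 args: "
    '.1.3.6.1.4.1.20967.1.12.1.30="" '
    '.1.3.6.1.4.1.20967.1.12.1.30.1="1" '
    '.1.3.6.1.4.1.20967.1.12.1.30.2="10.180.1.232"'
)

# Problem 1: the enterprise the agent sends vs. the one the MIB declares.
sent_enterprise = re.search(r"enterprise:(\.[\d.]+)", LOG).group(1)
declared_enterprise = resolve("aneviaManager1")
print(sent_enterprise == declared_enterprise)             # False: the ".30" is missing
print(sent_enterprise == resolve("aneviaManagerTraps1"))  # True

# Problem 2: the MIB lists two variables, the agent sends three varbinds.
varbinds = re.findall(r'(\.[\d.]+)="([^"]*)"', LOG)
values = [value for _oid, value in varbinds]
declared = ["streamerInputIndex", "streamerAddress"]

as_declared = dict(zip(declared, values))  # positions taken from the MIB
shifted = dict(zip(declared, values[1:]))  # skip the blank first varbind
print(as_declared)  # {'streamerInputIndex': '', 'streamerAddress': '1'}
print(shifted)      # {'streamerInputIndex': '1', 'streamerAddress': '10.180.1.232'}
```

Reading the varbinds in the positions the MIB promises gives a blank index and an "address" of 1. Skipping the stray first varbind gives values that at least look plausible, which is the sort of workaround that ends up in the event configuration.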

This is the case of a poorly written MIB and a poorly implemented agent, and there is little we can do about it but work around it in configuration. I asked the user to make sure we had the latest Anevia MIB and was told we did. I wrote Anevia support but since I don’t have a relationship with them I never got a reply.

This happens way more than you might imagine, and we’ve gained a lot of experience in diagnosing and either correcting or working around such issues. Because we’ve seen stuff like this before, we can do this quickly, which is why I like to say I sell time. It only takes a few issues like this to have a support subscription pay for itself.

[Note: This post isn’t meant to be a pitch for services but a rant about the time I wasted playing with the Anevia MIB, but if it helps sell a support contract, that’s cool too (grin)]

Europe 2008: Nice

Nice is nice.

Okay, got that out of my system.

The trip to Nice was uneventful. Our hotel is very comfortable considering the price, and it’s in a great location.

David and I are in Nice for the TeleManagement Forum conference. This is the premier worldwide telecom event, and we are slowly introducing the concept of free and open software to this market. Craig Gallen (OGP) got us involved a couple of years ago, but this is the first time we’ve been able to attend the conference.

One of the dominating management concepts of the TMForum is Next Generation Operational Systems and Software (NGOSS). This defines a large number of interfaces for various management functions to interact. Through Craig’s work OpenNMS includes support for the “quality of service” (QoS) interface, and we’ve completed a proof of concept implementation using it.

It’s also cool to be in a place where I am the customer vs. the vendor.

If any of the three people who read this blog are also here, please drop me a note so we can meet up. I’m here until Thursday afternoon when we head to Paris.

OpenNMS Takes Bronze

Last year OpenNMS was awarded the Gold medal in TechTarget’s 2007 Product Leadership Awards in the “Network and IT management platforms” category. We beat out HP and IBM for the honor, so we were pretty excited.

This year they didn’t have that category, but OpenNMS did take the Bronze in “Applications and network management”. Not nearly as exciting as winning Gold against two of the big four but we’re still honored to be a winner, and one of only two open source projects to win an award (the other being the most excellent Asterisk, which won the Silver in the “IP telephony systems” category).

All of the other winners have much deeper pockets than our project, yet this demonstrates that a strong community can still create an application that can play with the big guys. It comes from driving the value up the chain from the user perspective, and not top down. Many thanks to everyone who supported us in this year’s survey.

The Year of Integration

John Willis mentioned (in a nice way) that I tend to ramble on a bit in my posts and he sometimes misses “the point” so I thought I’d put it right up front. Today Hyperic and OpenNMS are officially announcing an integration between our products. This is exciting, as I am a fan of the Hyperic agent, and this was also a truly cooperative effort on the part of both groups.

But as I am a ramblin’ man …

Back in the early to mid 90s I started to get involved with HP’s OpenView product suite. The main reason was due to an independent users organization called the OpenView Forum (OVForum). The “open” in OpenView didn’t mean “open source” of course, but it did mean that HP focused on having a lot of APIs to make it easy to integrate with their products. For example, the dominant language of administrators at the time was Perl, and these APIs allowed Chip Sutton to create a whole set of Perl libraries to interface with OpenView, totally separate from any official involvement with HP.

This created a whole community around the product and encouraged people to create new functionality, which caused OpenView to become, and some would say remain, the main enterprise management framework.

Unfortunately they couldn’t leave a good thing alone, and HP started to use this community as a sales channel. We were already the best sales channel they had. The Compaq acquisition was the beginning of the end for OpenView as there were several years where it was obvious the OpenView product group was being ignored, and people went on to look for alternatives.

My plan for OpenNMS has always been to replace the OpenView of the 90s with an open source framework that would go beyond what OpenView was able to provide with their proprietary software model. I saw how well it thrived under the OVForum and figured an open source community could be even stronger and more vibrant.

To this end OpenNMS has tried to cover a broad spectrum of management tasks while providing a large number of integration points into the application for other vendors to leverage.

One of the things that differentiates the business model of OpenNMS from a proprietary software company is that quite often the features we choose to develop are driven directly by the end users of the product. We don’t sit around a big conference table wondering what to develop; customers come and tell us. About 40% of the revenue of The OpenNMS Group comes from custom development and 100% of that is put back into the community.

We have done a number of integration projects in the past. The first one that comes to mind was a way to send notifications to Request Tracker (RT). Rather than build RT-specific code, a more generic HTTP notifier was created which allows OpenNMS to interact with any web-based tracking system.
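The idea behind a generic notifier is easy to show in a few lines. The sketch below is not the OpenNMS HTTP notifier: the `build_notification` function, the tracker URL and the form field names are all hypothetical, and the request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_notification(url: str, field_templates: dict, event: dict) -> Request:
    """Build (but do not send) a form-encoded POST describing one event."""
    # Fill each field template with values from the event.
    fields = {name: tpl.format(**event) for name, tpl in field_templates.items()}
    body = urlencode(fields).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Hypothetical tracker endpoint and form fields. A different tracker would
# need a different URL and template mapping, not new code.
TRACKER_URL = "http://tracker.example.com/tickets/new"
TEMPLATES = {
    "subject": "{severity}: {service} down on {node}",
    "queue": "network",
}

event = {"severity": "Major", "service": "HTTP", "node": "web01"}
request = build_notification(TRACKER_URL, TEMPLATES, event)
# Passing `request` to urllib.request.urlopen() would actually deliver it.
```

Everything tracker-specific lives in the URL and the field templates, so pointing the same code at another web-based system is a configuration change.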

Then a client wanted something more robust, so we developed a Trouble Ticket API class that can allow for integration solely using configuration files. The work was originally done to communicate with CentricCRM (now Concursive), but this was soon extended to work with Jira.

Late last year one of our largest clients approached us about an integration with Hyperic HQ. When Hyperic announced that they were going to open source their code, I wasn’t that thrilled about it. Luckily the people at Hyperic are really cool and we started meeting at various trade shows. I learned that our two products don’t really compete but are very complementary. The focus of Hyperic has always been to provide a high level of systems and application management. Through the use of their agent, not only can they detect problems but they can perform actions to correct them. OpenNMS, being agent-less, is more passive, although we can aggregate information from a wider range of sources. It’s a really good match.

This required a lot of research into the best way to perform the integration, as well as having Hyperic integrate some code into their product to make things smoother. This they were eager to do and it was fun working with them.

For the full details, check out the white papers section of the wiki, or directly access the PDF. There will also be a webinar on March 11th hosted by Hyperic.

On our IRC channel (#opennms on freenode.net) we often have fun with the topic. In 2006 I declared “2006 – The Year of OpenNMS” and we won an award. In 2007 it was “2007 – The Year of Four Releases” and we managed six. This year we have been hunting around for a new “Year of”.

Perhaps it should be “2008 – The Year of Integration”.