Archive for September, 2009

The Many Uses of Net-SNMP

Wednesday, September 30th, 2009

I am a rabid, drooling Net-SNMP fanboy. It is doubtful that OpenNMS would be able to do as much as it does without it.

While it generates stats like any other SNMP agent, the ability to extend the agent is what makes me love it. With just a small amount of configuration, I can get Net-SNMP to run commands locally and report back the results via SNMP. Thus it is really easy for me to gather data locally on each machine and send that information to OpenNMS, without the need to do silly things like running a script using ssh. Since Net-SNMP even supports SNMPv3, the data can be secure.

Here’s an illustration.

On one of my servers I host a number of domains for friends and family. For some of those I simply forward all the mail I receive for that domain to, say, a RoadRunner account.

We ran into a problem where some of those addresses became the target of spammers. Since all my server does is relay the mail, Time Warner thought we were a spam server and blocked our IP. This caused the mail queue to get quite large. Had I not happened to look at the logs that day I wouldn’t have noticed.

We fixed the problem by creating IMAP mailboxes for those users on my server, but I wanted a way to be notified if the number of messages in the mail queue was high without having to look at log files.

The first thing I needed to do was find that information. On the server I can run “mailq” and get something at the end like

-- 23 Kbytes in 6 Requests.

which is a great start. I just need to parse out the “6”, which I can do with

mailq | tail -n 1 | awk '{if (NF > 4) {print $5} else {print 0}}'

Note: above command improved by Alex Hoogerhuis – returns zero if there are no queued mails

Now that I have my value, I’ll wrap it in a short script:

# cat /root/bin/mailqstats.sh 
#!/bin/bash
mailq | tail -n 1 | awk '{if (NF > 4) {print $5} else {print 0}}'
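The parsing can be tried without a live mail queue by feeding sample text through the same pipeline. This is only a sketch: the function name queue_count is made up for illustration, and the empty-queue message is an assumption (Postfix-style wording), so check what your own mailq prints.

```shell
# Hypothetical helper: the same tail/awk logic as mailqstats.sh, but reading
# the mailq output from stdin so it can be exercised with canned text.
queue_count() {
    tail -n 1 | awk '{if (NF > 4) {print $5} else {print 0}}'
}

# Summary line from the example above: six fields, the fifth is the count.
printf '%s\n' '-- 23 Kbytes in 6 Requests.' | queue_count    # prints 6

# Assumed empty-queue message: only four fields, so the else branch fires.
printf '%s\n' 'Mail queue is empty' | queue_count            # prints 0
```

If your mail server prints a different summary line, the field number in the awk expression would need to change to match.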

At this point I guess I could use ssh to access the box and gather this information, but while that might work with one or two servers, can you imagine trying to do that with a hundred? A thousand? The OpenNMS server would have to fork/exec each one, and the performance would be horrible.

Here comes Net-SNMP to the rescue.

Net-SNMP has a directive called “extend”. All I have to do is add the line

extend mailqstats /root/bin/mailqstats.sh

to /etc/snmp/snmpd.conf and reload snmpd. Now Net-SNMP gives me the ability to query an OID and get the results of running the script:

snmpwalk -v1 -c public mail.example.com .1.3.6.1.4.1.8072.1.3.2

Returns

NET-SNMP-EXTEND-MIB::nsExtendNumEntries.0 = INTEGER: 1
NET-SNMP-EXTEND-MIB::nsExtendCommand."mailqstats" = STRING: /root/bin/mailqstats.sh
NET-SNMP-EXTEND-MIB::nsExtendArgs."mailqstats" = STRING: 
NET-SNMP-EXTEND-MIB::nsExtendInput."mailqstats" = STRING: 
NET-SNMP-EXTEND-MIB::nsExtendCacheTime."mailqstats" = INTEGER: 5
NET-SNMP-EXTEND-MIB::nsExtendExecType."mailqstats" = INTEGER: exec(1)
NET-SNMP-EXTEND-MIB::nsExtendRunType."mailqstats" = INTEGER: run-on-read(1)
NET-SNMP-EXTEND-MIB::nsExtendStorage."mailqstats" = INTEGER: permanent(4)
NET-SNMP-EXTEND-MIB::nsExtendStatus."mailqstats" = INTEGER: active(1)
NET-SNMP-EXTEND-MIB::nsExtendOutput1Line."mailqstats" = STRING: 6
NET-SNMP-EXTEND-MIB::nsExtendOutputFull."mailqstats" = STRING: 6
NET-SNMP-EXTEND-MIB::nsExtendOutNumLines."mailqstats" = INTEGER: 1
NET-SNMP-EXTEND-MIB::nsExtendResult."mailqstats" = INTEGER: 0
NET-SNMP-EXTEND-MIB::nsExtendOutLine."mailqstats".1 = STRING: 6

This tells me a number of things. First, I can have a number of scripts in the extend table. The nsExtendNumEntries value tells me that there is only one on this server.

The next entries are indexed by the name that I used in the extend configuration, in this case “mailqstats”. In the numeric OID this is represented by the length of the name followed by the ASCII code of each character, i.e. for “mailqstats” you get “10.109.97.105.108.113.115.116.97.116.115” where 10 is the length, 109=m, 97=a, 105=i, etc. (this can be converted using Javascript as well).
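The conversion is also easy to do in the shell. The numeric OIDs used in the OpenNMS configuration in this post start the index with a 10, which is the length of the name, followed by one ASCII code per character. A minimal sketch, assuming the name is plain ASCII (oid_index is a made-up helper name, not part of Net-SNMP):

```shell
# Hypothetical helper: turn an extend name into its numeric OID index,
# i.e. the length of the name followed by the decimal code of each byte.
oid_index() {
    # Length prefix first (assumes a plain ASCII name) ...
    printf '%s' "${#1}"
    # ... then each byte as an unsigned decimal, joined with dots.
    printf '%s' "$1" | od -An -v -tu1 |
        awk '{for (i = 1; i <= NF; i++) printf ".%s", $i} END {print ""}'
}

oid_index mailqstats    # prints 10.109.97.105.108.113.115.116.97.116.115
```

With that index in hand, the first output line could be fetched directly with something like `snmpget -v1 -c public mail.example.com .1.3.6.1.4.1.8072.1.3.2.4.1.2.$(oid_index mailqstats).1`, where the trailing .1 is the line number.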

There are additional configuration options concerning how long the output of the script will be cached, etc. By default it is 5 seconds, but you can easily increase it. This is useful if the script you are running is expensive in terms of resources.

To get my value, in this case “6”, I can do a number of things. Net-SNMP has an OID for the first line, the full output, and then a table with each line as a separate OID. I tend to use the nsExtendOutLine value, but it is just a personal choice.

Now that I can get the value of the mailqstats.sh script via SNMP, setting up OpenNMS is simple.

First, I have to discover the service. This is done in capsd-configuration.xml:

<protocol-plugin 
   protocol="Mailq" 
   class-name="org.opennms.netmgt.capsd.plugins.SnmpPlugin" scan="on">
   <property 
      key="vbname" 
      value=".1.3.6.1.4.1.8072.1.3.2.4.1.2.10.109.97.105.108.113.115.116.97.116.115.1" />
   <property key="timeout" value="2000" />
   <property key="retry" value="1" />
</protocol-plugin>

The SnmpPlugin will check for the existence of the OID, and if it exists the “Mailq” service will be added to the interface. It is also possible to use the SnmpPlugin to check the returned value before adding the service (see the “Router” protocol configuration).

Once I have it discovered, I have a couple of choices for monitoring that value.

I could just monitor it like a service. Let’s say I want to know when the mail queue has 300 or more messages in it. I could configure the monitor to check the value every five minutes and mark the service as down if it was 300 or greater. I would do this in the poller-configuration.xml file:

<service name="Mailq" interval="300000" user-defined="false" status="on">
   <parameter key="retry" value="1"/>
   <parameter key="timeout" value="3000"/>
   <parameter key="port" value="161"/>
   <parameter key="oid" 
       value=".1.3.6.1.4.1.8072.1.3.2.4.1.2.10.109.97.105.108.113.115.116.97.116.115.1"/>
   <parameter key="operator" value="&lt;"/>
   <parameter key="operand" value="300"/>
</service>

This will get the OID every five minutes (300000ms) and test to make sure it is less than 300. If so, the service is “up”. Note that I had to use an XML entity, &lt;, to represent the “less than” sign.
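To make the boundary explicit, here is a rough shell stand-in for the comparison that the operator and operand parameters describe. It is not OpenNMS code, and mailq_status is a made-up name; it only illustrates that a value of exactly 300 already counts as “down”.

```shell
# Hypothetical stand-in for the configured check: the service is "up" only
# while the collected value is strictly less than the operand (default 300).
mailq_status() {
    value="$1"
    operand="${2:-300}"
    if [ "$value" -lt "$operand" ]; then
        echo up
    else
        echo down
    fi
}

mailq_status 6      # prints up
mailq_status 299    # prints up
mailq_status 300    # prints down
```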

If you use this method, remember to add the monitor line to the bottom of that file:

<monitor service="Mailq" 
         class-name="org.opennms.netmgt.poller.monitors.SnmpMonitor"/>

I use this method for monitoring the status of my RAID controllers, since the values I collect there are always the same (assuming the RAID controller has no errors) and it wouldn’t make sense to graph, say, a bunch of values that are all “3”.

But with the mail queue statistics I wanted a graph, so the next step was to add it to datacollection-config.xml and collect it.

<group name="mailq-stats" ifType="ignore">
   <mibObj 
      oid=".1.3.6.1.4.1.8072.1.3.2.4.1.2.10.109.97.105.108.113.115.116.97.116.115"
      instance="1" 
      alias="mailqsize"     
      type="octetstring" />
</group>

This will attempt to collect on the OID, and if successful, store it in a file called mailqsize.jrb.

Note the type of “octetstring”. If you look at the type of this OID from the walk above, you’ll see it is “string”. RRDtool and JRobin can’t store string data, thus it needs to be converted to a number. Setting the type to “octetstring” causes this to happen (it is converted to a gauge). If it was left as a string, OpenNMS would collect it but store it only once in the strings.properties file for the node. It would not be able to graph it.

The next step in the datacollection-config.xml file is to associate this group with the Net-SNMP system definition:

<systemDef name="Net-SNMP (UCD)">
  <sysoidMask>.1.3.6.1.4.1.2021.250.</sysoidMask>
  <collect>
     <includeGroup>mib2-host-resources-system</includeGroup>
     <includeGroup>mib2-host-resources-memory</includeGroup>
     <includeGroup>mib2-host-resources-storage</includeGroup>
     <includeGroup>net-snmp-disk</includeGroup>
     <includeGroup>ucd-loadavg</includeGroup>
     <includeGroup>ucd-memory</includeGroup>
     <includeGroup>ucd-sysstat</includeGroup>
     <includeGroup>ucd-sysstat-raw</includeGroup>
     <includeGroup>ucd-sysstat-raw-more</includeGroup>
     <includeGroup>mailq-stats</includeGroup>
   </collect>
</systemDef>

The last step is to actually create the graph. Edit snmp-graph.properties:

report.netsnmp.mailq.name=Current Mail Queue Size
report.netsnmp.mailq.columns=mailqsize
report.netsnmp.mailq.type=nodeSnmp
report.netsnmp.mailq.command=--title="Currently Queued Messages" \
 DEF:queue={rrd1}:mailqsize:AVERAGE \
 LINE2:queue#0000A0:"Size    " \
 GPRINT:queue:AVERAGE:"Avg  \\: %8.2lf " \
 GPRINT:queue:MIN:"Min  \\: %8.2lf " \
 GPRINT:queue:MAX:"Max  \\: %8.2lf \\n" 

and add the “netsnmp.mailq” report to the “reports=” line at the top of the file.

Restart OpenNMS and now I should be able to see the size of my mail queue.



I can also set up a threshold event which can trigger a notice that will let me know the queue is high.

The extend feature of Net-SNMP is pure awesomeness, and there are a huge number of uses for it. I hope this example will prove useful. If you are using this with OpenNMS, add a comment to this post and tell me how you use it.

The Business of Open Source is Not Software

Wednesday, September 30th, 2009

I’ve been staying out of the free vs. open source wars running around my little corner of the world of late. There is a lot of talk about whether or not open source has “won”. Open source is free software, so it seems silly to try to differentiate the two. The only way to do that is to focus on the people who care about the difference, and that just results in ad hominem attacks.

For years now I’ve been struggling to educate the market on the fact that the business around open source software is not about software. It’s about solutions. The clients I talk to are ultimately not concerned with what software to buy but instead want solutions to a variety of problems facing their business. Unfortunately, many of them only know the process of purchasing software, and they are unable to adapt to a solutions-based purchase.

Think about it. How does, say, the choice of a management solution usually play out?

First, a list of requirements is drawn up. Then, either through a VAR or just by searching the Internet, a list of possible software solutions is drawn up. The next step is to get demo versions of the software or perhaps talk to the vendor and get them to do a proof-of-concept. Finally, a choice is made and a check is written for software licenses.

In the “demo” step the vendor is usually asked to expend some resources on the sale. Those costs are recovered when the customer purchases a software license. Not only will they pay for the software, due to lock-in they will most likely buy maintenance for years to come. It is a nice revenue stream that makes a gamble on free demos worth it.

This doesn’t work very well for open source. Real open source software doesn’t have a licensing cost, so one can’t make up revenue there. Real open source software can’t prevent access to the latest and greatest code, so there is no requirement to purchase maintenance. Since the client isn’t required to purchase anything, that makes the “demo” phase of a sale a lot more risky.

At OpenNMS we are happy to do demos during the pre-sales process, but we have to draw the line when it comes to a large amount of pre-sales consulting. There is a product we offer called the “Getting to Know You” project in which a consultant will come and spend two days demonstrating what OpenNMS can do on their network, allowing the client to kick the tires and ask questions, and we charge for it. That way, regardless of the choice made by the client, our costs are covered. This is important, since our business model is “spend less than you earn”.

The reason I am writing about this now is that over the last two days I have had to deal with a potential client who is asking for a large amount of work above and beyond what we do with a normal sale. We have been trying to meet their needs for several weeks, but they wouldn’t come to training and when I pressed for a Getting to Know You project I was told no. Since the product has not been “approved” they don’t want to spend any money on it, even if by spending a little money they could save a ton in the future.

This reminds me of one of my favorite YouTube videos, where a woman goes to the hairdresser for a new hair style but doesn’t want to pay for it until after all of the work is done and then only if she likes it.

I run into potential clients like this from time to time, and what I’ve found is that it is better to cut and run instead of spending the time to try and win the business. Someone who isn’t willing to pay for your time most likely won’t understand the value you provide, and in these cases they are better off buying something traditional like Solarwinds than investing in OpenNMS.

I sometimes get asked “how do you make money selling free software?” and I have to answer that I have no clue. I don’t sell software, I sell solutions. The prevalence of Software as a Service (SaaS) businesses is making this easier, since people are being introduced to the mindset of getting a solution without having to purchase software, but the biggest challenge to my business is getting people to understand the value free and open software provides in creating a great solution without the “purchase software” mentality.

Luckily, there are enough people out there who “get it” that our business is doing very well this year. Their companies now have a competitive advantage, which, over time, will be demonstrated. Only when these advantages are demonstrated in the market place can open source be said to have “won”.

Have Fun Stormin’ the Castle

Monday, September 28th, 2009

I smell like smoke, so this must be Texas.

I’m in San Antonio this week doing some work for Rackspace Managed Hosting. They were one of my original OpenNMS clients from back in 2002 and they are still going strong, having grown from a small company in the Broadway Bank building downtown to a publicly traded hosting empire.



Today was my first trip to The Castle. Rackspace is continually outgrowing their space and so they decided to end the problem once and for all by buying a nearly one million square foot shopping mall. They are slowly converting it into high tech office space and moving all of their San Antonio staff into the building. The person I’m working with spends half his time there and half his time at the Datapoint location, so today we met at the Castle.



The Castle has another benefit: it is close to a Rudy’s. I’ve been a big fan of Rudy’s BBQ since my first trip to San Antonio, and while the only thing that could have made lunch better would have been a couple of bottles of Shiner, I did manage to pig out on brisket, turkey, ribs and sausage. Oh, don’t forget the creamed corn which probably has more calories per serving than a Snickers bar. Gotta have one’s veggies.

I’m pretty impressed with the new digs. It is much lighter and more open than Datapoint, and the morale seems ever so slightly higher. Plus, there is still some whimsy left over from the old days, like when they gave out a straitjacket for the Employee of the Month.



Vonnegut, The Wire, and Narrative

Sunday, September 27th, 2009

This is one of these introspective Sunday posts that have little or no OpenNMS content. As usual, feel free to skip.

Many years ago I was lucky enough to see Kurt Vonnegut give a talk on narrative. He went to a whiteboard and drew some axes. On one was time and the other ranged from “bad” to “good”.

First, he examined the story of Cinderella. The story starts with her father remarrying due to her mother’s death (bad). She has to deal with her evil step-mother and step-sisters (worse). Then she meets her fairy godmother (better). She goes to the ball (good). She dances with the Prince (great). Then the spell breaks and she has to flee (bad), etc.

The line he drew was a curvy swing from bad to good and back, eventually ending with good. Quite a few of our popular stories, movies and shows have a very similar curve.

Then he talked about Native American narrative. This was much more along the lines of walking through the woods, seeing a stream, spotting a deer, etc. The line he drew was almost completely straight, and pretty much neutral between good and bad.

With these two graphs on the board, he then examined one of the most classic stories of all time: Hamlet.

Hamlet starts out with the death of Hamlet’s father and his uncle’s ascension as king. (bad) Then his father’s ghost tells him that he was murdered by his brother (bad). To see if the ghost is right, Hamlet stages a play about murder and watches his uncle’s reaction. When his uncle leaves the room he believes the ghost is right (bad). Then he accidentally kills his girlfriend’s father (bad), she kills herself (bad) and everyone dies at the end (bad).

It was a straight line, very much like the Native American one, only, well, bad.

I don’t remember anything more about the talk, but it struck me that the serious stories, the real stories, sort of plod along without these great swings.

Which brings me to The Wire. The Wire was an HBO television series set in Baltimore, Maryland (USA). It consists of 60 one-hour episodes over five seasons, and has been called the greatest television show of all time.

I started watching it as a way to pass the time on airplanes. I finished the show today on a flight from Oregon to Texas, and while I will try my best not to spoil anything with my thoughts on it, if you are strict about such things stop reading now.

Each season focuses on a different aspect of life in the city. You know something is up when the second season departs so radically from the first it may leave you wondering if you missed something, or if you are really watching the same show. As I watched it I always thought it was good, but just like each season could stand on its own, each show could almost stand on its own. There weren’t any great cliffhangers, although there were many times when things happened that shocked the hell out of you. Good guys died; bad guys died. Good guys lived; bad guys lived. The powerful were brought low by the weak, and the powerful were made even more powerful by abusing the weak.

Up until today I would have said that The Wire was good, but not necessarily the greatest television show ever, but the final episode changed that. It wasn’t even that the last episode “revealed all” – for the most part it played out like any other – but the final five minutes consisted of a montage that literally left me shaken. It consisted of short, 10-30 second scenes of various characters in the near future, and it is only then that you get a real idea of the scope of the show and how well it was written and how well everything fit together. It was truly amazing.

And if you were to plot it out on a whiteboard, it would be a straight line.

I keep that image of Vonnegut up at the whiteboard at Pomona College in my mind. It reminds me that the greatest and most powerful things in life don’t come in big swings, but mainly in just moving forward.

Free Software and Baseball Analogies

Friday, September 25th, 2009

We have been crazy busy over the last few months and since fourth quarter is historically our busiest time, I don’t expect it to get any less hectic any time soon. I expect blogging to be very sporadic and out of chronological order (as I’ll get to things as I think about them) so Faulkner fans rejoice.

On second thought, true Faulkner fans would run away screaming and have nightmares at the mere thought of comparing my writing abilities with those of the venerable author, so scratch that.

[Re-reading this, I'm not happy with it, so stop reading and go watch Auto-tune the News. I'm going to post it anyway since it will help me get back in the groove.]

We had training at OpenNMS headquarters this week. This will be the last training of the year, as the next one would need to be scheduled in November or December and rarely do people like to travel during the holidays (although I always end up in Chicago for some reason). Look for training to return in January.

It was great. We had six incredibly smart, amazingly handsome people in for the week. Two of them came all the way from Chile, and one guy rode his Ducati from Pittsburgh. It was fun, and the guys from Chile had replaced Netcool with OpenNMS, so that was even better.

In other news, I’m in Portland (Oregon) for the weekend working with a client, and while I was traveling yesterday I came across a post by Terry Hancock called “Is free software major league or minor?”

It’s worth reading.

Open source and free software detractors often try to paint the community as a bunch of zealots who hold ideals over practicality with no room for compromise, and while it is true that there are some notable examples of such behavior, the vast majority of free software users prefer open software over proprietary programs but use a combination of both.

Hancock points out that on one hand we in the community state that open source is just as good as expensive “major league” software, but when we are called on the carpet for a lack of documentation or usability, etc., we cry “but we’re just volunteers” and take a “minor league” stance. Which is it?

Obviously, I liked this article. Hancock had me at:

“Free Software” and “Open Source Software” is the exact same artifact, no matter who is promoting it, nor on what advantages of it they promote.

which is something I’ve been saying for some time now.

I can’t improve on his post but I would like to add a couple of thoughts.

The first is that open source software often gets rid of the cruft associated with proprietary software. In terms of the major league analogy, does one need a huge stadium with luxury skyboxes, designer uniforms and a private team jet to play great baseball? One of the reasons that open source software tends to be more stripped down than commercial software is that we tend to focus on what is important from a functional standpoint versus what looks good. Since open source software is not “sold”, there is no need to make it all bright and shiny.

But at what point in time does this focus on the basics start to impact usability in a meaningful way? To return to the baseball analogy, you can strip away the stadium but you can’t, say, remove the pitcher’s mound.

When it comes to usability, Apple products are hard to beat. They are also experts at controlling the user experience.

They spend lots of money on the little details. I can’t find the post now, but apparently the message displayed while emptying the trash changed from Leopard to Snow Leopard. It was a small change, along the lines of “This action will delete your Trash items permanently” to “This action will permanently delete your Trash items” but it serves as an example of the level of detail they track.

Since Apple customers pay a premium for their products, there is an expectation for this attention to detail. But the “user experience” doesn’t necessarily mean “usability”. While I’m impressed with the changed text in the example above, it doesn’t do a thing to improve usability (there are a number of other features in OS X that do, however).

With OpenNMS, we really need to focus more on usability. While large changes in the webapp are not coming in the next few months, one of the things that is holding up the release of 1.8 is that it won’t be released without greatly improved documentation. The second most frequent comment we get in our training classes is “I didn’t know OpenNMS could do that” (with the most frequent being “This class is great!”).

The second thought I had comes back to this concept of freedom in free software. Open source is free software (and those that tell you differently are selling something – probably software). However, as Hancock illustrates, people who focus on “open” tend to be more concerned with how open source software can be used to solve problems than those who focus on “free” – who tend to be more concerned with freedom as an ethical issue.

I am more in the “open” camp, but like others I do get concerned about freedom when it extends beyond the realm of code. I don’t care that I can’t have access to the code that runs my microwave oven, but I get a little more concerned when it comes to the code that runs my car. Not because I plan to hack that code, but as cars become more and more dependent on their computer systems it could become impossible to work on the car without access to the software. Plus, this software could be collecting data that I might want kept private. I think it is important to focus on freedom in software since control of software is becoming synonymous with the control of information.

Last weekend at the Atlanta Linux Fest, Jeff’s wife teased me about my old LG phone. I got it over 3 years ago with Sprint. It’s not a bad phone but it is a little dated. The problem is that I can’t decide what phone to get next.

I have an iPod Touch and so the iPhone is a contender. The augmented reality stuff is really cool, and it was only possible because Apple created a phone with a solid SDK, a video camera and a compass. But working on the pre-alpha OpenNMS iPhone app showed me what a royal pain it is to develop an open source application on that platform.

Chris Dibona was kind enough to send me a G1 handset. Even though it is open, the only way I could find to sync my contacts was through my GMail account. Now I like Google as a company, but I don’t want anyone to have access to my contact list. Even Apple, as far as I know, lets me keep that information private. So I gave the phone to Ben so he can make an Android OpenNMS app. I’m still waiting on some of the newer Android phones to come out as possible contenders.

The question I’ve been asking myself is: how open and free does my phone have to be? Is it like a microwave – something that can easily be replaced or has substitutes, or is it something I must have to control my information? I’m not sure.

Perhaps the whole idea of using a game analogy to describe what’s going on is a FAIL. People keep talking about open source “winning”. Lots of people find open source software useful and it is a viable alternative to proprietary software in many cases, so it has “won”. I also think it has “won” by delaying, if not preventing, those who would control all of our information. Anyone with hardware and an internet connection can get a web server running and send mail. But has it replaced all proprietary software? No.

My guess is that we need a better goal than “winning”. For example, while I want OpenNMS to replace OpenView and Tivoli everywhere, my first goal for OpenNMS is to make it easier for first time users to get it installed and know what it can do. When we have achieved that goal I want to make the configuration files easier to modify. Each step is an improvement, each step brings us closer to the “major league” and in such a fashion that we can deal with the pragmatic need to pay for it all.

Eventually, if we do it right, people will see that they should choose OpenNMS, not because it is cheaper or that it is open source, but because it is, quite simply, better.

As Sun Tzu said:

For to win one hundred victories in one hundred battles is not the acme of skill. To subdue the enemy without fighting is the acme of skill.