2019 Dev-Jam – Day 3

Not much to add on Day 3 of Dev-Jam. By now the group has settled into a routine and there’s lots of hacking on OpenNMS.

As part of my role as “cruise director,” Mike and I ran out for more snacks.

2019 Dev-Jam: Table with Snacks and Ulf

On the way we stopped by the Science Museum of Minnesota to pick up a hoodie for Dustin. As fans of Stranger Things, we thought we should get our Dustin the same hoodie worn by the Dustin in the show. The one in the show was apparently an actual hoodie sold by the museum in the 1980s, but it was so popular they brought it back.

2019 Dev-Jam: Dustin and Dustin in Brontosaurus Hoodie

While not exactly the “Upside Down,” in the evening the gang descended on Up-Down, a barcade located a few miles away. Jessica organized the trip and folks seemed to have a great time.

2019 Dev-Jam: Selfie of Folks at Up-Down.

The combination bar and arcade features vintage video games

2019 Dev-Jam: People Playing Video Games at Up-Down.

as well as pinball machines

2019 Dev-Jam: Selfie of Folks at Up-Down.

Of course, there was also a bar

2019 Dev-Jam: People at the Bar at Up-Down.

Good times.

2019 Dev-Jam – Day 2

While the OpenNMS team does a pretty good job working remotely, it is so nice to be able to work together on occasion. Here is an example.

I wanted to explore the current status of the OpenNMS Selenium monitor. My conclusion was that while this monitor can probably be made to work, it needs to be deprecated and probably shouldn’t be used.

I started off on the wiki page, and when I didn’t really understand it I just looked at the page’s history. I saw that it was last updated in 2017 by Marcel, and Marcel happened to be just across the room from me. After talking to him for a while, I understood things much better and then made the decision to deprecate it.

The idea was that one could take the Selenium IDE, record a session and then export that to a JUnit test. Then that output would be minimally modified and added to OpenNMS so that it could periodically run the test.

The main issue is that the raw Selenium test *requires* Firefox, and Firefox requires an entire graphics stack, i.e. Xorg. Most servers don’t have that for a number of good reasons, and if you are trying to run Selenium tests on a large number of sites the memory resources could become prohibitive.

An attempt to address this was made using PhantomJS, a scriptable headless browser that does not require a graphical interface. Unfortunately, it has been unmaintained since March of 2018.

We’ve made a note of this in an internal OpenNMS issue. Moving forward, the most promising option looks to be “headless Chrome,” but neither OpenNMS nor Selenium supports that at the moment.

We still have the Page Sequence Monitor. This is very efficient but can be difficult to set up.
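As a sketch of what a Page Sequence Monitor service can look like in poller-configuration.xml, here is a config fragment based on the common “log in to the web UI” example. The port, paths, credentials and match strings below are illustrative, not a drop-in configuration:

```xml
<service name="PSM" interval="300000" user-defined="false" status="on">
  <parameter key="retry" value="1"/>
  <parameter key="timeout" value="3000"/>
  <parameter key="page-sequence">
    <page-sequence>
      <!-- Fetch the login page and make sure it renders -->
      <page method="GET" path="/opennms/login.jsp" port="8980" successMatch="Password"/>
      <!-- Post credentials and expect a redirect on success -->
      <page method="POST" path="/opennms/j_spring_security_check" port="8980" response-range="300-399">
        <parameter key="j_username" value="demo"/>
        <parameter key="j_password" value="demo"/>
      </page>
    </page-sequence>
  </parameter>
</page-sequence -->
```

The difficulty the text mentions is exactly this: each step and its expected response has to be hand-built, whereas Selenium would record them for you.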

Playing with that took up most of my morning. It was hard staying inside because it was a beautiful day in Minneapolis.

2019 Dev-Jam: Picture of Downtown Minneapolis from UMN

Most of my afternoon was spent working with OpenNMS customers (work doesn’t stop just because it is Dev-Jam) but I did wander around to see what other folks were doing.

2019 Dev-Jam: Jesse White with VR headset

Jesse was playing with a VR headset. The OpenNMS AI/Machine Learning module ALEC can create a visualization of the network, and he wrote a tool that lets you move through it in virtual reality (along with other people using other VR headsets). Not sure how useful it would be on a day to day basis, but it is pretty cool.

That evening most of us walked down the street to a pretty amazing Chinese restaurant. I always like bonding over food, and since we had discovered this place last year we were eager to return. I think the “bonding” continued after the meal at a bar across the street, but I ended up calling it a day.

2019 Dev-Jam: People at a table at a Chinese restaurant

2019 Dev-Jam: People at a table at a Chinese restaurant

2019 Dev-Jam – Day 1

Dev-Jam officially got started Monday morning at 10:00.

I usually kick off the week with a welcome and some housekeeping information, and then I turn it over to Jesse White, our project CTO. We do a roundtable introduction and then folks break off into groups and start working on the projects they find interesting.

This year we did something a little different. The development team scheduled a series of talks about the various things that have been added since the last Dev-Jam, and I spent most of the day listening to them and learning a lot of details about the amazing platform that is OpenNMS. While we had some technical difficulties, most of these presentations were recorded and I’ll add links to the videos once they are available.

2019 Dev-Jam: Graph of Main Projects Over the Last Year

Jesse started with an overview of the main development projects over the last year. Sentinel is a project to use the Kafka messaging bus to distribute OpenNMS functionality over multiple instances. While it is only implemented for telemetry data at the moment (flows and metrics), the goal is to enable the ability to distribute all of the functionality, such as service assurance polling and data collection, across multiple machines for virtually unlimited scalability.

After the Sentinel work, focus was on both the OpenNMS Integration API (OIA) and the Architecture for Learning Enabled Correlation (ALEC).

The OIA is a Java API to make it easier to add functionality to OpenNMS. While it is used internally, the goal is to make it easier for third parties to integrate with the platform. ALEC is a framework for adding AI and machine learning functions to OpenNMS. It currently supports two methods for the correlation of alarms into situations, DBSCAN and TensorFlow, but it is designed to allow others to be added.

The current development focus is on the next version of Drift. Drift is the feature that does flow collection, and there are a number of improvements being worked on for “version 2”.

2019 Dev-Jam: Title Slide for the Contributing to OpenNMS talk

Markus von Rüden gave the next talk on contributing to OpenNMS. He covered a number of topics including dealing with our git repository, pull requests, test-driven development and our continuous integration systems.

2019 Dev-Jam: Title Slide for the Karaf/OSGi talk

Matt Brooks presented an overview on how to leverage Karaf to add functionality to OpenNMS. Karaf is the OSGi container used by OpenNMS to manage features, and Matt used a simple example to show the process for adding functionality to the platform.

2019 Dev-Jam: Title Slide for the OIA talk

Extending on this was a talk by Chandra Gorantla about using the OIA with an example of creating a trouble ticketing integration. OpenNMS has had a ticketing API for some time but this talk leveraged the improvements added by the new API to make the process easier.

2019 Dev-Jam: Title Slide for the ALEC talk

Following this was a talk by David Smith on ALEC. He demonstrated how to add a simple time-based correlation to OpenNMS which covered a lot of the different pieces implemented by the architecture, including things like feedback.

That ended the development overview part of the presentation but there were two more talks on Docker and Kubernetes.

2019 Dev-Jam: Slide showing Useful Docker Commands for OpenNMS

Ronny Trommer gave a short overview of running OpenNMS in Docker, covering a lot of information about how to deal with the non-immutable (mutable?) aspects of the platform such as configuration.

2019 Dev-Jam: Kubernetes Diagram

This was followed by an in-depth talk by Alejandro Galue on Kubernetes, running OpenNMS using Kubernetes and how OpenNMS can be used to monitor services running in Kubernetes. While Prometheus is the main application people implement for monitoring Kubernetes, its data tends to be short-lived, and OpenNMS can augment a lot of that information, especially at the services level.

These presentations took up most of the day. Since it is hard to find places where 30 people can eat together, we have a tradition of getting catering from Brasa, and we did that for Monday night’s meal.

2019 Dev-Jam: Table Filled with Food from Brasa

Jessica Hustace, who did the majority of the planning for Dev-Jam, handed out this year’s main swag gift: OpenNMS jackets.

2019 Dev-Jam: OpenNMS logo jacket

Yup, I make this look good.

2019 Dev-Jam – Day 0

For the fourteenth time in fifteen years, a group of core community members and power users are getting together for our annual OpenNMS Developers Conference: Dev-Jam.

This is one of my favorite times of the year, probably second only to Thanksgiving. While we do a good job of working as a distributed team, there is nothing like getting together face-to-face once in a while.

We’ve tried a number of venues including my house, Georgia Tech and Concordia University in Montréal, but we keep coming back to Yudof Hall on the University of Minnesota campus in Minneapolis. It just works out really well for us and after coming here so many times the whole process is pretty comfortable.

My role in Dev-Jam is pretty much just the “cruise director”. As is normal, other people do all the heavy lifting. I did go on a food and drink run which included getting “Hello Kitty” seaweed snacks.

2019 Dev-Jam: Hello Kitty Seaweed Snacks

Yudof Hall is a dorm. The rooms are pretty nice for dorm rooms and include a small refrigerator, a two-burner stove, furniture and a sink. You share a bathroom with one other person from the conference. On the ground floor there is a large room called the Club Room. On one side is a kitchen with tables and chairs. On the other side is a large TV/monitor and couches, and in the middle we set up tables. There is a large brick patio that overlooks the Mississippi River.

2019 Dev-Jam: Yudof Hall Club Room

The network access tends to be stellar, and with the Student Union just across the street people can easily take a break to get food.

We tend to eat dinner as a group, and traditionally the kickoff meal is held at Town Hall Brewery across the river.

2019 Dev-Jam: UMN Bridge Over the River

It was a pretty rainy day but it stopped enough for most of us to walk over the bridge to the restaurant. You could feel the excitement for the week start to build as old friends reunited and new friends were made.

2019 Dev-Jam: Town Hall Brewery

When we were setting up the Club Room tables, we found a whiteboard which is sure to be useful. I liked the fact that someone had written “Welcome Home” on it. Although I don’t live here, getting together with these people sure feels like coming home.

2019 Dev-Jam: Welcome Home on Whiteboard

Meeting Owl

One of the cool things I get to do working at OpenNMS is to visit customer sites. It is always fun to visit our clients and to work with them to get the most out of the application.

But over the last year I’ve seen a decline in requests for on-site work. This is odd because general interest in OpenNMS is way up, and it finally dawned on me why – fewer and fewer people work in an office.

For example, we work with a large bank in Chicago. However, their monitoring guy moved to Seattle. Rather than lose a great employee, they let him work from home. When I went out for a few days of consulting, we ended up finding a co-working space in which to meet.

Even within our own organization we are distributed. There is the main office in Apex, NC, our Canadian branch in Ottawa, Ontario, our IT guy in Connecticut and our team in Europe (spread out across Germany, Italy and the UK). We tend to communicate via some form of video chat, but that can be a pain if a lot of people are in one room on one end of the conference.

When I was visiting our partner in Australia, R-Group, I got to use this really cool setup they have using Polycom equipment. Video consisted of two cameras. One didn’t move and was focused on the whole room, but the other would move and zoom in on whoever was talking. The view would switch depending on the situation. It really improved the video conferencing experience.

I checked into it when I got back to the United States, and unfortunately it looked real expensive, way more than I could afford to pay. However, in my research I came across something called a Meeting Owl. We bought one for the Apex office and it worked out so well we got another one for Ottawa.

The Meeting Owl consists of a cylindrical speaker topped with a 360° camera. It attaches to any device that can accept a USB camera input. The picture displays a band across the top that shows the whole panorama, but then software “zooms” in on the person talking. The bottom of the screen will split to show up to three people (the last three people who have spoken).

It’s a great solution at a good price, but it had one problem. In the usual setup, the Owl is placed in the center of the conference table, and usually there is a monitor on one side. When the people at the table are listening to someone remote (either via their camera or another Owl), the people seated along the sides end up looking at the large monitor. This means the Owl is pretty much showing everyone’s ear.

It bothers me.

Now, the perfect solution would be to augment the Owl to project a picture as a hologram above the unit so that people could both see the remote person as well as look at the Owl’s camera at the same time.

Barring that, I decided to come up with another solution.

Looking on Amazon I found an inexpensive HDMI signal splitter. This unit will take one HDMI input and split it into four duplicate outputs. I then bought three small 1080p monitors (I wanted the resolution to match the 1080p main screen we already had) which I could then place around the Owl. I set the Owl on the splitter to give it a little height.

Meeting Owl with Three Monitors

Now when someone remote, such as Antonio, is talking, we can look at the small monitors on the table instead of the big one on the side wall. I found that three does a pretty good job of giving everyone a decent view, and if someone is presenting their screen everyone can look at the big monitor in order to make out detail.

Meeting Owl in Call

We tried it this morning and it worked out great. Just thought I’d share in case anyone else is looking for a similar solution.

Review: Serval WS Laptop by System76

TL;DR: When I found myself in the market for a beefy laptop, I immediately ordered the Serval WS from System76. I had always had a great experience dealing with them, but times have changed. It has been sent back.

I can’t remember the first time I heard about the Serval laptop by System76. In a world where laptops were getting smaller and thinner, they were producing a monster of a rig that weighs ten pounds without the power brick, the goal being to squeeze a high performance desktop into a (somewhat) portable form factor.

I never thought I’d need one, as I tend to use desktops most of the time (including a Wild Dog Pro at the office) and I want a light laptop for travel as it pretty much just serves as a terminal and I keep minimal information on it.

Recently we’ve been experimenting with office layouts, and our latest configuration has me trading my office for a desk with the rest of the team, and I needed something that could be moved in case I need to get on a call, record a video or get some extra privacy.

Heh, I thought, perhaps I could use the Serval after all.

I like voting for open source with my wallet. My last two laptops have been Dell “Sputnik” systems (2nd gen and 5th gen) since I wanted to support Dell shipping Linux systems, and when we decided back in 2015 that the iMacs we used for training needed to be replaced, I ordered six Sable Touch “all in one” systems from System76. The ordering process was smooth as silk and the devices were awesome. We still get compliments from our students.

A year later when my HP desktop died, I bought the aforementioned Wild Dog Pro. Again, customer service to match if not rival the best in the business, and I was extremely happy with my new computer.

Jump forward to the present. Since I was in the market for a “luggable” system, performance was more important than size or weight, so I ordered a loaded Serval WS, complete with the latest Intel i9 processor, 64GB of speedy RAM, NVidia 1080 graphics card, and oodles of disk space. Bwah ha ha.

When it showed up, even I was surprised at how big it was.

Serval WS and Brick

Here you can see it in comparison to an old Apple keyboard. It is solidly built, and I was eager to plug it in and turn it on.

Serval WS

The screen was really bright, even in my well-lit office. You can see from the picture that it was big enough to contain a full-sized keyboard and a numeric keypad. This didn’t really matter much to me as I was planning on using it with an awesome external monitor and keyboard, but it was a nice touch. I still like having a second screen since we rely heavily on Mattermost and always keep a window in view, and I figured the laptop screen could serve for that.

I had ordered the system with Ubuntu installed. My current favorite operating system is Linux Mint but I decided to play with Ubuntu for a little bit. This was my first experience with Ubuntu post Unity and I must say, I really liked it. Kind of made me eager to try out Pop!_OS which is the System76 distro based on Ubuntu.

When installing Mint I discovered that I made a small mistake when placing my Serval order. I meant to use a 2TB drive as the primary leaving a 1TB drive for use by TimeShift for backup. I reversed them. No real issue, as I was able to install Mint on the 2TB drive just fine after some creative partition manipulation.

Everything was cool until late afternoon when the sun went away. I was rebooting the system and found myself looking at a blank screen (for some reason the screen stays blank for a minute or so after powering on the laptop, which I assume is due to it having 64GB of RAM to initialize). There was a tremendous amount of “bleed” around the edges of the LCD.

Serval WS LCD Bleed

Damn.

Although it probably wouldn’t have impacted me much in day to day use, especially with an external monitor, I would know about it, and as I’m somewhere on the OCD spectrum it would bother me. Plus I paid a lot of money for this system and want it to be as close to perfect as possible.

For those of you who don’t know, the liquid crystals in LCD displays emit no light of their own; they are illuminated by a backlight (fluorescent in older panels, LED in most modern ones). If there are issues with the way the LCD panel is constructed, this light can “bleed” around the edges and degrade the display quality (it is also why it is hard to get really black images on LCD displays, and this is fueling a lot of the excitement around OLED technology).

I’ve had issues with this before on laptops but nothing this bad. Not to worry, I have System76 to rely on, along with their superlative customer service.

I called the number and soon I was speaking with a support technician. When I described the problem they opened a ticket and asked me to send in a picture. I did and then waited for a response.

And waited.

And waited.

I commented on the ticket.

And I continued to wait.

The next day I waited a bit (Denver is two hours behind where I live) but when I got no response I decided, well, I’ll just return the thing. I called to get an RMA number but this time I wasn’t connected with a person and was asked to leave a message. I did, and I should note that I never got that return call.

At this point I’m frustrated, so I decided an angry tweet was in order. That got a response to my ticket, where they offered to send me a new unit.

Yay, here was a spark of the customer service I was used to getting. I’ve noticed a number of tech companies are willing to deal with defective equipment by sending out a new unit before the old unit is returned. In this day and age of instant gratification it is pretty awesome.

I wrote back that I was willing to try another unit, but asked if it would be possible to put Pop!_OS on the new unit’s 2TB drive so that I could try it out of the box and know that all of the System76-specific packages were installed.

A little while later I got a reply that it wouldn’t be possible to install it on the 2TB drive, so I would end up having to reinstall in any case.

(sigh)

When I complained on Twitter I was told “Sorry to hear this, you’ll receive a phone call before EOD to discuss your case.” I worked until 8pm that night with no phone call, so I just decided to return the thing.

Of course, this would be at my expense and the RMA instructions were strict about requiring shipping insurance: “System76 cannot refund your purchase if the machine arrives damaged. For this reason, it is urgent that you insure your package”. The total cost was well over $100.

So I’m out a chunk of change and I’ve lost faith in a vendor of which I was extremely fond. This is a shame since they are doing some cool things such as building computers in the United States, but since they’ve lost sight of what made them great in the first place I have doubts about their continued success.

In any case, I ordered a Dell Precision 5530, which is one of the models available with Ubuntu. Much smaller and not as powerful as the Serval WS, it is also not as expensive. I’ll post a review in a couple of weeks when I get it.

#OSMC 2018 – Day 3: Hackathon

For several years now the OSMC has been extended by one day in the form of a “hackathon”. As I do not consider myself a developer I usually skip this day, but since I wanted to spend more time with Ronny Trommer and to explore the OpenNMS MQTT plugin, I decided to attend this year.

I’m glad I did, especially because the table where we sat was also home to Dave Kempe, and he brought Tim Tams from Australia:

OSMC 2018 Tim Tams

Yum.

You can find them in the US on occasion, but they aren’t as good.

I have been hearing about MQTT for several years now. According to Wikipedia, MQTT (Message Queuing Telemetry Transport) is a messaging protocol designed for connections with remote locations where a “small code footprint” is required or the network bandwidth is limited, thus making it useful for IoT devices.

Dr. Craig Gallen has been working on a plugin to allow OpenNMS to consume MQTT messages, and I was eager to try it out. First, we needed an MQTT broker.

I found that the OpenHAB project supports an MQTT broker called Mosquitto, so we decided to go with that. This immediately created a discussion about the differences between OpenHAB and Home Assistant, the latter being a favorite of Dave. They looked comparable, but we decided to stick with OpenHAB because a) I already had an instance installed on a Raspberry Pi, and b) it is written in Java, which is probably why others prefer Home Assistant.

Ronny worked on getting the MQTT plugin installed while I created a dummy sensor in OpenHAB called “Gas”.

OSMC 2018 Hackathon

This involved creating a “sitemap” in /etc/openhab2:

sitemap opennms label="My home automation" {
    Frame label="Date" {
        Text item=Date
    }
    Frame label="Gas" {
        Text item=mqtt_kitchen_gas icon="gas"
    }
}

and then an item that we could manipulate with MQTT (the leading < in the binding string marks it as inbound from the broker):

Number mqtt_kitchen_gas "Gas Level [%.1f]" {mqtt="<[mosquitto:Home/Floor1/Kitchen/Gas_Sensor:state:default]"}

After installing the MQTT plugin, Ronny added the following to its configuration to connect to our Mosquitto broker on OpenHAB:

<mqttClients>
  <client clientinstanceid="client1">
    <brokerurl>tcp://172.20.11.8:1883</brokerurl>
    <clientid>opennms</clientid>
    <connectionretryinterval>3000</connectionretryinterval>
    <clientconnectionmaxwait>20000</clientconnectionmaxwait>
    <topiclist>
      <topic qos="0" topic="iot/#"/>
    </topiclist>
    <username>openhabian</username>
    <password>openhabian</password>
  </client>
</mqttClients>

Now that we had a connection between our OpenHAB Mosquitto broker and OpenNMS, we could try to send information. The MQTT plugin handles both event information and data collection. To test both we used the mosquitto_pub command on the CLI.

For an event one can use something like this:

#!/bin/bash
mosquitto_pub -u openhabian --pw openhabian -t "iot/timtam" -m "{ \"name\": \"6114163\",  \"sensordatavalues\": [ { \"value_type\": \"Gas\", \"value\": \"$RANDOM\"  } ] }"

On the OpenNMS side you need to configure the MQTT plugin to look for it:

<messageEventParsers>
  <messageEventParser foreignSource="$topicLevels[5]" payloadType="JSON" compression="UNCOMPRESSED">
    <subscriptionTopics>
      <topic>iot/timtam/event/kitchen/mysensor/doorlock</topic>
    </subscriptionTopics>

    <xml-groups xmlns="http://xmlns.opennms.org/xsd/config/xml-datacollection">
      <xml-group name="timtam-mqtt-lab" resource-type="sensors" resource-xpath="/" key-xpath="@name">
        <xml-object name="instanceResourceID" type="string" xpath="@name"/>
        <xml-object name="gas" type="gauge" xpath="sensordatavalues[@value_type='Gas']/value"/>
      </xml-group>
    </xml-groups>
    <ueiRoot>uei.opennms.org/plugin/MqttReceiver/timtam/kitchen</ueiRoot>
  </messageEventParser>
</messageEventParsers>

Note how Ronny worked our Tim Tam obsession into the configuration.

To make this useful, you would want to configure an event definition for the event with the Unique Event Identifier (UEI) of uei.opennms.org/plugin/MqttReceiver/timtam/kitchen:

<events xmlns="http://xmlns.opennms.org/xsd/eventconf">
  <event>
    <uei>uei.opennms.org/plugin/MqttReceiver/timtam/kitchen</uei>
    <event-label>MQTT: Timtam kitchen lab event</event-label>
    <descr>This is our Timtam kitchen lab event</descr>
    <logmsg dest="logndisplay">
      All the parameters: %parm[all]%
    </logmsg>
    <severity>Normal</severity>
    <alarm-data reduction-key="%uei%:%dpname%:%nodeid%:%interface%:%service%" alarm-type="1" auto-clean="false"/>
  </event>
</events>

Once we had that working, the next step was to use the MQTT plugin to collect performance data from the messages. We used this script:

#!/bin/bash
while true
do
mosquitto_pub -u openhabian --pw openhabian -t "Home/Floor1/Kitchen/Gas_Sensor" -m "{ \"name\": \"6114163\",  \"sensordatavalues\": [ { \"value_type\": \"Gas\", \"value\": \"$RANDOM\"  } ] }"
sleep 10
done

This will create a message including a random number every ten seconds.
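Since the interesting part is the payload rather than the transport, the JSON body that each loop iteration publishes can be sanity-checked with plain bash, no broker required (the sensor name is the one from our lab script; the gas value is just a random stand-in):

```shell
#!/bin/bash
# Print one example of the JSON payload the loop above publishes.
name="6114163"
gas=$RANDOM
printf '{ "name": "%s", "sensordatavalues": [ { "value_type": "Gas", "value": "%s" } ] }\n' "$name" "$gas"
```

The value_type field is what the xml-group configuration below keys on when extracting the gas reading.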

To have OpenNMS look for it, the MQTT configuration is:

<messageDataParsers>
  <messageDataParser foreignSource="$topicLevels[5]" payloadType="JSON" compression="UNCOMPRESSED">
    <subscriptionTopics>
      <topic>iot/timtam/metric/kitchen/mysensor/gas</topic>
    </subscriptionTopics>
    <xml-groups xmlns="http://xmlns.opennms.org/xsd/config/xml-datacollection">
      <xml-group name="timtam-kitchen-sensor" resource-type="sensors" resource-xpath="/" key-xpath="@name">
        <xml-object name="instanceResourceID" type="string" xpath="@name" />
        <xml-object name="gas" type="gauge" xpath="sensordatavalues[@value_type='Gas']/value"/>
      </xml-group>
    </xml-groups>
    <xmlRrd step="10">
      <rra>RRA:AVERAGE:0.5:1:20160</rra>
      <rra>RRA:AVERAGE:0.5:12:14880</rra>
      <rra>RRA:AVERAGE:0.5:288:3660</rra>
      <rra>RRA:MAX:0.5:288:3660</rra>
      <rra>RRA:MIN:0.5:288:3660</rra>
    </xmlRrd>
  </messageDataParser>
</messageDataParsers>

This will store the values in an RRD file which can then be graphed within OpenNMS or through Grafana with the Helm plugin.
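For anyone wondering how long those RRAs keep data: assuming the step attribute is in seconds (as with rrdtool), the retention of each RRA line is step × steps-per-row × rows. A quick integer-arithmetic check for the averages above:

```shell
#!/bin/bash
# Approximate retention implied by the RRAs above, given step="10" seconds.
# Integer division truncates, so these are floor values in days.
step=10
echo "10-second samples (1x20160):   $(( step * 1 * 20160 / 86400 )) days"
echo "2-minute averages (12x14880):  $(( step * 12 * 14880 / 86400 )) days"
echo "48-minute averages (288x3660): $(( step * 288 * 3660 / 86400 )) days"
```

Roughly two days of raw samples, three weeks of 2-minute averages, and four months of coarser consolidations.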

It was pretty straightforward to get the OpenNMS MQTT plugin working. While I’ve focused mainly on what was accomplished, it was a lot of fun chatting with others at our table and in the room. As usual, Netways did a great job with organization and I think everyone had fun.

Plus, I got to be reminded of all the amazing stuff being done by the OpenNMS team, and how the view is great up here while standing on the shoulders of giants like Ronny and Craig.

#OSMC 2018 – Day 2

Despite how long the Tuesday night festivities lasted, quite a few people managed to make the first presentation on Wednesday morning. I’m old so I had gone to bed fairly early and was able to see “Make IT Monitoring Ready for Cloud-native Systems” bright and early.

OSMC 2018 RealOpInsight

This presentation focused on a project called RealOpInsight. This seems to be a sort of “Manager of Managers” for multiple monitoring applications, and I didn’t really see a “cloud-native” focus in the presentation. It is open source, so if you find yourself running many instances of disparate monitoring platforms you may find RealOpInsight useful.

This was followed by a presentation from Uber.

OSMC 2018 Uber

One can imagine the number of metrics an organization like Uber collects (and I did refrain from making snarky comments like “what database do you use to track celebrities?” and “where do you count the number of assaults by Uber drivers?”). Rob Skillington seemed pretty cool and I didn’t want to put him on the spot.

Uber used to use Cassandra, which is a storage option for OpenNMS, but they found when they hit around 80,000 metrics per second the system couldn’t keep up (one of the largest OpenNMS deployments is 20,000 metrics/sec so 80K is a lot). Their answer was to create a new storage system called M3DB. While it seems pretty impressive, I did ask some questions about how mature it was because at OpenNMS we are always looking out for ways to make things easier for our users, and Rob admitted that while it works well for Uber it needs some work to be generally useful, which is why they open-sourced it. We’ll keep an eye on it.

The next time slot was the “German only” one I mentioned in my last post, so I engaged in the hallway track until lunch.

OSMC 2018 Rihards Olups

It was lovely to see Rihards Olups again. We met at the first OSMC I attended when he was part of the “Latvian Army” at Zabbix. He gave an entertaining talk on dealing with the alerts from your monitoring system, and he ended with the tag line “Make Alerts Meaningful Again (MAMA)”. Seems like a perfect slogan for a ball cap, preferably in red.

OSMC 2018 Dave Kempe

Another delightful human being I got to see was Dave Kempe, who came all the way from Sydney. While we had met at a prior OSMC, this conference we ended up spending a lot more time together (he was in the Prometheus training as well as the Thursday Hackathon). He gave a talk on being a monitoring consultant, and it was interesting to compare his experiences with my own (they were similar).

For most people the conference ended on Wednesday. I said goodbye to people like Peter Eckel and looked forward to the next OSMC so I could see them again.

Speaking of the next OSMC, we are going to be doing OpenNMS training on that first day, November 4th, so save the date. It is the least we could do since they went to the trouble to advertise OpenNMS Horizon® on all their posters (grin).

OSMC 2018 Horizon

Ronny and I were hanging around for the Hackathon on Thursday, and for those attendees there was a nice dinner at a local restaurant called Tapasitos. It was fun to spend more time with the OSMC gang and to get ready for our last day at the conference.

OSMC 2018 Tapasitos

#OSMC 2018 – Day 1

The 2018 Open Source Monitoring Conference officially got started on Tuesday. This was my fifth OSMC (based on the number of stars on my badge), although I am happy to have been at the very first OSMC conference with that name.

As usual our host and Master of Ceremonies Bernd Erk started off the festivities.

OSMC 2018 Welcome

This year there were three tracks of talks. Usually there are two, and I’m not sure how I feel about more tracks. Recently I have been attending Network Operator Group (NOG) meetings, which are usually one or two days long but with only one track. I like that, as I get exposed to things I normally wouldn’t. One of my favorite open source conferences, All Things Open, has gotten so large that it is unpleasant to navigate the schedule.

In the case of the OSMC, having three tracks was okay, but I still liked the two track format better. One presentation was always in English, although one of the first things Bernd mentioned in his welcome was that Mike Julian was unable to make it for his talk on Wednesday and thus that time slot only had two German language talks.

If they seem interesting I’ll sit in on the German talks, especially if Ronny is there to translate. I am very interested in open source home automation (well, more on the monitoring side than, say, turning lights on and off) so I went to the OpenHAB talk by Marianne Spiller.

OSMC 2018 OpenHAB

I found out that there are mainly two camps in this space: OpenHAB and Home Assistant. The former is written in Java, which seems to provoke some Java hate, but since I was going to use OpenHAB for our MQTT Hackathon on Thursday I thought I would listen in.

OSMC 2018 Custom MIB

I also went to a talk on using a Python library for instrumenting your own SNMP MIB by Pieter Hollants. We have a drink vending machine that I monitor with OpenNMS. Currently I just output the values to a text file and scrape them via HTTP, but I’d like to propose a formal MIB structure and implement it via SNMP. Pieter’s work looks promising and now I just have to find time to play with it.

Just after lunch I got a call that my luggage had arrived at the hotel. Just in time because otherwise I was going to have to do my talk in the Icinga shirt Bernd gave me. Can’t have that (grin).

My talk was lightly attended, but the people who did come seemed to enjoy it. It was one of the better presentations I’ve created lately, and the first comment was that the talk was much better than the title suggested. I was trying to be funny when I used “OpenNMS Geschäftsbericht” (OpenNMS Annual Report) in my submission. It’s funny because I speak very little German, although it was accurate since I was there to present on all of the cool stuff that has happened with OpenNMS in the past year. It was recorded so I’ll post a link once the videos are available.

In contrast, Bernd’s talk on the current state of Icinga was standing room only.

OSMC 2018 State of Icinga

The OSMC has its roots in Nagios and its fork Icinga, and most people who come to the OSMC are there for Icinga information. It is easy to see why this talk was so popular (even though it was basically “Icinga Geschäftsbericht” – sniff). The cool demo was an integration Bernd did using IBM’s Node-RED, Telegram and an Apple Watch, but unfortunately it didn’t work. I’m hoping we can work up an Apple Watch/OpenNMS integration by next year’s conference (should be possible to add hooks to the Watch from the iOS version of Compass).

The evening event was held at a place called Loftwerk. It was some distance from the conference so a number of buses were chartered to take us there. It was fun if a bit loud.

OSMC 2018 Loftwerk

OSMC celebrations are known to last into the night. The bar across the street from the conference hotel (which I believe has changed hands at least three times in the lifetime of the OSMC) becomes “Checkpoint Jenny” once the main party ends and can go on until nearly dawn, which is why I like to speak on the first day.

OSMC 2018 – Day 0: Prometheus Training

To most people, monitoring is not exciting, but it seems lately that the most exciting thing in monitoring is the Prometheus project. As a project endorsed by the Cloud Native Computing Foundation, Prometheus is getting a lot of attention, especially in the realm of cloud applications and things like monitoring Kubernetes.

At this year’s Open Source Monitoring Conference they offered a one day training course, so I decided to take it to see what all the fuss was about. I apologize in advance that a lot of this post will be comparing Prometheus to OpenNMS, but in case you haven’t guessed I’m biased (and a bit jealous of all the attention Prometheus is getting).

The class was taught by Julien Pivotto who is both a Prometheus user and a decent instructor. The environment consisted of 15 students with laptops set up on a private network to give us something to monitor.

Prometheus is written in Go (I’m never sure if I should call it “Go” or if I need to say “Golang”) which makes it compact and fast. We installed it on our systems by downloading a tarball and simply executing the application.

Like most applications written in the last decade, the user interface is accessed via a browser. The first thing you notice is that the UI is incredibly minimal. At OpenNMS we get a lot of criticism of our UI, but the Prometheus interface is one step above the Google home page. The main use of the web page is for querying collected metrics, and a lot of the configuration is done by editing YAML files from the command line.

Once Prometheus was installed and running, the first thing we looked at was monitoring Prometheus itself. There is no real magic here. Metrics are exposed via a web page that simply lists the variables available and their values. The application will collect all of the values it finds and store them in a time series database called simply the TSDB.

The idea of exposing metrics on a web page is not new. Over a decade ago we at OpenNMS were approached by a company that wanted us to help them create an SNMP agent for their application. We asked them why they needed SNMP and found they just wanted to expose various metrics about their app to monitor its performance. Since it ran on a Linux system with an embedded web server, we suggested that they just write the values to a file, put that in the webroot, and we would use the HTTP Collector to retrieve and store them.

The main difference between that method and Prometheus is that the latter expects the data to be presented in a particular format, whereas the OpenNMS method was more free-form. Prometheus will also collect all values presented without extra configuration, whereas you’ll need to define the values of interest within OpenNMS.
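For reference, the Prometheus exposition format is plain text: one metric per line, with optional HELP and TYPE comments and key/value labels in braces. A minimal sample (the metric names and values here are illustrative, not from the class):

```text
# HELP http_requests_total Total HTTP requests served.
# TYPE http_requests_total counter
http_requests_total{method="get",code="200"} 1027
http_requests_total{method="post",code="200"} 3
# HELP process_cpu_seconds_total Total user and system CPU time in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 12.47
```

Prometheus scrapes a page like this on a schedule and stores every value it finds, which is why no per-metric configuration is needed.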

In Prometheus there is no real auto-discovery of devices. You edit a file in which you create a “job”, in our case the job was called “Prometheus”, and then you add “targets” based on IP address and port. As we learned in the class, for each different source of metrics there is usually a custom port. Prometheus stats are on port 9090, node data is exposed on port 9100 via the node_exporter, etc. When there is an issue, this can be reflected in the status of the job. For example, if we added all 15 Prometheus instances to the job “Prometheus” and one of them went down, then the job itself would show as degraded.
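A sketch of what that static job/target configuration looks like in prometheus.yml — the IP addresses here are placeholders standing in for the lab machines:

```yaml
global:
  scrape_interval: 15s           # how often to pull metrics from each target

scrape_configs:
  - job_name: "prometheus"       # the job we created in class
    static_configs:
      - targets:                 # one entry per student instance
          - "192.168.1.10:9090"
          - "192.168.1.11:9090"
  - job_name: "node"             # server metrics via node_exporter
    static_configs:
      - targets:
          - "192.168.1.10:9100"  # node_exporter's default port
```

Adding or removing a device means editing this file (or wiring up one of the service-discovery mechanisms), which is quite different from the discovery-based model OpenNMS users are used to.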

After we got Prometheus running, we installed Grafana to make it easier to display the metrics that Prometheus was capturing. This is a common practice these days and a good move since more and more people are becoming familiar with it. OpenNMS was the first third-party datasource created for Grafana, and the Helm application brings bidirectional functionality for managing OpenNMS alarms and displaying collected data.

After that we explored various “components” for Prometheus. While a number of applications expose their data directly in a format that Prometheus can consume, there are also components that can be installed alongside them, such as the node_exporter, which exposes server-related metrics that aren’t otherwise natively available.

The rest of the class was spent extending the application and playing with various use cases. You can use the “alertmanager” component to trigger various actions based on the status of metrics within the system.
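As a sketch, alerting is a two-part setup: Prometheus itself evaluates rule files and fires alerts, and Alertmanager then routes them to things like email or chat. A hypothetical rule that fires when a scrape target stops responding:

```yaml
# rules.yml -- referenced from the rule_files section of prometheus.yml
groups:
  - name: availability
    rules:
      - alert: InstanceDown
        expr: up == 0        # the built-in "up" metric is 0 when a scrape fails
        for: 5m              # must stay down for 5 minutes before firing
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.instance }} is unreachable"
```

The `up` metric is generated automatically for every target, so this one rule covers every job in the configuration.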

One thing I wish we could have covered was the “push” aspect of Prometheus. Modern monitoring is moving from a “pull” model (e.g. SNMP polling) to a “push” model where applications simply stream data into the monitoring system. OpenNMS supports this type of monitoring through the telemetryd feature, and it would be interesting to see if we could become a sink for the Prometheus push format.

Overall I enjoyed the class but I fail to see what all the fuss is about. It’s nice that developers are exposing their data via specially formatted web pages, but OpenNMS has had the ability to collect data from web pages for over a decade, and I’m eager to see if I can get the XML/JSON collector to work with the native format of Prometheus. Please don’t hate on me if you really like Prometheus – it is 100% open source and if it works for you then great – but for something to manage your entire network (including physical servers and especially networking equipment like routers and switches) you will probably need to use something else.

[Note: Julien reached out to me and asked that I mention the SNMP_Exporter which is how Prometheus gathers data from devices like routers and switches. It works well for them and they are actively using it.]