OpenNMS Could Aggravate Bug on Cisco Switches Running 12.2(44)se1

Tom Powers posted to the install list today about a problem he discovered with HTTP polling on Cisco 3550 switches.

On Cisco 3550 switches running 12.2(44)SE1, the standard OpenNMS HTTP service polling will kick off a memory leak on the switch.

This is resolved in SE2 and later (SE6 is out now).

On a switch with 30+ MB of RAM, it took about 3 days for the memory leak to reach the point where HTTP, HTTPS, Telnet and SSH failed. The switch still runs, but the memory pool shows full (a steady decrease from the point monitoring began).

This article talks about the http core services memory leak fixed first in SE2.

We found this in an installation with 56 switches, and every one of the 3550s (eight of them) running 12.2(44)SE1 leaked out. Older and newer switches were all fine.

Just a heads up. Note that this is not an OpenNMS bug. The fix is to upgrade the Cisco software, and a workaround would be to unmanage HTTP on these devices. If you have SNMP enabled on all your devices, you should be able to find any Cisco 3550 switches on your network by querying the database:

psql -U opennms opennms

SELECT nodeid, nodelabel, nodesysdescription
     FROM node
     WHERE nodesysdescription LIKE '%3550%';

\q

I don’t have any of these switches on my network, so I can’t verify it, but my guess is that the sysDescription will include the software version as well.
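If the sysDescription does carry the version, the rows returned by the query could be narrowed with a few lines of Python. This is only a sketch: it assumes the description contains a fragment such as "Version 12.2(44)SE1," as IOS devices commonly report, and the function name and sample strings are made up for illustration.

```python
import re

# Assumption: IOS reports its release as "Version <release>," inside sysDescr.
VERSION_RE = re.compile(r"Version\s+([^,\s]+)")

AFFECTED_RELEASE = "12.2(44)SE1"


def is_affected(sys_descr):
    """Return True if sys_descr reports exactly the affected release."""
    match = VERSION_RE.search(sys_descr or "")
    return bool(match) and match.group(1).upper() == AFFECTED_RELEASE


if __name__ == "__main__":
    # Made-up sample values, not copied from a real switch.
    samples = [
        "Cisco IOS Software, C3550 Software (C3550-IPSERVICESK9-M), "
        "Version 12.2(44)SE1, RELEASE SOFTWARE (fc1)",
        "Cisco IOS Software, C3550 Software (C3550-IPSERVICESK9-M), "
        "Version 12.2(44)SE2, RELEASE SOFTWARE (fc1)",
    ]
    for descr in samples:
        print(is_affected(descr), descr)
```

The check compares the whole release string rather than looking for a substring, so a later release whose name merely starts with the same characters is not flagged by mistake.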

UPDATE: Tom wrote more on this:

After talking with Cisco this morning, and explaining our situation and findings … this issue is not confined to just the 3550 switches … it is all Cisco Catalyst switches with the 12.2(44)SE1 revision of IOS that are affected. We were just lucky enough to have all 3550s in the environment we caught this in.

Turns out the rest of the series switches, 2900s and such, are affected as well since they all use the same IOS.

Cisco also confirmed that polling of the http interface like OpenNMS does would aggravate this leak.

Here’s a picture of what happens.

2 thoughts on “OpenNMS Could Aggravate Bug on Cisco Switches Running 12.2(44)se1”

Here’s an alternate query that should run much faster if you have tens of thousands of nodes in your database (based on information from the latest CISCO-PRODUCTS-MIB):

SELECT nodeid, nodelabel, nodesysdescription
FROM node
WHERE nodesysoid ~ '^.1.3.6.1.4.1.9.1.(36[678]|431|45[23]|485)$';
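One thing to keep in mind with the pattern in this query is that an unescaped "." in a regular expression matches any character, not just a literal dot. As a rough sketch, the same alternation with the dots escaped can be tried out in Python; the function name is hypothetical, and the leading-dot OID form is assumed to match what the query expects in nodesysoid.

```python
import re

# Same alternation as the query above, but with every dot escaped so that
# it matches only a literal "." rather than any single character.
OID_RE = re.compile(r"^\.1\.3\.6\.1\.4\.1\.9\.1\.(36[678]|431|45[23]|485)$")


def matches_listed_oid(sys_oid):
    """Return True if sys_oid is one of the product OIDs named in the pattern."""
    return OID_RE.match(sys_oid) is not None


if __name__ == "__main__":
    for oid in (".1.3.6.1.4.1.9.1.366", ".1.3.6.1.4.1.9.1.500"):
        print(oid, matches_listed_oid(oid))
```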

See also I think my firewalls are trying to kill me. And don’t even mention brocade switches….
