Ticket #828 (closed defect: fixed)

Opened 2 years ago

Last modified 3 months ago

Suppress "occasional" unknown states to avoid alerts.

Reported by: janl
Assigned to: feiner.tom
Priority: normal
Milestone: Munin 1.4.4
Component: master
Version: 1.4.2
Severity: normal
Keywords:
Cc: stevew@purdue.edu

Description

From munin-users list:

Nicolai,

I'm attaching a patch to LimitsOld.pm v1.4.1 to handle the occasional UNKNOWN state(s) without triggering an alert. It works by adding another line to the "limits" file that maintains a countdown (starting at "unknown_limit" in the plugin configuration) of consecutive UNKNOWN states that have been seen for a given plugin. When the countdown reaches zero, a state transition is recognized and any requested alert is issued. If the state changes from UNKNOWN to some other state before the countdown reaches zero, an alert on the UNKNOWN state is never issued.

Hopefully this will be easy to merge into v1.4.2 which I see was just released.

Here is a documentation snippet that could be included in the documentation wiki:

{fieldname}.unknown_limit Instructs munin-limits to ignore N consecutive UNKNOWN values before recognizing a state transition to UNKNOWN and issuing an alert through the contact command (if any). N must be an integer value greater than or equal to one. Available in Munin 1.4.2.

Please let me know if you have any questions.
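
The countdown described in this report can be modelled in a few lines. The following is a minimal Python sketch, not the Perl code from the attached patch: the class name FieldState, the method observe, and the state strings are hypothetical, and it assumes that a limit of N means the alert fires on the Nth consecutive UNKNOWN (so a limit of 1 alerts immediately).

```python
from dataclasses import dataclass


@dataclass
class FieldState:
    """Hypothetical model of one field's alert state (not Munin's actual code)."""
    unknown_limit: int = 1   # assumed default: alert on the first UNKNOWN
    recognized: str = "ok"   # last state that was actually acted upon
    countdown: int = 0       # consecutive UNKNOWNs still tolerated

    def __post_init__(self):
        if self.unknown_limit < 1:
            raise ValueError("unknown_limit must be an integer >= 1")
        self.countdown = self.unknown_limit

    def observe(self, state: str) -> bool:
        """Record one polled state; return True if an alert should be issued."""
        if state == "unknown":
            if self.recognized == "unknown":
                return False          # transition already recognized
            self.countdown -= 1
            if self.countdown > 0:
                return False          # still suppressing this UNKNOWN
            self.recognized = "unknown"
            return True               # countdown hit zero: recognize it
        # Any other state resets the countdown before it can reach zero.
        self.countdown = self.unknown_limit
        changed = state != self.recognized
        self.recognized = state
        return changed


field = FieldState(unknown_limit=3)
polls = ["unknown", "unknown", "ok", "unknown", "unknown", "unknown"]
alerts = [field.observe(s) for s in polls]
```

In this run only the last poll returns True: the first two UNKNOWNs are cancelled by the intervening "ok", and the alert is issued on the third consecutive UNKNOWN that follows.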

Attachments

LimitsOld.pm.patch (2.5 kB) - added by janl on 12/29/09 17:15:38.
Config.pm.patch (511 bytes) - added by stevew on 01/08/10 23:07:07.
Patch to Config.pm
LimitsOld.pm.2.patch (4.9 kB) - added by stevew on 01/08/10 23:08:00.
Replacement patch for LimitsOld.pm
Config.pm.patch_1.4.3 (511 bytes) - added by stevew on 01/11/10 19:39:18.
LimitsOld.pm.patch_1.4.3 (4.4 kB) - added by stevew on 01/11/10 19:41:53.

Change History

12/29/09 17:15:38 changed by janl

  • attachment LimitsOld.pm.patch added.

01/08/10 23:06:16 changed by stevew

I've made some more changes to this patch to improve the concept as well as correct a few bugs. Please use this second set of patches when making changes to the code base.

Thanks!

Steve

01/08/10 23:07:07 changed by stevew

  • attachment Config.pm.patch added.

Patch to Config.pm

01/08/10 23:08:00 changed by stevew

  • attachment LimitsOld.pm.2.patch added.

Replacement patch for LimitsOld.pm

01/11/10 19:27:56 changed by stevew

Attaching patches for v. 1.4.3.

01/11/10 19:39:18 changed by stevew

  • attachment Config.pm.patch_1.4.3 added.

01/11/10 19:41:53 changed by stevew

  • attachment LimitsOld.pm.patch_1.4.3 added.

01/11/10 20:45:35 changed by feiner.tom

Note: The Config.pm patch refers to the Config.pm in the common/lib/Munin/Common/ directory - not the one under master/lib/Munin/Master/

01/11/10 22:41:38 changed by janl

  • owner changed from nobody to feiner.tom.

01/13/10 14:33:34 changed by feiner.tom

Steve Wilson wrote:

Thanks, Tom. You might want to keep snapshots of the limits file to monitor what's happening. I do this by modifying the munin-cron script to include the following line at the end:

cat /var/lib/munin/limits | sed -e "s/^/$(date +'%F %T') /" >> /var/log/munin/debuglog

This will grab the current contents of the limits file, prefix each line with a timestamp, and append them all to a special log file. I can then watch the log file to see when an UNKNOWN value is seen, how many consecutive UNKNOWN values occur, and whether the state transitions appropriately after "unknown_limit" consecutive UNKNOWNs are seen.

Steve

Thanks for the tip, I tried this for the last 2 days, and it indeed seems to be working very nicely.

I tried:

* Specifying "df._dev_hda6.unknown_limit 3" for one of the nodes in my munin.conf file and tracking /var/log/munin/debuglog. The status indeed changed to unknown only after 3 tries.

However, since Munin 1.4 is more sensitive to UNKNOWNs (probably because of the default 10-second timeout per plugin), it may be a good idea to default to 2 or 3 unknowns before changing the state. I did this by changing line 476 in LimitsOld.pm from:

my $unknown_limit = munin_get($hash, "unknown_limit", 1);

to:

my $unknown_limit = munin_get($hash, "unknown_limit", 3);

We should also replace the "magic number" with a meaningfully named variable, such as $default_unknown_limit.
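
As a rough illustration of that suggestion, here is a Python sketch of a lookup with a named default. It does not reproduce munin_get or any Munin API: the function get_unknown_limit, the constant DEFAULT_UNKNOWN_LIMIT, and the dict-based configuration are all assumptions made for the example.

```python
# Hypothetical named default, replacing the literal 3 in the lookup.
DEFAULT_UNKNOWN_LIMIT = 3


def get_unknown_limit(field_config: dict) -> int:
    """Return the field's unknown_limit, falling back to the named default."""
    limit = int(field_config.get("unknown_limit", DEFAULT_UNKNOWN_LIMIT))
    if limit < 1:
        raise ValueError("unknown_limit must be an integer >= 1")
    return limit


global_default = get_unknown_limit({})                     # no per-field setting
per_field = get_unknown_limit({"unknown_limit": "5"})      # explicit override
```

With a single named constant, changing the global default later means editing one clearly labelled value rather than hunting for a bare number.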

Raising the default to 3 finally caused munin to stop sending unknown emails globally.

I guess there are only 2 options to solve this problem: either raise the number of unknowns globally, or make munin less sensitive (like it was in 1.2.6).

Nicolai, can you comment on this issue?

Thanks, Tom

01/13/10 14:35:40 changed by feiner.tom

  • cc set to stevew@purdue.edu.

01/13/10 16:24:44 changed by feiner.tom

  • summary changed from Supress "occasional" unknown states to avoid alerts. to Suppress "occasional" unknown states to avoid alerts..

Committed the patch to the next Debian package, 1.4.3-2, with a slight change: it defaults to 3 unknowns globally before changing state.

The patch can be found here:

http://munin.projects.linpro.no/browser/branches/debian/squeeze/trunk/debian/patches/101-suppress-occasional-unknown-states-to-avoid-alerts.patch

03/08/10 21:24:39 changed by feiner.tom

  • status changed from new to closed.
  • resolution set to fixed.

The patch was applied to munin 1.4.4. Marking bug as fixed.