I'm running Munin 1.2.3 with the -r911:912 patch (for huge RRD files) applied. This may have been fixed since then, but because the error is quite severe (data loss) I'm submitting a ticket just to make sure. Everything worked fine until I deleted a plugin (/etc/munin/plugins/vlan_bandwidth_apressen-underskog) from the node and forgot to restart the node, so it kept listing the plugin as available. After that, the server process started gathering completely bogus data for a number of the plugins on that host (curiously enough, not all of them). It logged things like this:
Feb 28 09:00:18 [32629] - Unable to update hmg9.no.linpro.net -> vr0.hmg9.no.linpro.net -> vlan_bandwidth_braastadklynga -> out_transit: No such field (no "label" field defined when running plugin with "config").
Feb 28 09:00:18 [32629] - Unable to update hmg9.no.linpro.net -> vr0.hmg9.no.linpro.net -> vlan_bandwidth_braastadklynga -> out_telenor: No such field (no "label" field defined when running plugin with "config").
Feb 28 09:00:18 [32629] - Unable to update hmg9.no.linpro.net -> vr0.hmg9.no.linpro.net -> vlan_bandwidth_braastadklynga -> out_nix: No such field (no "label" field defined when running plugin with "config").
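To see at a glance which plugin/field pairs the server refused, log lines like the ones above can be grouped by host and plugin. This is only a sketch: it assumes every affected line has the same "Unable to update A -> B -> plugin -> field: reason" layout as these three, the names LOG_RE and rejected_fields are made up for illustration, and labelling the first two components "domain" and "host" is my reading of the log, not something Munin documents here.

```python
import re
from collections import defaultdict

# Layout assumed from the three log lines quoted in this ticket:
#   ... Unable to update <domain> -> <host> -> <plugin> -> <field>: <reason>
LOG_RE = re.compile(
    r"Unable to update (?P<domain>\S+) -> (?P<host>\S+) -> "
    r"(?P<plugin>\S+) -> (?P<field>[^\s:]+): (?P<reason>.*)$"
)


def rejected_fields(log_lines):
    """Map (host, plugin) to the sorted field names the server could not update."""
    found = defaultdict(set)
    for line in log_lines:
        match = LOG_RE.search(line)
        if match:  # lines in any other format are silently skipped
            key = (match.group("host"), match.group("plugin"))
            found[key].add(match.group("field"))
    return {key: sorted(fields) for key, fields in found.items()}
```

Feeding it the server log (for example `rejected_fields(open(path))`, where `path` is wherever the log lives on your system) gives one entry per affected plugin, which makes it easier to compare against the fields each plugin really declares.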
The vlan_bandwidth_* plugins never had fields named out_{transit,telenor,nix}, so these errors don't make any sense. The plugin that was removed did have those fields, though. Compare:
config vlan_bandwidth_braastadklynga
host_name vr0.hmg9.no.linpro.net
graph_order in out
graph_category Bandwidth
graph_args --base 1000
graph_vlabel bps in (-) / out (+)
graph_title Braastadklynga dom0 og ilo (702)
in.label bps
in.cdef in,8,*
in.skipdraw 1
in.type DERIVE
in.min 0
out.label bps
out.cdef out,8,*
out.type DERIVE
out.min 0
out.negative in
.
fetch vlan_bandwidth_braastadklynga
in.value 568914
out.value 748979
.
config vlan_bandwidth_splitout_apressen
host_name vr0.hmg9.no.linpro.net
graph_category Bandwidth
graph_args --base 1000 -l 0
graph_vlabel bps out
graph_total Total
graph_title A-Pressen Interaktiv AS (509) outgoing (split)
out_transit.label transit (except telenor)
out_transit.cdef out_transit,8,*
out_transit.type DERIVE
out_transit.min 0
out_transit.draw AREA
out_telenor.label transit (telenor)
out_telenor.cdef out_telenor,8,*
out_telenor.type DERIVE
out_telenor.min 0
out_telenor.draw STACK
out_nix.label nix1
out_nix.cdef out_nix,8,*
out_nix.type DERIVE
out_nix.min 0
out_nix.draw STACK
.
fetch vlan_bandwidth_splitout_apressen
out_transit.value 513406364719
out_telenor.value 3906463
out_nix.value 89245664210
.
Those are wildcard plugins (the part after the last underscore changes; the fields and the rest of the config do not). Another weird thing is that the server also created RRD files for the "out" field for some vlan_bandwidth_splitout_ plugins (which don't have any field called "out").
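One way to check whether the node itself ever returned fields a plugin did not declare is to compare, per plugin, the field names seen in config with those seen in fetch. The Python below is a minimal sketch under assumptions: the input is a saved transcript with the same layout as the one quoted above (a config or fetch line, the field lines, then a lone "."), and the names parse_transcript and undeclared_fields are hypothetical helpers, not part of Munin.

```python
import re

# A field line looks like "<field>.<attribute> <value>", e.g. "in.label bps".
# Global lines such as "graph_title ..." or "host_name ..." do not match.
FIELD_RE = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)\.(\w+)\s+(.*)$")


def parse_transcript(text):
    """Return {plugin: {"config": set_of_fields, "fetch": set_of_fields}}."""
    plugins = {}
    current = None  # (command, plugin) of the block currently being read
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == ".":  # a lone dot ends the current reply
            current = None
            continue
        parts = line.split(None, 1)
        if parts[0] in ("config", "fetch") and len(parts) == 2:
            current = (parts[0], parts[1])
            plugins.setdefault(parts[1], {"config": set(), "fetch": set()})
            continue
        if current is None:
            continue
        match = FIELD_RE.match(line)
        if match:
            command, plugin = current
            plugins[plugin][command].add(match.group(1))
    return plugins


def undeclared_fields(plugins):
    """Fields returned by fetch that the same plugin's config never declared."""
    return {
        name: sorted(seen["fetch"] - seen["config"])
        for name, seen in plugins.items()
        if seen["fetch"] - seen["config"]
    }
```

If undeclared_fields comes back empty for a transcript taken during the faulty period, the node's replies were self-consistent and the mix-up would have to be on the server side; a non-empty result would point at the node instead.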
I've got all the logs, and a copy of the RRD files and the generated HTML, at stat:~tore/munin-bug/. I'm not attaching them here due to their semi-confidential nature as well as their size (around 600M in total).
I know that the values gathered are completely bogus (somehow they're not based on the correct values). For instance, the vlan_bandwidth_nix1 plugin has always reported 0 for both the in and out fields; given our network layout it's actually impossible for it to report anything else (so the plugin is indeed quite useless). However, during the period the error lasted, the graph did display activity.
Tore