AllGoodBits.org

Fixing Erroneous Data in Ganglia Metrics by Editing RRDtool

Once upon a time, some of my network graphs generated by Ganglia showed some of my machines shunting more than 450 petabytes/second of network traffic for about 45 seconds. Given that these machines have a couple of gigabit NICs each, I figured that we hadn't broken Physics and that these numbers were Incorrect.

This led me to discover that, contrary to my previous assumption, the RRDtool files that Ganglia uses to store its time-series data are not too difficult to work with. There is a straightforward editing pattern: dump to XML, edit the XML, restore from XML.

Here's my "one-liner" to fix this mess:

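for host in kvm1 kvm2 kvm3; do
  # Append the tail of the local hostname (everything after its leading lowercase letters).
  host="${host}$(hostname | sed -e 's/^[a-z]*//')"
  rrd_dir="/var/lib/ganglia/rrds/kvm/${host}"
  for m in bytes pkts; do
    for d in in out; do
      # Include the host in the temp names so one host's files don't clobber another's.
      xml="/tmp/${host}_${m}_${d}.xml"
      rrdtool dump "${rrd_dir}/${m}_${d}.rrd" > "${xml}"
      # Crude fix: rewrite the bogus e+17 exponents to e+05.
      # The pattern expects a space after the exponent, as in the dump output this was written against.
      sed -i -e 's/e+17 /e+05 /' "${xml}"
      # Move the original aside as a backup, then rebuild the RRD from the edited XML.
      mv "${rrd_dir}/${m}_${d}.rrd" "/tmp/${host}_${m}_${d}.rrd"
      rrdtool restore "${xml}" "${rrd_dir}/${m}_${d}.rrd"
    done
  done
done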
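
The exponent rewrite above leaves made-up (if plausible-looking) numbers in the graphs. A gentler edit step, sketched here as one possible alternative rather than what was actually run, is to replace any sample above a ceiling with NaN, which RRDtool treats as "unknown". The helper name rrd_xml_cap and the ceiling value are hypothetical; the filter only touches the <v> cells of the dump and passes every other line through unchanged.

```shell
# Hypothetical helper (not part of rrdtool): read `rrdtool dump` XML on stdin
# and write it to stdout, replacing every <v> value above MAX with NaN.
# Usage: rrd_xml_cap MAX < in.xml > out.xml
rrd_xml_cap() {
  awk -v max="$1" '
    {
      out = ""
      rest = $0
      # Walk through every <v>...</v> cell on the line.
      while (match(rest, /<v>[^<]*<\/v>/)) {
        # Cell text sits between the 3-char "<v>" and the 4-char "</v>".
        cell = substr(rest, RSTART + 3, RLENGTH - 7)
        # Adding 0 forces a numeric comparison; existing NaN cells never exceed max.
        if (cell + 0 > max + 0) cell = "NaN"
        out = out substr(rest, 1, RSTART - 1) "<v>" cell "</v>"
        rest = substr(rest, RSTART + RLENGTH)
      }
      print out rest
    }'
}
```

In the loop above this would stand in for the sed line, for example rrd_xml_cap 1e9 < /tmp/bytes_in.xml > /tmp/bytes_in.capped.xml, with the restore then reading the capped file. The 1e9 ceiling is only an assumption, picked to sit comfortably above what a couple of gigabit NICs can carry in bytes per second; choose one that suits the metric.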

Now all I have to do is work out why this happened in the first place. But at least I have some more practice with RRDtool, and less aversion to it.