[noctool-devel] average "monitor"

Ingvar ingvar at hexapodia.net
Mon Jul 28 15:12:56 UTC 2008


Jim writes:
> Hello,
> 
> I'm wondering if anyone has any thoughts on how one might make an 
> "monitor" that represents the average of several other monitors.

I'd actually be inclined to call that a "view". A view would be made up of 
one or more "equipment"s or one or more "monitor"s (or, possibly, a mix of 
the two) and would default to one of "min", "avg" or "max". For 
"equipment"-centred views, I suspect either "avg" or "max" would be the right 
default; for views composed of "monitor"s, I am less sure. I can probably 
think of even more interesting ways of aggregating measures into useful 
values, if given a few more moments to think about it. [1]

Just so that's on record, somewhere. :)

The typical use-case, as I see it, is to slap sufficient inter-related things 
into one or more views, so all you'd look at frequently is the status of the 
view-as-such, and then open the view up to watch the components within it (be 
that one or more equipment objects or one or more monitors; sort of how the 
equipment aggregates monitors).
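
To make the shape of that concrete, here is a rough sketch in Python (not 
NOCtool's own code). The class names "Monitor" and "View", the "level()" 
method and the "how" parameter are all made up for illustration; the only 
idea taken from the above is that a view reduces the levels of its members 
to one value, and that a view can itself sit inside another view.

```python
from statistics import mean


class Monitor:
    """Hypothetical stand-in for a single monitor with a numeric level."""

    def __init__(self, name, level):
        self.name = name
        self._level = level

    def level(self):
        return self._level


class View:
    """Hypothetical view: aggregates the levels of its members.

    Members can be monitors or other views, since both offer level().
    The default aggregation is max; passing how=mean gives an "avg" view.
    """

    def __init__(self, name, members, how=max):
        self.name = name
        self.members = list(members)
        self.how = how

    def level(self):
        return self.how([member.level() for member in self.members])


# An "avg" view over two load monitors, nested inside a "max" view.
web = View("web", [Monitor("web1 load", 2), Monitor("web2 load", 6)], how=mean)
site = View("site", [web, Monitor("db load", 9)])
```

Looking at "site" gives the worst level of anything inside it; opening it up 
shows "web" (itself an average) next to the database monitor.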
 
> I'm looking at http://meta.rocksclusters.org/ganglia/ right now where they 
> display a graph of the average load over some 450+ machines.  How might 
> you implement something like that in NOCtool?

It depends, I would've thought.

> p.s. anyone seen any "good" monitoring UIs?  something they like...  I 
> can't say I ever really have :P

Closest I've seen so far is HP OpenView and Spectrum (no longer Cabletron, but 
surprisingly still alive). Both rely heavily on the admin(s) to set up decent 
views, as carelessly adding monitored elements tends towards "crowded".

//Ingvar

[1] Off the top of my head, I could probably make a case for:
    minimum measure/alert level
    arithmetic mean of measure/alert level
    median of measure/alert level
    geometric mean of measure/alert level (that is, multiply all N 
       measures together and extract the Nth root)
    maximum of measure/alert level

    Minimum is the one I'd have the hardest time defending, but...
    The three averages are variously useful for performance indication 
      (the arithmetic mean is useful for fine-grained load-balancing; the 
       median is handy for most practical purposes, I would've thought; and 
       the geomean ought to spike as you start to have more servers with an 
       issue, while being fairly unresponsive when there's just a small 
       problem).
    Max is handy whenever you have a small number of things aggregated (or not 
    much load-balancing between them).
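
    The five aggregations above could be sketched roughly as follows, in 
    Python rather than anything from NOCtool itself. The names "aggregate", 
    "AGGREGATORS" and "geometric_mean" are invented for this example, and the 
    sample load figures are made up. The geometric mean is computed through 
    logarithms, which assumes every measure is strictly positive.

```python
import math
import statistics


def geometric_mean(values):
    """Nth root of the product of N values, computed via logarithms.

    Assumes every value is strictly positive; a zero or negative
    measure would need separate handling.
    """
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical lookup table from aggregation name to function.
AGGREGATORS = {
    "min": min,
    "avg": statistics.mean,
    "median": statistics.median,
    "geomean": geometric_mean,
    "max": max,
}


def aggregate(values, how="avg"):
    """Reduce a collection of measures to a single value."""
    values = list(values)
    if not values:
        raise ValueError("nothing to aggregate")
    return AGGREGATORS[how](values)


# Made-up load figures: one busy server, then three busy servers.
one_hot = [1.0, 1.0, 1.0, 16.0]
several_hot = [1.0, 16.0, 16.0, 16.0]

# one_hot:     avg 4.75,  median 1.0,  geomean about 2.0, max 16.0
# several_hot: avg 12.25, median 16.0, geomean about 8.0, max 16.0
```

    With a single busy server the geomean moves less than the arithmetic 
    mean, and it climbs once several servers are busy; max reports 16.0 in 
    both cases.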



