[noctool-cvs] CVS web
imattsson
imattsson at common-lisp.net
Wed May 14 20:27:41 UTC 2008
Update of /project/noctool/cvsroot/web
In directory clnet:/tmp/cvs-serv8664
Modified Files:
hacking.html
Log Message:
IM
Added more hacking guidance.
--- /project/noctool/cvsroot/web/hacking.html 2008/05/13 06:01:37 1.1
+++ /project/noctool/cvsroot/web/hacking.html 2008/05/14 20:27:41 1.2
@@ -19,15 +19,15 @@
The scheduler class points to a double-linked queue of
time-slots. These time-slots in turn know when they're supposed to run
and what events that need to be run at that time.
-
+<p>
Adding an event to a scheduler is a two-fold process. First find (or
create) the time-slot that corresponds to the time we want to run a
specific event, then push the event onto the scheduler.
-
+<p>
The way any event is being run is by having the generic function
PROCESS called with the event as argument. In general, events are
monitor objects.
-
+<p>
The scheduler API is fairly limited, consisting of the following
functions:
<dl>
@@ -42,10 +42,78 @@
<dd>Return the next time-slot and remove it from the scheduler.
</dl>
+<h2>Equipment, monitors and graphs</h2>
-
-
-
+Two important concepts in NOCtool are "equipment" and
+"monitors". Essentially, each "equipment" represents a physical
+network element (a server, switch, router, hub or any other piece of
+physical kit we want to monitor) and each "monitor" is one specific
+thing on the equipment we want to check (network interface status,
+network traffic, certain processes running/not running, TCP-based
+services responding, disk utilisation...).
+<p>
+The current severity-of-fault for any equipment is the maximum of the
+severity status for any of its monitors, so in that respect, monitors
+are treated equally. The alert-level is a number in the 0-255 range
+and if ever set to a lower-than-current value will slowly move towards
+the lower value. At the moment, the decay is linear (every time a new
+value is set, the alert level will be the maximum of old-5 and new).
+<p>
+Each equipment class can have 0 or more default monitors associated
+with it. There's a mapping from the equipment class to a list of
+monitors that should always exist for that equipment class.
+<p>
+Monitors can either be graphing or non-graphing. Only graphing monitors
+keep any historical state (beyond "what is my current alert
+level"). There are several types of built-in graphing classes, depending
+on the expected characteristics of the data the monitor retrieves.
+<p>
+The graph classes come in several sub-classes, depending on how they
+transfer data from shorter-term storage to longer-term storage. Each
+storage class keeps 300 records and every 12th value added to a
+storage-class initiates a transfer of data to longer-term storage.
+<p>
+If we have a graphing monitor that stores data every 5 minutes, this
+means we'll have data with 5-minute granularity in the short-term
+storage, for a maximum of 25 hours of detailed data. Every hour, this
+data will also update the medium-term storage, for roughly 12 days
+worth of data with hourly granularity. Twice a day, the medium-term
+data will be used to populate the long-term storage, where we will
+have 150 days worth of data with a rather coarse granularity. This
+method is obviously inspired by MRTG and RRDTool.
+<p>
+The graph subclasses determine how data is treated as it is shifted
+from one storage to another. Looking at this, what's actually being
+shifted isn't the "latest 12", it's "the 12 that are about to be
+over-written". <p>
+<table border=1>
+<tr><th>Graph subclass</th><th>Behaviour</th></tr>
+<tr><td>gauge-graph</td><td>The transferred value is the median value in the
+last 12 time units (actually the mean of the 6th and 7th value, when
+the last 12 are sorted in order)</td></tr>
+<tr><td>meter-graph</td><td>The value about to be over-written in the shorter-time storage
+is transferred over. This is intended for data sources that are
+increasing (like, say, an output byte counter)</td></tr>
+<tr><td>max-graph</td><td>This adds the maximum of the last 12 shorter-time
+units to the longer-time storage</td></tr>
+<tr><td>avg-graph</td><td>This averages the 12 oldest records and shifts that
+value into the longer-term storage</td></tr>
+</table>
+
+<h2>Config files</h2>
+
+The config files for NOCtool are intended to be written in
+pseudo-lisp. This is implemented as macros. Each top-level macro
+should have a logical name for the equipment type, create an object
+of a suitable class, bind *CONFIG-OBJECT* to it and re-bind
+*MACRO-NESTING* to a cons of *MACRO-NESTING* and a suitable keyword
+symbol for the top-level object.
+<p>
+This is then used to make sure that sub-macros can check that they're
+in the right context for whatever configuration they
+provide. Sub-macros can be defined with the DEFNESTED macro.
+<p>
+All config files are loaded in a scrap package.
</body>
</html>
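
Note on the numbers in the new "Equipment, monitors and graphs" text: the alert decay rule, the retention figures (25 hours, roughly 12 days, 150 days) and the four graph-subclass behaviours can be checked with a small sketch. This is Python rather than NOCtool's own Lisp, and every name in it (decay, retention_hours, roll_up, REDUCERS) is hypothetical and not taken from the NOCtool source. In particular, which single value meter-graph carries over is not spelled out in the text, so the choice below (the most recent of the 12) is an assumption.

```python
from statistics import mean, median

RECORDS_PER_STORE = 300  # records kept per storage class (from the text)
BATCH = 12               # every 12th value triggers a transfer (from the text)


def decay(old, new, step=5):
    """Alert level after setting `new`: max(old - step, new), kept in 0..255."""
    return max(0, min(255, max(old - step, new)))


def retention_hours(sample_minutes, level):
    """Hours covered by a storage level (0 = short, 1 = medium, 2 = long)."""
    return RECORDS_PER_STORE * sample_minutes * BATCH ** level / 60


# One reducer per graph subclass, applied to the 12 records being shifted.
REDUCERS = {
    "gauge-graph": median,                   # mean of 6th and 7th when sorted
    "meter-graph": lambda batch: batch[-1],  # ASSUMPTION: most recent of the 12
    "max-graph": max,
    "avg-graph": mean,
}


def roll_up(kind, batch):
    """Reduce a batch of 12 shorter-term records to one longer-term record."""
    if len(batch) != BATCH:
        raise ValueError("expected exactly 12 records")
    return REDUCERS[kind](batch)
```

With a 5-minute sampling interval, retention_hours(5, 0) gives 25 hours, retention_hours(5, 1) gives 300 hours (12.5 days) and retention_hours(5, 2) gives 3600 hours (150 days), which agrees with the figures in the committed text.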