root/eddie/trunk/doc/CHANGES.txt

Revision 918, 62.7 KB (checked in by chris, 8 months ago)

version = '0.37.2' for release and updated CHANGES.txt.

1Eddie CHANGES
2(reverse chronological order)
3
4Eddie-0.37.2 (04-Nov-2008)
5 - Updated the eddie-agent SMF manifest, removing the need for a
6   method script.
7 - Bugfix: Solaris filesystem information was not being output properly.
8 - Add the missing vim syntax colouring to the "regfile" statement.
9   Patch submitted by Peter Poeml.
10 - Improved the parsing of the LOGSCAN negate values.
11 - Updated documentation & comments for the DBI directive.
12 - Cleaned up part of the LOGSCAN code, for computing the number
13   of matched and unmatched lines. LOGSCAN now defaults to matching
14   all lines if neither regex nor regfile arguments are defined.
15 - Fixed the documentation for LOGSCAN, which was showing examples
16   using the linecount variable as the number of lines matched,
17   instead of matchedcount.  This had led to some confusion.
18 - Moved eddie_wrapper shell script to contrib directory.
19 - Renamed "HP-UX" to "HP_UX" as the former was not a valid Python
20   module/package name.
21 - Moved all operating system specific modules under eddietool.arch.
22 - Replace characters in osname, osver, osarch that cannot be used
23   in Python module names.
24 - Exceptions are now defined as sub-classes of Exception.
25 - Restructured source as an installable Python package (eddietool) with a
26   console script "eddie-agent".  setuptools is used so Eddie can be
27   distributed as an egg.
28 - HTTP directive gracefully handles the case of cookielib module not
29   being available (i.e. in Python 2.3 and earlier).  The persist_cookies
30   option will be disabled if cookielib cannot be imported.
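
The 0.37.2 entries on renaming "HP-UX" to "HP_UX" and on replacing characters in osname, osver and osarch can be illustrated with a short sketch. The helper name and the choice of '_' as the replacement character are assumptions made for illustration; the changelog only says that characters invalid in Python module names are replaced.

```python
import re

def module_safe(name):
    """Replace every character that cannot appear in a Python module
    name with '_' (hypothetical helper, not Eddie's actual code)."""
    return re.sub(r'[^0-9A-Za-z_]', '_', name)

print(module_safe('HP-UX'))   # HP_UX
print(module_safe('5.10'))    # 5_10
```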
31
32Eddie-0.36 (04-Dec-2007)
33 - Eddie will now throw an error and exit if a config file cannot be read.
34 - Added persist_cookies option to HTTP directive.  It is used to
35   specify whether to persist server-defined cookies on the client
36   side.  If enabled, Eddie HTTP checks will send back any cookies
37   defined by the server, doing its best to obey expire times.
38   Disabled by default.
39 - Added "server" option to HTTP directive, used to specify the server
40   name to connect to. This will be used instead of the server name
41   from the URL.  The server name from the URL will still be used for
42   the HTTP host header.
43 - SunOS: Changed mem_free and mem_swapfree to return as bytes (although
44   they are rounded up to the nearest kbyte).
45 - Added Solaris SMF method/manifest files to contrib.
46 - Full find & replace of all evil tabs to spaces.
47 - Added some tools to contrib/spread/ to use for testing elvinrrd message
48   passing over Spread. These tools send & receive elvinrrd messages the
49   same way that Eddie and ElvinRRD do.
50 - Added support for Spread messaging as an alternative to Elvin.
51 - Bugfix: make sure body is initialised so MSG parsing doesn't fail if a
52   HTTP check fails before assigning anything to the body.
53 - Bugfix: reason was not defined before actions were called, causing
54   exception in some cases.
55 - Bugfix: make sure status is initialised before generating any alerts.
56 - Changes to the Elvin code to make re-connections more reliable. Use
57   elvin.SyncLoop instead of elvin.ThreadedLoop. Disabled auto-discovery.
58 - Implemented the DiskStatistics data collector for Linux.
59   This uses a new linux_diskio module which has been added to the Eddie
60   distribution.
61 - Correct tcp/udp port bug in SP class: searching for "port=123" was matching
62   to a bound port of 1234 because of use of string.find().
63 - For any var name that contains "_pages_", create a "_bytes_" version.
64 - Added vars: ctr_swap_pages_inactive, ctr_bytes_per_page
65 - New var for "COM" directive: outfields
66 - Added "DBI" directive, for database query checking.
67   Based heavily on the (undocumented) mysql directive.
68 - Solve startup race condition for "checkdependson": initial state cannot be "ok".
69   Create state "unknown", and change "Directive.checkDependencies" to consider
70   all non-"ok" statuses to be failures (this includes "failinitial").
71 - Two important enhancements to Directive.tokenparser:
72   1) When parsing the config file, for every argument in the directive, if its
73    value is a STRING type, then use utils.typeFromString() to set its value,
74    so we get a decent data type for it (int, float, string).  This reduces the
75    typecasting in evaluated expressions.
76   2) When parsing the config file, for every scalar (int, float, string) argument
77    in the directive, put it into the defaultVarDict.  This allows for setting
78    "variables" in the directive, and then using that in the rule.  For example,
79    if the directive (or template) has "maxcpu=30", then the rule can address
80    this like "rule='pcpu > _maxcpu'".
81 - Added "--daemon" command-line option, and supporting "utils.create_child"
82   routine.  Also created brief documentation for all command-line switches.
83 - Changed in logscanning.py: Detect inode number change: if watched file's
84   inode number changes, then read from start of the file.
85 - For the "email" action, convert "\n" strings in the body text into newline
86   characters.  This allows for:
87     email('foo@bar.com', 'host: %(h)s', 'Host: %(h)s\nAge: %(problemage)s')
88   instead of having odd-looking multi-line strings in the config file.
89 - Added "RESCANCONFIGS" config option. Defaults to original behavior.
90   This option allows the disabling of Eddie's constant scanning and reloading
91   of its config files.
92 - Fixed very minor bug where action variables were updated multiple times
93   for no good reason.  Reported by Mark Taylor.
94 - Added "log" action.  Use it to append to a log file, log via syslog, or
95   print on the eddie tty.
96 - Log the ImportError message if a requested data collector module fails
97   to import.  Helps users debug why the module won't load.
98 - Replaced references to whrandom module with random instead. whrandom is
99   being deprecated.
100 - Changed option parsing to use optparse/optik (ticket #5) and added
101   support for specifying an alternate config file from the command line
102   (ticket #6).
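
The two Directive.tokenparser enhancements listed for 0.36 can be sketched roughly as follows. This is not the code of utils.typeFromString(); type_from_string() is a stand-in written for this example, and the directive arguments are made up. Only the int/float/string fallback and the leading-underscore variable names ("_maxcpu") come from the entry itself.

```python
def type_from_string(value):
    """Convert a config string to int, then float, else keep the string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value

# Scalar directive arguments become rule variables with a leading underscore.
args = {'maxcpu': '30', 'ratio': '0.5', 'device': 'md20'}
defaults = {'_' + key: type_from_string(val) for key, val in args.items()}

# A rule can then compare collected data against the typed value directly.
print(eval('pcpu > _maxcpu', {}, dict(defaults, pcpu=42.0)))  # True
```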
103
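
The SP port bug fixed in 0.36 ("port=123" matching a bound port of 1234) is a classic substring-search pitfall. The sketch below reproduces it with a made-up "key=value" entry format, which is an assumption; the real netstat data structure is not described in the changelog.

```python
def naive_match(entry, port):
    # Substring search: "port=123" is also found inside "port=1234".
    return entry.find('port=%d' % port) != -1

def exact_match(entry, port):
    # Parse the field and compare integers instead of searching text.
    fields = dict(item.split('=', 1) for item in entry.split())
    return int(fields['port']) == port

entry = 'proto=tcp port=1234'
print(naive_match(entry, 123))    # True  (the false positive)
print(exact_match(entry, 123))    # False
print(exact_match(entry, 1234))   # True
```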
104Eddie-0.35 (31-Oct-2005)
105 - Linux: Added a dummy diskdevice module for Linux.  The implementation of
106   this is still yet to be done.
107 - Fixed compatibility issue with FILE directive and Python pre 2.3.  Those
108   versions do not have os.path.sep.
109 - Added regfile to LOGSCAN directive, which points to a file containing
110   multiple regular expressions to match against.  Patch submitted by
111   Dougal Scott.
112 - Linux: Fix to handle /proc/stat changes on Linux kernel 2.6.11+.
113 - Enhancements to PRTDIAG directive:
114    * Report details of any hardware failures on U280R.
115    * Added support for U480R hardware.
116   Patch submitted by Dougal Scott.
117 - Improvement to HTTP directive handling if Python does not support SSL
118   connections.  Patch submitted by Dougal Scott.
119 - Added SMTP directive which provides a simple facility to measure the response
120   time of an SMTP connection to a server.  Submitted by Dougal Scott.
121 - Fixed minor bug where length of time of thread count over threshold was
122   not being shown in minutes when it was expected to be.
123   Patch submitted by Dougal Scott.
124 - System specific Directives are now automatically loaded from a Directives
125   subdirectory beneath the system lib directory if it exists.
126   Example: Linux-specific directive modules will be loaded from:
127     lib/Linux/Directives/
128   Patch submitted by Dougal Scott.
129 - SP directive now supports a bindaddr value of "any".  This will cause the
130   directive to ignore the bind address when testing (ie: compare port only).
131   Patch submitted by Dougal Scott.
132 - Use Python True/False instead of 1/0 for booleans in common directives.
133 - Added 'expectrexp' option to PORT directive.  This allows regular expression
134   matching against the response of a PORT connection.
135   Patch submitted by Dougal Scott.
136 - Added a 'missing' flag to FILE directive which indicates when an existing
137   file has disappeared.
138   Also added a 'lastexists' variable for use in FILE rules.
139 - Improvements to the keepdiff option of the FILE directive.
140   * Keep copies of files being monitored in WORKDIR/FILEprevs/ where
141     WORKDIR is the new option defined in eddie.cf.
142   * If the copy of a file in FILEprevs disappears then set an appropriate
143     message for action output.
144   * If the copy of a file in FILEprevs disappears then make sure another
145     copy is saved.
146   * Use semi-readable unique filenames for the saved copies.
147 - Added get_work_dir() and set_sub_work_dir() functions to utils.py for
148   directive code to call to retrieve the WORKDIR location.  set_sub_work_dir()
149   is used to create a subdirectory within WORKDIR.  It will raise WorkdirError
150   if it fails.  Otherwise it returns the full directory path.
151 - Added config option WORKDIR which defines a location where Eddie can
152   store temporary files.  This can be used by directives that need to
153   save some information or state to the filesystem.  The directory can
154   be safely removed when Eddie is not running.  Eddie does not clean
155   up the directory itself (it may clean up some files before shutting
156   down).  The whole directory tree will be created on startup if it
157   doesn't already exist.  Eddie may create subdirectories within this
158   WORKDIR directory.  Example:
159       WORKDIR="/var/tmp/eddieworkdir"
160 - Win32: Catch an exception that is sporadically raised by
161   win32pdh.GetFormattedCounterValue(). The returned error is
162   unhelpful,
163   (-2147481640, 'GetFormattedCounterValue', 'No error message is available')
164   so just return None values instead of letting the thread die.
165 - Added capability for FILE directive to keep diffs of changes to a file.
166   The diffs can then be sent in an email when a change is detected.
167   New FILE arguments:
168   keepdiff={true|false}
169    - flag whether to keep a copy of the file to produce diffs
170   context_lines=<integer>
171    - how many context lines to show around the changed lines
172   difftype={context|unified|full}
173    - which diff method to use (see Python difflib module for more information)
174 - Added README.win32.txt for Win32 platform install information.
175 - Added rules/win32_sample.rules - a sample set of Win32 rules.
176 - Win32 df collector: ignore A: and B: drives when collecting stats.
177   Otherwise Windows prompts for the media to be inserted! (Unless a
178   floppy is in the drive ... yeah right)
179 - Win32: Fix win32perf doctest for systems that have an A: drive.
180 - Win32: Added support for Win32 systems with datacollectors: df,
181   diskdevice, netstat, proc and system.  Most of them use win32perf
182   module which is a wrapper for Mark Hammond's win32all package.
183 - Added doctests for FILE directive.
184 - Fetch hostname from platform.node() if os.uname() is not available.
185   (Fix for Win32 compatibility.)
186 - Added a doctest for timeQueue module.
187 - Fixed bug in timeQueue in Python 2.4+ support where head() call was
188   actually performing a get().
189 - Use platform-independent method (ie: os.path) for constructing config
190   paths, rather than assuming '/' is path separator.  (Fix for Win32
191   compatibility.)
192 - Added support for systems that do not support os.uname() - try to use
193   the platform module instead (ie: Win32).  Check that the system handles
194   each signal before trying to register signal handlers for them (Win32
195   doesn't support some of the signals).
196 - Solaris: Catch some more possible errors when parsing 'ps' output for
197   Solaris.  The %CPU field can be a '-' instead of a decimal number (seems
198   to be that way for zombie processes).
199 - Solaris: Handle parsing netstat output for Solaris 10.
200 - Fixed small bug with eddie_wrapper when EDDIE_ADMIN was not defined.
201 - Big improvements to the Redhat init.d script in the contrib directory,
202   making it much more compatible with all new versions of Redhat Linux.
203 - Added chkconfig lines to sample init.d script for Redhat Linux.
204 - Linux: Detecting interpreters in Linux process lists was broken.
205 - Linux: added support for new netstat formats in newer kernels.
206 - Linux: Get VM statistics from /proc/vmstat (on newer kernels).
207 - Added support for Python 2.4 Queue class, which Eddie's timeQueue class is
208   derived from.  The implementation of Queue changed slightly in Python 2.4.
209 - Log the version of Python in use at startup, along with systype.
210 - Added optional definition of EDDIE_ADMIN environment variable in the rc
211   startup scripts to receive Eddie restart/exception notifications from
212   eddie_wrapper.
213 - Eddie now prints no output to stdout by default.  Any global exceptions
214   are printed to stderr on exiting.
215 - eddie_wrapper improvements: eddie output on exit is only emailed to
216   $EDDIE_ADMIN if the Eddie return-code is non-zero.  By default no
217   $EDDIE_ADMIN is set (so no email is sent by default) and $EDDIE_ADMIN
218   can now be defined outside the eddie_wrapper script (ie: in a startup
219   script).
220 - Bugfix: console now shows groups that match special hostnames, those that
221   contain '.' or '-' characters.  A shortcut hack that will be replaced in
222   the future.
223 - FreeBSD: Added fetching of more system counters from '/sbin/sysctl -a'.
224 - FreeBSD: process list parsing was broken.
225 - FreeBSD: proc module needed to import sys so that exceptions could
226   be logged.
227 - Added a bit of a hack (sorry) which allows hostnames containing '-' to be
228   used as group names.  The '-' must be replaced with '_' for the match to
229   work.  This is because group names in the config cannot contain characters
230   like '-'.  This will be resolved in the future when proper matching options
231   are implemented fully.
232 - Solaris: Better handling of Solaris process date/time parsing errors.
233   Patch submitted by Dougal Scott.
234 - Solaris: PRTDIAG directive: added support for Sun Blade servers
235   (SUNW,Serverblade1).  Patch submitted by Dougal Scott.
236 - When sending email by the SMTP method and multiple SMTP servers are
237   available, only log failure if all SMTP servers are unavailable to
238   send the message.  Patch submitted by Dougal Scott.
239 - FreeBSD: Added collecting swap usage stats from '/usr/sbin/pstat -sk'.
240 - Bugfix: Elvin ElvinConnectMaxRetries exceptions were not being caught
241   properly.
242 - Solaris: SunOS df data collector would fail when a CD was inserted, as
243   total files is reported as -1.  Patch submitted by Dougal Scott.
244 - FreeBSD raises a socket exception ('Host is down') when a host is
245   unreachable, which can be safely ignored by the ping code.
246 - Improved the sample config for N COMMONFIXED.
247 - FreeBSD: Added support for FreeBSD system, proc, netstat, df modules.
248 - A quick fix to the config parser which means that Eddie will run on systems
249   that do not yet have system-specific modules.  Non system-specific
250   directives will still work on these systems, such as all the network
251   directives (PING, SNMP, etc) and others like FILE.
252 - Solaris: Fixed DataFailure exception when kstat command cannot be found.
253 - Catch an exception properly in FS directive when filesystem was not
254   found.
255 - Fixed fstpl directive in common.rules example file.
256 - Modified eddie_wrapper to use a Python call to fetch the current time
257   rather than relying on GNU date.  This has improved compatibility with
258   more types of systems, as it can be assumed that Python will be available
259   to run EDDIE!
260 - Handle Elvin connection problems more gracefully, backing off before
261   retrying.
262 - Disabled counting of file descriptors in use, which is only needed for
263   debugging on rare occasions.
264 - Bugfix in HTTP when trying to determine error string for some types of
265   exceptions.
266 - Improved PING multi-threaded reliability on platforms that were causing
267   problems because they simply used the current pid as the icmp_id.
268   On platforms where all threads share the same process id this was causing
269   unreliable ping results as the wrong threads would accept the wrong icmp
270   replies.  It now uses the current thread object's memory address for the
271   icmp_id to make them as unique as possible and avoid such confusion.
272 - New directive: TAPE - functions almost exactly like the DISK directive
273   but fetches stats from the TapeStatistics class from the diskdevice
274   module (which is currently only available for Solaris).
275   Example:
276    TAPE st52_thruput:
277        device='st52'
278        scanperiod='5m'
279        rule='1'        # always perform action
280        action='elvinrrd("tape-%(h)s_%(device)s", "rbytes=%(nread)s", "wbytes=%(nwritten)s")'
281 - New directive, DISK.  This uses the new DiskStatistics data collector from
282   a diskdevice module (available for Solaris-only so far) to enable rules
283   to be created using disk device activity stats.
284   Example: a directive which collects bytes read/written to the disk device
285   md20 and sends these counters to elvinrrd
286    DISK md20_thruput:
287        device='md20'
288        scanperiod='5m'
289        rule='1'        # always perform action
290        action='elvinrrd("disk-%(h)s_%(device)s", "rbytes=%(nread)s", "wbytes=%(nwritten)s")'
291 - Solaris: added a new Data Collector, DiskStatistics, in module diskdevice.py
292   (for Solaris only so far).  On Solaris this collects disk activity statistics
293   from a call to kstat, ie, '/usr/bin/kstat -p -c disk'.  All stats generated
294   by that command are collected for each disk and made available to directives.
295 - Solaris: enhanced the network interface statistics collection to fetch
296   more detailed stats from 'netstat -k' for each physical interface.
297   An example of the statistics now available for an interface (hme0 on 5.7)
298   is:
299     ipackets 65360226 ierrors 25 opackets 77502512 oerrors 0 collisions 0
300     defer 0 framing 0 crc 0 sqe 0 code_violations 0 len_errors 0
301     ifspeed 100 buff 0 oflo 0 uflo 0 missed 25 tx_late_collisions 0
302     retry_error 0 first_collisions 0 nocarrier 0 inits 7 nocanput 440
303     allocbfail 0 runt 0 jabber 0 babble 0 tmd_error 0 tx_late_error 0
304     rx_late_error 0 slv_parity_error 0 tx_parity_error 0 rx_parity_error 0
305     slv_error_ack 0 tx_error_ack 0 rx_error_ack 0 tx_tag_error 0
306     rx_tag_error 0 eop_error 0 no_tmds 0 no_tbufs 0 no_rbufs 0
307     rx_late_collisions 0 rbytes 1726897560 obytes 834302609 multircv 7535 multixmt 0
308     brdcstrcv 248816 brdcstxmt 1667 norcvbuf 440 noxmtbuf 0 phy_failures 0
309   as well as info from 'netstat -in' such as mtu, network, etc.
310 - Solaris: now collects more detailed filesystem information in SunOS/df.py,
311   including inode usage, filesystem type, flags, and blocks as well as kBytes
312   used.  The full list of variables now available to directives is:
313     fsname  - filesystem name (string)
314     mountpt - mount point (string)
315     size    - size of filesystem in kBytes (int)
316     used    - kBytes used (int)
317     avail   - kBytes free (int)
318     pctused - percentage of filesystem used (float)
319     totalblocks - total amount of physical blocks (512 Bytes/block) (int)
320     usedblocks - number of physical blocks used (int)
321     availblocks - number of physical blocks available for unprivileged users (int)
322     freeblocks - number of physical blocks free (int)
323     blocksize - filesystem (logical) block size (int)
324     fragsize - filesystem fragmentation size (int)
325     totalinodes - total inodes on filesystem (int)
326     usedinodes - number of inodes used (int)
327     availinodes - number of inodes left available (int)
328     pctinodes - percentage of inodes used (float)
329     filesysid - filesystem id (int)
330     fstype - type of filesystem (string)
331     flag - filesystem flags (string)
332     filelen - max filename length (int)
333   Thanks to Dougal Scott for submitting this patch.
334 - When matching hostnames to group names, ignore any domain parts of the
335   hostname if it is fully-qualified.  Group names cannot contain
336   non-alphanumeric characters, so will only match the host part of a FQDN.
337 - Bugfix: clear checkdependson if it is assigned an empty string.
338 - Solaris: improvement to uptime/loadavg stats collection where it is
339   possible for the "day(s)" section of /usr/bin/uptime output to be
340   missing (usually if wtmpx is rotated more often than the system boots,
341   thus losing the last 'reboot' entry) so SunOS/system.py now handles
342   this exceptional case.
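
The keepdiff, context_lines and difftype arguments added to the FILE directive in 0.35 refer to Python's difflib module. A minimal sketch of how such options could map onto difflib is shown here. The function name, the "previous"/"current" labels and the mapping of "full" to difflib.ndiff() are assumptions, not taken from Eddie's source.

```python
import difflib

def make_diff(old_lines, new_lines, difftype='unified', context_lines=3):
    """Return a diff of two lists of lines as a single string."""
    if difftype == 'unified':
        diff = difflib.unified_diff(old_lines, new_lines,
                                    'previous', 'current', n=context_lines)
    elif difftype == 'context':
        diff = difflib.context_diff(old_lines, new_lines,
                                    'previous', 'current', n=context_lines)
    elif difftype == 'full':
        # Assumption: "full" means a line-by-line listing of the whole file.
        diff = difflib.ndiff(old_lines, new_lines)
    else:
        raise ValueError('unknown difftype: %r' % difftype)
    return ''.join(diff)

old = ['root:x:0:0\n', 'daemon:x:1:1\n']
new = ['root:x:0:0\n', 'daemon:x:1:1\n', 'eve:x:0:0\n']
print(make_diff(old, new, 'unified', context_lines=1))
```

The resulting string is what an email action could include in its body when a change is detected.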
343
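
Two of the Win32 compatibility entries in 0.35 (falling back to the platform module when os.uname() is missing, and checking that a signal exists before registering a handler) follow a common portability pattern. The sketch below shows that pattern only. The function names and the list of signal names are assumptions, since the changelog does not say which signals Eddie handles.

```python
import os
import platform
import signal

def get_hostname():
    # os.uname() does not exist on Win32, so fall back to platform.node().
    if hasattr(os, 'uname'):
        return os.uname()[1]
    return platform.node()

def register_handlers(handler, names=('SIGHUP', 'SIGINT', 'SIGTERM', 'SIGALRM')):
    """Register handler for each named signal that this platform defines."""
    registered = []
    for name in names:
        signum = getattr(signal, name, None)
        if signum is None:
            continue              # signal not defined on this platform
        signal.signal(signum, handler)
        registered.append(name)
    return registered

print(get_hostname())
```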
344Eddie-0.34 (13-Sep-2004)
345 - OpenBSD: collect in/out byte counters for network interfaces, which
346   requires an extra netstat call.
347 - OpenBSD: added drops counter to network interface stats.
348 - OpenBSD: fixed some bugs preventing network interface statistics collection
349   from working properly.
350 - Improved handling of exceptions when counting file descriptors in use.
351   Instead of raising a global exception (and causing EDDIE to die) just log
352   the exception and carry on.
353 - Perform global housekeeping duties more often.  Now they are every
354   1 minute instead of every 10 minutes.  This means that changes to
355   config and rules files will be picked up much faster.
356 - Added pysnmp module to Extra dir, which EDDIE uses for making SNMP queries.
357 - Extra 3rd-party modules are now being distributed with EDDIE.  They will
358   live in lib/common/Extra/ and are provided to make installation simpler
359   for commonly-used modules.
360 - HTTP: Make sure 'ip' message variable is initialized in HTTP directives.
361 - HTTP: Some HTTP response exceptions were not being caught properly.
362 - HTTP: Some socket.timeout checks weren't checking for the correct version
363   of Python (which was causing AttributeError exceptions).
364 - HTTP: Changed the logging of response body read() exceptions which were not
365   working for some types of exceptions.
366 - Made eddie_wrapper smarter about finding a date or gdate command to use.
367 - Darwin: Fixed a bug parsing vmstat statistics.  These counters were
368   being truncated (and hence wrong) before.
369 - Darwin: Better handling of parsing errors in the proc data collector.
370 - The COM directive now shares the utils.systemcall_semaphore semaphore
371   rather than relying on its own.  This prevents conflicts between any
372   threads that need to perform a system() (or os.popen() or
373   commands.getstatusoutput()) simultaneously.
374   Thanks to Denis Menshikov for verifying this issue.
375 - Bugfix for SP directive determining the right protocol (Dougal Scott).
376 - Bugfix for a problem where the get TCPtable occasionally returns no entries
377   for no obvious reason. This means that all the SP style checks would
378   start complaining that no one is listening (Dougal Scott).
379 - If ELVINURL and ELVINSCOPE are both undefined in eddie.cf then disable
380   Elvin functionality.
381 - Update to MYSQL directive adding "result#" variable (Dougal Scott).
382 - Converted mysql.py from DOS line endings to UNIX.
383 - Fixed 'daemon' call in contrib init script so it works properly on newer
384   versions of Redhat.
385 - Added new exception DataFailure.
386   Changed exceptions to be subclasses of Exception.
387   Catch DataFailure exceptions from collectData().  These are raised if the
388   Data Collector encounters a major problem collecting the data.
389 - Added support for Redhat Enterprise Linux (or perhaps newer kernels 2.4.21+)
390   which has extra stats added to the cpu fields in /proc/stat.  The cpu counters
391   now available with these kernels are:
392    ctr_cpu_user
393    ctr_cpu_nice
394    ctr_cpu_system
395    ctr_cpu_idle
396    ctr_cpu_iowait
397    ctr_cpu_hardirq
398    ctr_cpu_softirq
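
The cpu counters listed in the last 0.34 entry correspond, in order, to the values on the aggregate "cpu" line of /proc/stat. A minimal parsing sketch follows. The function name and the sample values are illustrative assumptions; only the counter names come from the list above.

```python
CPU_FIELDS = ('ctr_cpu_user', 'ctr_cpu_nice', 'ctr_cpu_system', 'ctr_cpu_idle',
              'ctr_cpu_iowait', 'ctr_cpu_hardirq', 'ctr_cpu_softirq')

def parse_cpu_line(line):
    """Map the values on the aggregate "cpu" line to counter names."""
    parts = line.split()
    if not parts or parts[0] != 'cpu':
        raise ValueError('not an aggregate cpu line: %r' % line)
    # zip() stops at the shorter sequence, so a kernel that reports only
    # four values simply yields the first four counters.
    return dict(zip(CPU_FIELDS, (int(v) for v in parts[1:])))

print(parse_cpu_line('cpu  4705 150 1120 16250 520 30 45'))
print(parse_cpu_line('cpu  4705 150 1120 16250'))
```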
399
400Eddie-0.33 (15-Jul-2004)
401 - Handle socket timeout exceptions properly when HTTP response read() fails.
402 - Handle socket.settimeout() not being available on Python pre-2.3 versions.
403 - A new HTTP rule/action variable 'timedout' has been added which will be set
404   to 1 if a socket timeout exception has occurred, otherwise it will be 0.
405 - Added HTTP directive option 'request_timeout' which specifies how long a
406   HTTP(S) connection should wait for a response before timing out with an
407   error.  This makes use of a new Python 2.3 feature where socket timeouts
408   can be configured, hence this option is only available when Eddie is
409   running on Python 2.3+.
410 - Better defaults for SENDMAIL and ELVIN settings in sample eddie.cf.
411 - Added better logging of HTTP directive actions.
412 - Enhancements to HTTP directive:
413   Supports URLs with non-standard ports, eg: http://localhost:8080/
414   Added finer grained timing of four parts of the HTTP connection:
415     time_resolve  - elapsed time to resolve hostname to IP
416     time_connect  - elapsed time to connect to server
417     time_request  - elapsed time to send HTTP/S request to server
418     time_response - elapsed time to retrieve the server response (and close connection)
419     time          - elapsed total time (sum of above)
420 - Added system-specific sample rules for Linux & Solaris.
421 - Added testing ruleset for OpenBSD in development/testing/.
422 - Added initial OpenBSD support, thanks to John McInnes.
423 - DataCollect now logs what module is being requested for import.
424 - Fixed act2ok bug in FILE test.
425 - Remove accidental accented character from nice() comments.
426   It was causing a DeprecationWarning in Python 2.3.3+.
427 - Created a full directive test suite for Darwin (OS X) to provide standard
428   testing of all possible directives (or as many as possible).
429   These live in development/testing/.
430 - PING: PING directive was logging pktloss as decimal when it should have been
431   a percentage.
432 - SP: Local address IP for SP directives (using netstat data-collector) can now
433   be specified as '*' or '0.0.0.0' for Solaris.  '*' is automatically
434   converted to '0.0.0.0' for consistency.
435 - First version of OS-specific modules ported to Mac OS X (Darwin).
436   Tested on OS X 10.3.3 (Darwin 7.3.0).  Needs plenty more testing.
437 - HTTP: Initialize HTTP directive exception data so variable substitution in
438   messages doesn't fail.
439 - Added new directive argument: checktime
440   Used to restrict directive execution to specified times.  The value
441   is a Python expression which can use various variables representing
442   the current time and day:
443       day ('mon', 'tue', etc); time (HHMM); hour (0-23); minute (0-59); second (0-59).
444   And for shorthands, the fixed lists:
445       weekdays ('mon' - 'fri'), weekend ('sat', 'sun').
446   Examples:
447       checktime='day=="mon" or day=="tue"'
448       checktime='day in weekdays and hour>18'
449 - Only perform act2ok action(s) if some actions were already called.
450   In cases where the check fails but actiondependson causes actions to
451   be skipped, we don't need the act2ok actions to be called.
452 - Added MYSQL directive submitted by Dougal Scott.
453 - PING: Fixed a socket exception for gethostbyname failures.
454 - Added option to disable a directive.  Specify 'disabled=1' in a directive
455   to force it to be disabled.
456 - SNMP directive now supports 64-bit counters split into high/low OIDs.  Specify
457   these as "OIDhigh:OIDlow".
458   Example:
459     oid='1.3.6.1.2.1.2.2.1.10.2:1.3.6.1.2.1.2.2.1.10.3'
460   Where the first OID is the High 32 bits and the second OID is the lower 32 bits.
461 - Added an FS template, fstpl, to sample common.rules.
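
For the 0.33 SNMP entry on 64-bit counters split into "OIDhigh:OIDlow", the arithmetic is a shift and an OR. The sketch below shows only that step, with hypothetical helper names. It does not perform any SNMP query.

```python
def split_oid_pair(oid):
    """Split "OIDhigh:OIDlow" into its two OID strings."""
    high, low = oid.split(':', 1)
    return high, low

def combine_counter(high, low):
    """Join the upper and lower 32-bit halves into one 64-bit value."""
    return (high << 32) | (low & 0xFFFFFFFF)

print(split_oid_pair('1.3.6.1.2.1.2.2.1.10.2:1.3.6.1.2.1.2.2.1.10.3'))
print(combine_counter(1, 5))   # 4294967301
```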
462
463Eddie-0.32 (21-Apr-2003)
464 - Added an exception handler for httplib read() where it can fail in
465   some circumstances.
466 - Fixed HTTP timing so that the whole HTTP session was timed, not just the
467   connect part.  This was misleading before.
468 - If no output from COM directive, set outfield1 anyway so rule
469   strings don't break.  Suggested by Arcady Genkin.
470 - Changed some sample rules to use ALERT_EMAIL alias rather than "alert"
471   fixed email address.  Thanks to Zac Stevens <zts@itga.com.au> for
472   pointing them out.
473 - Added restart option to redhat init.d script in contrib.
474 - Added new directive parameter: actionmaxcalls - defines the maximum number
475   of times actions will be called for a particular failure.
476 - Minor bugfix: sendmail_smtp() was returning wrong return codes; successful
477   posts were showing as failures, etc.
478 - Added new directive parameter: excludehosts
479   Directive will be skipped on any hosts specified by excludehosts.
480   Specified as a string containing a comma-separated list of hostnames.
481 - If groups of the same name are defined, merge them together rather than
482   throwing an error.  This allows for more custom rule configurations.
483   Requested by Arcady Genkin <agenkin@cdf.toronto.edu>
484
485Eddie-0.31 (11-Dec-2002)
486 - Increased Linux system counters from int to long.
487 - Fixed bug with isfile/isdir/etc shorthands not working properly.
488 - Console displays "<directive not ready>" for directives which have not
489   yet been initialised, rather than throwing KeyError exception.
490 - Added option to send emails via SMTP servers, rather than relying on
491   a local sendmail binary.  Either option can now be used.
492   Set SMTP_SERVERS in config to use SMTP server option.  This option
493   is now the default, and server defaults to 'localhost'.
494   Based on a submission by Dougal Scott <dwagon@connect.com.au>
495 - Fixed FILE example rule when performing cron test.
496   Noted by Dougal Scott <dwagon@connect.com.au>.
497 - Convert the weird time format that Solaris ps returns for etime and time
498   into plain seconds, which is a lot more useful for rules rather than
499   checking lengths or doing an integer conversion of a subslice of the
500   result and then a comparison based on that.
501   Patched by Dougal Scott <dwagon@connect.com.au>.
502 - Improved error output when parsing rules.
503 - Fixed bug when using Python pre-2.2 versions.
504 - Added some more sample directives.
505 - Added support for remembering historical data in directives.  Rules can
506   reference data from previous samples.
507 - Changed actionperiod slightly, so first actionperiod defaults to scanperiod,
508   then actionperiod expression is used thereafter.
509 - Shift sticky and type bits of mode across, right justified.
510 - Improved handling of tokenization errors.
511 - Directive is cancelled (not re-queued) if there are too many
512   SNMP query failures (usually host not responding or some other
513   network or transport failure).
514 - Added shorthand booleans to FILE directive for checking file types in rules:
515     issocket
516     issymlink
517     isfile
518     isblockdevice
519     isdir
520     ischardevice
521     isfifo
522 - Updated docs with version 0.30 changes (forgot to do this at release time,
523   oops).
524 - Improved handling of socket errors for console.
525 - Fixed issue with templates not being handled before rest of directive arguments.
526 - Added perm, sticky and type rule variables to the FILE directive.  They are
527   shorthands for the permissions, sticky/setuid/setgid and file type bits
528   of a file's mode.
529 - Improved config syntax error handling of bad directive names.
530 - Implemented check and action dependency definitions.  Two new directive
531   options are: actiondependson and checkdependson.  These can be set to a
532   string containing a list of directives (comma-separated) that this directive
533   is dependent on.  If any of the directives it depends on has failed when this
534   directive comes to perform its check or action (depending on which option
535   was used) then that check or action will be skipped.
536 - Added new directive option actionperiod.  This is a string containing an
537   expression which, when evaluated, sets the current period between actions
538   being performed.  This allows the period between actions to differ from
539   the period between checks.  It also allows for the period to be defined by
540   a mathematical expression, so the action period could exponentially increase
541   for example (for actions called during a single failure - the action period
542   will be reset when the failure is fixed).
543 - Enforced unique group and directive names at same group level.
544 - Improved error handling of console connections from bad clients.
545 - Fixed syntax error in sample config.
546 - Changed Linux ctr_interrupts system counter from int to long.
547 - Improved error handling of snmp directive.
548 - Improved handling of group configuration errors.
549 - Finally removed dependency on user-compiled 'top' command for collecting
550   some system stats on Solaris.  All current stats are collected from uptime
551   and vmstat commands now, which should be standard on any Solaris system.
552 - Fetch Linux memory statistics from /proc/meminfo.
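
Illustration of the etime/time conversion entry above.  This is a minimal
Python 3 sketch, not Eddie's actual code.  It assumes the ps elapsed-time
layout [[dd-]hh:]mm:ss, and the function name etime_to_seconds is
hypothetical.

```python
def etime_to_seconds(etime):
    """Convert a ps-style time string, [[dd-]hh:]mm:ss, into seconds.

    Assumption: the input follows the [[dd-]hh:]mm:ss layout.
    """
    etime = etime.strip()
    days = 0
    if "-" in etime:
        # Optional leading day count, e.g. "2-03:04:05".
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    fields = [int(f) for f in etime.split(":")]
    # Pad a missing hours field with zero so we always have h, m, s.
    while len(fields) < 3:
        fields.insert(0, 0)
    hours, minutes, seconds = fields
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds
```

With plain seconds available, a rule can compare the value directly against
a number instead of slicing the string.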
553
554Eddie-0.30 (31-May-2002)
555 - Prevented failed calls to 'top' (which will soon be made redundant anyway)
556   from causing system stats collection to fail on Solaris.
557 - Removed fetching WCHAN field from process information on Linux, as this
558   sometimes caused kernel warnings to be output or logged.  The field doesn't
559   appear particularly useful.
560 - Changed Linux Context switch counter from an int to a long.
561 - Fixed bug when an error parsing top output locks the system call semaphore
562   on Solaris.
563 - Fixed small bug when parsing string variables and catching exceptions in
564   actions.
565 - Added SENDMAIL config option to specify location of the sendmail binary
566   which EDDIE uses to send all email.
567 - Fixed bug when templates not in same group as directive referencing them.
568 - Changed PID directive argument 'pid' to 'pidfile'.
569 - Better handling of missing pysnmp module in snmp.py.
570 - Added basic SNMP directive based on a module by Dougal Scott
571   <dwagon@connect.com.au>. Requires pysnmp.
572 - Changed Linux 'df' call to 'df -l' which lists all local filesystems.
573   Much friendlier now that there are many alternative filesystems available
574   for Linux.
575 - Added patch by Kees Bakker <kees.bakker@altium.nl> to handle Linux df
576   when it sometimes outputs filesystem information over multiple lines.
577 - Added outfield variables to the COM directive.  The out variable is split
578   by whitespace and the fields are stored in outfieldn variables, e.g.,
579   outfield1, outfield2, etc.  This is to assist rule creation.
580 - Added netsaint action and Elvin notification method, submitted by
581   Dougal Scott <dwagon@connect.com.au>.
582 - Added minor bug-fixes, thanks to pre-release testing by Dougal Scott
583   <dwagon@connect.com.au>.
584 - Linux ctr_cpu_idle variables need to be longs (instead of ints) as the
585   counters are larger than expected.
586 - Created a HTTP directive for performing HTTP (and HTTPS) tests.
587 - Fixed minor bug when displaying config lines that have parsing errors.
588 - Fixed bug in METASTAT directive.
589 - Removed the CRON directive.  It is redundant now that the FILE directive
590   can perform the same test.
591 - Added a new data variable to FILE directive: now, which contains the
592   current time for use in tests with atime/mtime/ctime.
593 - LOGSCAN directive now initializes data variables on first check, which is
594   only for finding the end of the logfile in question.  This prevents an
595   exception when variables are needed for console strings before second
596   check has run.
597 - Removed optional actionList from being logged by directives also.
598 - Fixed bug with directives trying to log the action list, which is optional
599   now and may not exist.
600 - Moved sample M/MSG definitions to message.rules file.
601 - Added some more sample rules.
602 - Cleaned up sample rules and updated for the latest directive changes.
603   Added some elvinrrd sample rules.
604 - Minor cleanup of base directory path; just found os.path.normpath() :)
605 - Fixed small problem with arg parsing handling None values.
606 - Fixed small bug in PORT directive: when a check fails due to a connection
607   timeout, the recv string that wasn't set was still being searched.
608 - Cleaned up config formatting some more so that actions do not need to be
609   inside strings, they can be entered directly in a function call-like
610   format, e.g.,
611     action=ticker("Load on %(h)s is %(out)s", timeout=1)
612   or for a notification object,
613     action=COMMONALERT(commonmsg.fs,1)
614 - Changed PROC argument 'procname' to 'name' and action variable
615   'proc_check_name' to 'name' also, for consistency.
616 - Fixed minor bug with lack of expect argument for PORT directive.
617 - Removed data collection modules which are not required.
618 - Cleaned up all data collection modules and classes to simplify their
619   definitions.  Data collectors should be derived from the DataCollect
620   base class which handles all the data caching and thread-locking.
621 - Changes to parseConfig to simplify directive definitions.
622 - Removed old datastore module.
623 - Fixed up console code to handle errors better.
624 - Changed Directive base-class to simplify directive definitions.
625 - New datacollect module which defines DataModules class to handle dynamic
626   importing of architecture-dependent data collection modules, and
627   DataCollect class to provide a base-class for data collectors.
628 - Fixed PING directive to handle un-resolvable addresses.  Also returns ping
629   round-trip-times in seconds as a floating-point number.
630 - Simplified directive definitions by moving most of the common code to
631   Directive base-class.  New directives only need to define __init__,
632   tokenparser and getData methods.
633 - Removed requirement for action variables to be prefixed by directive name.
634   Action variables now have the same name as the rule variables, for
635   consistency.  Changed a few more variable names so they make more sense.
636 - Moved common directive definitions from directive.py to
637   Directives/common.py.
638 - OS-dependent modules are now imported dynamically when needed, not in the
639   main eddie.py anymore.  All data collection modules are handled by the
640   new datacollect module.
641 - Removed old method of determining systype with external script (wasn't used
642   anymore anyway).
643 - Fixed bug with Pinger where it would throw an exception when pinging
644   addresses that did not resolve.
645 - Added extra console argument variables:
646   . lastchecktime - date/time of last directive execution
647   . problemfirstdetect - date/time of current failure first detected (only if
648     state is failed)
649   . problemlastfail - date/time of current failure last detected (only if state
650     is failed)
651 - Cleaned up description of ADMINLEVEL in sample config so it makes more sense.
652 - Added console argument to directives to specify how the console output should
653   look for that directive.  console=None can be specified to hide that directive
654   from console output.
655 - Added support for EXT3 filesystems in Linux filesystem checking code.
656   Patch submitted by Kees Bakker <kees.bakker@altium.nl>
657 - Fixed a minor bug where directives using the eval() function and catching
658   an exception would log a very ugly looking message. This was due to the Python
659   eval() function modifying the user-supplied environment dictionary by adding
660   the __builtin__ dictionary.  When this is printed it looks horrible.
661 - Added 'actelse' directive argument to perform actions if directive state is
662   ok and has not changed since the last check.
663   Based on patches submitted by Dougal Scott <dwagon@connect.com.au>
664 - Changed Linux counter variables to have 'ctr_' at start of name, to be
665   consistent with Solaris and HP-UX variables.
666 - Fixed minor bug in HP-UX and Solaris system data collection.
667 - Fixed bug in uptime parsing in HP-UX system.py.
668 - Added a timeout argument to the ticker action.
669 - Re-implemented Elvin connection and notification code using the Elvin
670   ThreadedLoop client and a dedicated Elvin thread which should prevent
671   other threads from blocking on Elvin problems.
672 - Specify full path for solaris 'ps' command to prevent calling wrong version of
673   'ps'.
674 - Started work on a basic Developer's Guide: doc/dev_guide.txt.
675 - Standardised logging levels and tidied up all logging.
676 - Added system performance data collecting from 'uptime' and 'vmstat -s'
677   commands on Solaris.
678 - Improved network interface statistics on Linux by retrieving data from
679   /proc/net/dev.
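
Illustration of the COM outfield variables entry above.  This is a minimal
Python 3 sketch, not Eddie's actual code.  The helper name outfield_vars and
the sample output string are hypothetical.

```python
def outfield_vars(out):
    """Split command output on whitespace into outfield1, outfield2, ...

    Returns a dict of variable name to field text, numbered from 1.
    """
    return {"outfield%d" % i: field
            for i, field in enumerate(out.split(), 1)}


# Hypothetical command output with three whitespace-separated fields.
variables = outfield_vars("0.15 0.10  0.05\n")
```

A rule could then test a single field, for example the first one, without
having to parse the whole out string itself.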
680
681Eddie-0.29 (non-public release)
682
683Eddie-0.28 (9-Mar-2002)
684 - Cleaned up df code, added data caching and made thread-safe, like other
685   data collectors.
686 - Fixed up eddie_wrapper locating GNU date on Solaris.
687 - Fixed memory-leak in disk-usage code (reported by Dougal Scott
688   <dwagon@connect.com.au>).
689 - Exit with error if all threads are locked (cannot kill threads in current
690   Python implementation).
691   Make eddie_wrapper a little smarter when restarting eddie process.
692 - Added example init.d scripts to contrib for Solaris and Redhat Linux.
693 - Added another vmstat parser to get free memory/swap information for
694   Solaris.
695 - Added a common semaphore for utils.safe_popen()/safe_pclose() and
696   utils.safe_getstatusoutput() to use between them.  It appears that
697   system calls, of any sort - system() calls, popen(), commands module,
698   etc - are not thread-safe and cannot be performed simultaneously by
699   multiple threads at once.  This should prevent such race-conditions as
700   all EDDIE system calls use these functions.
701 - Cleaned up access to the system stats cache so that only one thread at a
702   time will be refreshing the data.
703 - Added some more smarts to eddie_wrapper:
704   - don't start Eddie if one is already running.
705   - don't restart Eddie more than a set number of times in a short period of
706     time (requires GNU date command).
707 - Put semaphore lock around Elvin notify to ensure thread-safe notifications
708   are being sent.  Suspect duplicates were being sent before.
709 - Now logs the current thread name for each log entry for improved debugging.
710 - A lot of cleaning up of system.py for Solaris.
711   Added all counter stats from 'vmstat -s'.
712   Changed gathering of loadavg/uptime stats from '/usr/bin/uptime' rather than
713   '/opt/local/bin/top' - trying to phase out use of 'top'.
714   Improved documentation at top of class, with listing of every stats variable
715   available from the system class.
716 - Added prtdiag parsing for Enterprise class servers (E3500,E6500,etc)
717   for temperature.
718 - Added support for prtdiag for Sun U280R's.
719 - Added list of paths to find metastat command for Solaris METASTAT directive.
720 - Added PRTDIAG directive to provide an interface to the system-specific
721   data provided by prtdiag on Sun machines.
722   Currently only system temperatures are extracted for U450s and U250s.
723 - Added support for VxFS filesystems in df.py for Solaris.
724 - Updated docs to require Python versions 1.6+
725
726Eddie-0.27 (12-Nov-2001)
727 - Put semaphore lock around Elvin connect calls to prevent multiple threads
728   trying to connect at once.
729 - Fixed bug with ELVINURL and ELVINSCOPE config options not being set
730   properly.
731 - Socket errors in Console code are matched with errno error names, rather
732   than assuming the error numbers are the same across platforms.
733   [Bug reported by: Ivar Zarans <iff@alcaron.ee>]
734 - Handle socket errors from PINGs nicely.
735 - Added a reconnect() function to force the elvin connection closed before
736   reconnecting.
737 - Cleaned up eddieElvin4 code, including connecting and auto-reconnecting to
738   Elvin server when connection is lost.
739 - Added better exception handling for "Connection Timed Out" error in PORT
740   directive isalive() function.
741 - Fixed file descriptor leak in PORT directive isalive() function when
742   Connection Refused exception is handled the socket file descriptor was
743   not being closed.
744 - Added more system statistics to the Linux system data collector module.
745   Added most of the stats available from /proc/stat, including:
746     cpu_user      - total cpu in user space
747     cpu_nice      - total cpu in user nice space
748     cpu_system    - total cpu in system space
749     cpu_idle      - total cpu in idle thread
750     cpu%d_user    - per cpu in user space (e.g., cpu0, cpu1, etc)
751     cpu%d_nice    - per cpu in user nice space (e.g., cpu0, cpu1, etc)
752     cpu%d_system  - per cpu in system space (e.g., cpu0, cpu1, etc)
753     cpu%d_idle    - per cpu in idle thread (e.g., cpu0, cpu1, etc)
754     pages_in      - pages read in
755     pages_out     - pages written out
756     pages_swapin  - swap pages read in
757     pages_swapout - swap pages written out
758     interrupts    - number of interrupts received
759     contextswitches - number of context switches
760     boottime      - time of boot (epoch)
761     processes     - number of processes started (I think?)
762   These are now available to directives like SYS.
763 - Cleaned up eddie-adm email headers.
764
765Eddie-0.26 (1-Oct-2001)
766 - Changed elvinrrd() action call arguments slightly.  It is now:
767   elvinrrd( 'rrdkey', 'arg1=val1', 'arg2=val2', ... )
768   The first argument must be the RRD database name to store data into.
769   All arguments following that (one or more) are "variable=data" strings
770   where variable is the name of the variable in the RRD db and data is
771   the data to store in that variable.  RRD dbs can have multiple variables
772   so this allows some or all of them to be updated in one action call.
773 - Wrapped the critical calls in safe_popen(), safe_pclose() and
774   safe_getstatusoutput() in try/except clauses, so that any exceptions are
775   intercepted and the semaphore locks are released (exceptions are then
776   raised again to be handled as normal).  This stops threads being blocked
777   on semaphore acquires which used up the thread pool quickly and was
778   obviously bad.
779 - Added elvinrrd action which is used to send data samples over Elvin to a
780   consumer which stores that data into an RRD database.
781 - Updated elvindb() action and elvindb() Elvin function to support Elvin4.
782   elvindb actions are now working again.
783 - Directive states now transition from "ok" to "failinitial" to "fail".
784   "ok" indicates the directive is fine;
785   "failinitial" indicates the directive is currently transitioning to the "fail"
786    state or is waiting on a re-check;
787   "fail" indicates the directive has definitely failed.
788 - Fixed a small bug where a directive performing multiple checks (numchecks>1)
789   which fails one of the first checks but passes a subsequent re-check still
790   performs the act2ok action, which it should not do.
791 - Directive threads are named, for easier debugging.  The name they are given
792   is the ID of the directive they are executing.
793 - Cleaned up ALIAS code to support being passed in action calls properly.
794 - Cleaned up action calling code.  Actions called from action and act2ok now
795   use the same action evaluation function, whether actions are called
796   directly as a function or from Notification objects.  Thus actions can be
797   called directly or Notification objects used from both action and act2ok
798   arguments, and can even be combined.
799 - Added a rule argument to RADIUS directive so rules can be written to test
800   radius auths.  The variable passed is set in the rule environment and is
801   set to either 0 for failed or 1 for passed.
802 - FILE directive now makes the file statistics from the previous check
803   available so rules can compare the current statistics against the previous
804   statistics to see if files or file metadata have changed over time.
805   Variables are the same but prefixed with 'last', e.g.: rule='md5 != lastmd5'
806 - Fixed bug: Connection not being closed in all cases for PORT isalive()
807   function.
808 - Added new directive, FILE, allowing tests to be made on a file based on
809   standard file statistics (size, mode, ownerships, etc) and md5 hashes.
810 - Update lastfailtime in stateok function so any actions called by act2ok
811   will know the full age of the problem.
812 - Added PING directive to provide network ping checking of hosts.
813 - Added initial HP-UX support.
814 - Fixed bug in PROC R() check.
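
Illustration of the directive state entries above ("ok", "failinitial",
"fail", and the act2ok fix).  This is a rough Python 3 sketch under stated
assumptions, not Eddie's actual code.  It assumes a directive reaches "fail"
only once numchecks consecutive checks have failed.  The function names are
hypothetical.

```python
def next_state(check_failed, failed_checks, numchecks):
    """Return the directive state after one check.

    failed_checks counts consecutive failed checks, including this one.
    Assumption: "fail" is reached only after numchecks failures in a row.
    """
    if not check_failed:
        return "ok"
    if failed_checks >= numchecks:
        return "fail"
    return "failinitial"


def should_run_act2ok(old_state, new_state):
    """act2ok applies only when a definite failure has been cleared."""
    return old_state == "fail" and new_state == "ok"
```

Under these assumptions a directive that fails its first check but passes the
re-check goes from "failinitial" back to "ok" without triggering act2ok.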
815
816
817Eddie-0.25 (6-Jul-2001)
818 - Changed where varDict action variables are set in some directives so that
819   they are available for act2ok action calls.
820 - Improved error handling in directive.py
821 - Fixed problem with DF list not refreshing itself properly.
822 - Changed CONSPORT config option to CONSOLE_PORT.
823   I find more verbose to be much user-friendlier than less.
824 - Added two new config settings:
825    EMAIL_FROM='emailaddress'
826    EMAIL_REPLYTO='emailaddress'
827   so the From: and Reply-To: fields in the email action can be set.
828   If these are not set, they default to the current USER for the From: field,
829   and '' for the Reply-To: field.
830 - Cleaned up PORT directive isalive() handling Connection Refused exceptions.
831 - Create a QUICKSTART text document to give the impatient a quick way to
832   get Eddie running.
833 - sockets.py: handle port already in use by exiting and signalling the other
834   non-daemon threads to exit.  If the port is in use the whole program should
835   exit cleanly with an appropriate error message now.
836   Similarly, exit cleanly (and signal other threads to exit) if too many
837   socket errors.
838 - config.py: Improved error handling; if CONSPORT is not a positive integer a
839   ParseFailure is raised.
840 - The console server thread will not be started if CONSPORT=0.  This allows
841   the console feature to be disabled if required.
842 - Main thread will now also exit if please_die Event is set.  This allows
843   other threads to signal that the program should exit.
844 - Added act2ok param - allows you to specify a Notification object
845   to use when Check goes from bad to good
846 - Log accepted connections with remote IP:port, for security or whatever.
847 - directive.py: made directive string representation tidier.
848 - sockets.py: Handle "Interrupted system call" (from CTRL-C) nicely.
849 - Changed eddie.py - changes include cleaning up the way threads
850   are started and stopped; there are now start_threads() and
851   stop_threads().  I did this so that both the scheduler thread
852   and the console socket thread can be started and stopped easily
853   when the config changes.
854 - Added config var CONSPORT - this is the port to listen to
855   console connections on.  The default is 33343.
856 - Added sockets.py - A sockets interface to the current state of
857   all eddie checks, this will be used for a console like interface.
858 - Removed DEFs and replaced by ALIASes which are now used to define string
859   aliases to be substituted during config parsing, or during action argument
860   parsing.  '$' signs are not used anymore, giving a much nicer Python
861   look-and-feel.
862 - Added %(problemage)s %(problemfirstdetect)s to sample MSGs to demonstrate
863   usage.  These are substituted for the age of the current directive being
864   false and the time the first false was detected respectively; or empty
865   strings ("") if the problem age is currently 0.
866 - Added more detailed logging of thread usage, making thread problems easier
867   to track.
868 - Added a utils.safe_getstatusoutput() as a thread-safe wrapper around
869   commands.getstatusoutput().
870   The IPF directive now uses this to avoid deadlocks.
871 - Problem age and First time detected variables are now substitutable values
872   within an email message body, %(problemage)s and %(problemfirstdetect)s,
873   instead of automatically being appended to the bottom of every email.
874   Note, these variables are empty ("") if the problem age is zero.
875 - Changed all os.popen() calls to use the thread-safe utils.safe_popen().
876   This should prevent deadlocks when multiple directives are gathering info.
877 - Added 'negate' option to LOGSCAN - will match lines which do NOT match the
878   regex.
879 - Added formatted exception traceback to safeCheck() logging.
880 - Fixed socket connect() call in pop3.py to support Python 2.1
881 - Email admin logs when exiting due to config parse failure.
882 - Added LOGSCAN examples.
883 - Updated sample rules to reflect new config layout and features.
884 - Log Eddie version and systype.
885   Also log when configuration parsing complete.
886 - Cleaned up pop3.py imports.
887 - Added LOGSCAN directive for monitoring logfiles.
888 - Fixed PROC custom rules setting.
889 - Fixed directives setting their own ID only if none set in config.
890 - parseFailure() logs problem to logfile as well as printing to stdout.
891 - Cleaned up sample eddie.cf and added verbose comments.
892 - Catch any uncaught exceptions around main() so they are logged and displayed
893   nicely, making it easier for the Eddie admin to see and act on them.
894   Hence eddie doesn't have to be run from eddie_wrapper with stderr captured
895   (which didn't really work properly anyway).
896 - Fixed socket connect() call in PORT directive to use tuple as argument
897   rather than two arguments.  This changed in Python-2.1 (but works with
898   older versions).
899 - Removed the old snpp code which wasn't being used.  This should be replaced
900   with updated code.
901 - Elvin config parameters have changed from ELVINHOST and ELVINPORT to
902   ELVINURL and ELVINSCOPE to support Elvin4 properly.
903 - The Elvin tickertape action is now called ticker() [it was just called
904   elvin() before].
905 - Updated Elvin code to support Elvin4 and moved to new file eddieElvin4.py.
906   Elvin3 will no longer be supported.
907 - Replaced any use of old regex module with new re module (using regex causes
908   warnings with Python-2.1).
909 - Tested under Python-2.1.  Had to modify some of the globals to avoid new
910   warnings under 2.1.
911 - Updated system.py to handle 'top' under Solaris 8.
912 - Directive threads are started with safeCheck() which wraps up docheck()
913   in try/except so all un-caught exceptions within that thread will be caught
914   and the thread can exit cleanly.
915 - Cleaned up parsing of 'top' a bit more, so it works better under Solaris 8.
916 - Added support for directive templates.  A directive can be created
917   solely as a template for other directives, supplying default settings;
918   standard directives can also be used as templates for other
919   directives.
920   Directive template creation, eg:
921       PROC 'template1':   template=self   scanperiod='5m' checks=2 checkwait=30
922       PROC 'cron':        template='template1'    procname='crond' action="..."
923   The special template=self means this directive is a template and is not
924   to be scheduled.
925   Other working directives can also be used as templates.
926   A template should be the same directive type as the directive using it, but
927   this is not enforced because it shouldn't hurt: directives ignore any
928   arguments they don't need.
929 - Added support for new Directive arguments:
930    numchecks=<int>
931    checkwait=<time>
932   numchecks specifies how many checks a directive should perform before
933   calling its actions.  By default this will be 1.  Setting this to 2
934   will force 2 checks before actions are called.  It can be set to any
935   non-negative integer, including 0.  0 is a special case which indicates that
936   this directive will not perform any checks.  This could be used to
937   temporarily disable a directive, for example.
938   checkwait specifies how long the directive will wait before performing
939   its next re-check if numchecks>1.  Its value is a standard time specification
940   eg: '5' = 5 seconds; '5s' = 5 seconds; '2m' = 2 minutes; '5h' = 5 hours.
941   By default checkwait is 0 which means the next re-check will run instantly.
942   checkwait should normally be set to a meaningful value if numchecks>1.
943 - Added ALIAS definition.  Similar to DEFs but ALIASes are replaced inside
944   action calls, etc.  Whereas DEFs are only translated during config file
945   parsing time.
946   Note: DEFs break the Python-like look&feel of the config file and may
947   disappear in the future if they can be replaced neatly.
948 - Cleaned up logging in config.py.  LOGFILE should be the first option
949   in eddie.cf so logs end up in the right place.
950 - Handle scanperiod argument in directives so scanperiod can be overridden
951   for each directive.
952 - Signals received during a time.sleep() under Linux cause an IOError
953   exception so just catch these and move on.  Main thread should be
954   handling the shutdown cleanly anyway.
955 - Cleaned up directive tokenparsing so base Directive class does as
956   much of the work as possible and user-written directive objects
957   only have to test existence of arguments and setup.
958 - New config format, which is not compatible with old format.
959   All arguments to a directive are now named arguments.
960 - Max number of threads to use can be limited in eddie.cf with the
961   NUMTHREADS variable now.  Should be set > 5 for normal use.
962   If set too low checks will never be allowed to run.
963 - Created Radius auth checking directive.
964 - Added clean exiting code to SIGINT, same as SIGTERM.
965 - Cleaned up exiting on SIGTERM signal.  The scheduler thread is signalled to
966   die and the main thread will wait for the scheduler thread to receive the
967   signal and exit before exiting cleanly itself.  All "worker" threads are
968   ignored and should die of their own accord.
969 - Put semaphores around COM checks which do os.system() calls.
970   Only one COM check will execute at a time.
971 - Made proc.py thread-friendly.
972 - timeQueue is the queueing class derived from Python's Queue class.  It is as
973   thread-friendly as Queue, the major difference being objects are inserted
974   into the queue based on a given time.  Objects with the lowest times are
975   closest to the front of the queue.
976   To support this, objects have to be added along with their time, so a
977   2-tuple must be added, eg: q.put( (obj, time) ).  Similarly q.get()
978   returns the same 2-tuple.
979   An extra public method has been added, over what Queue offers, q.head().
980   This method returns the item (and time) from the front of the queue,
981   exactly as q.get(), but does not remove it from the queue.
982 - To support the new queueing of jobs, all directives must end by submitting
983   themselves back into the queue.  A
984   Config.q.put((self, time.time()+self.scanperiod)) will submit itself back
985   into the queue and schedule itself to be run in self.scanperiod seconds.
986   If a directive does not put itself back into the queue it will not be
987   called again (this can be useful if there is some sort of error and the
988   directive should not be called again).
989 - os.popen() appears to cause problems when used by multiple threads at once,
990   so all such calls now use a wrapper, utils.safe_popen() which performs
991   a semaphore lock around os.popen().  utils.safe_pclose() _MUST_ be called
992   after the pipe has been finished with or the semaphore will not be released
993   and all other calls will be blocked forever.
994 - Core of Eddie is now multi-threaded using a scheduler thread to run each
995   check in its own thread.  Thread usage is limited so things don't get out
996   of control.
997   The scheduler tracks jobs with a derivative of Python's Queue class which
998   orders items by time, so that the job to be started soonest will be at the
999   front of the queue.  This will now allow directives to specify their own
1000   scanperiod and execute as often or as little as desired, independently of
1001   other directives.
1002   Modified config files are still automatically detected (sometime within a 10
1003   minute period by the "Housecleaning" thread (main process)) which causes the
1004   scheduler to be signalled to exit and then the configs are re-read and a new
1005   scheduler will be started up.
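
Illustration of the timeQueue entry above.  This is a minimal Python 3 sketch
of a time-ordered queue derived from queue.Queue, not Eddie's actual
implementation.  The class name TimeQueue, the use of heapq, and the
tie-breaking counter are assumptions made for the example.

```python
import heapq
import itertools
import queue


class TimeQueue(queue.Queue):
    """Queue whose items are (obj, time) tuples, earliest time first."""

    def _init(self, maxsize):
        self.queue = []                    # heap of (time, seq, obj)
        self._counter = itertools.count()  # tie-breaker for equal times

    def _qsize(self):
        return len(self.queue)

    def _put(self, item):
        obj, when = item
        heapq.heappush(self.queue, (when, next(self._counter), obj))

    def _get(self):
        when, _, obj = heapq.heappop(self.queue)
        return (obj, when)

    def head(self):
        """Return the front (obj, time) tuple without removing it."""
        with self.mutex:
            if not self.queue:
                raise queue.Empty
            when, _, obj = self.queue[0]
            return (obj, when)


q = TimeQueue()
q.put(("check-b", 20.0))
q.put(("check-a", 10.0))
front = q.head()   # earliest item, still in the queue
first = q.get()    # same item, now removed
```

A scheduler built on such a queue can look at head() to see when the next
job is due, and a directive re-queues itself by putting (self, next run time).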
1006
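Illustration of the checkwait time specification described in the 0.25
entries above ('5', '5s', '2m', '5h').  This is a minimal Python 3 sketch,
not Eddie's actual parser.  The function name parse_time_spec is
hypothetical, and only the units named in the changelog are handled.

```python
# Seconds per unit suffix; only the suffixes named in the changelog.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_time_spec(spec):
    """Convert a time specification such as '5', '5s', '2m' or '5h'
    into a number of seconds.  A bare number is taken as seconds."""
    spec = spec.strip()
    if spec and spec[-1] in UNIT_SECONDS:
        return int(spec[:-1]) * UNIT_SECONDS[spec[-1]]
    return int(spec)
```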
1007
1008Eddie-0.24 (1-Oct-2000)
1009 - Added custom disksuite check to alert if any metadevices require
1010   maintenance.  Skips checking if /usr/opt/SUNWmd/sbin/metastat not found.
1011 - Separate system.py for Solaris 5.8 because of differences.
1012 - Better logging of problem states for debugging.  Problem states track
1013   the current "state" of a problem (ok, failed, etc), and the time when
1014   the problem was first detected.
1015 - Email action now includes the age of the problem and when the problem was
1016   first detected (if not the first time) in the email.
1017 - Added handler for when pop3 connections are failing or not authing.
1018 - Fixed bug with parsing config when indentations are incorrect.
1019   Handles it better now by raising a ParseFailure exception and pointing
1020   out the error line in the config file.
1021 - Changed string variable substitution method from %variable to Python's
1022   format %(variable)s.  This lets us use Python's built-in variable
1023   substitution on strings and makes the implementation much simpler.
1024 - Log parsing failures in parseVars()
1025 - Fixed small bug with pop3 error checking.
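
Illustration of the %(variable)s substitution entry above.  This is a minimal
Python 3 sketch using the built-in string formatting operator.  The variable
names h and out appear in a sample action elsewhere in this file; the values
and the helper name substitute are hypothetical.

```python
def substitute(template, variables):
    """Fill %(name)s placeholders in template from the variables dict."""
    return template % variables


message = substitute("Load on %(h)s is %(out)s",
                     {"h": "host1", "out": "3.20"})
```

A KeyError is raised if the template names a variable missing from the
dictionary, so every placeholder needs a value.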


Eddie-0.23 (19-Jun-2000)
 - cleaned up Solaris x86 support (still fairly untested).
 - changed Linux system.py to get statistics from /proc rather than
   parsing 'top' output.
 - added close() method to kstat object and removed kstat_close() call
   from kstat object initialization function, which was possibly causing
   seg faults in solkstatmodule.so.
 - elvindb() action now takes optional string argument containing the
   column/value pairs to store in the database (via Elvin).
 - added POP3TIMING directive for checking and timing POP3 connections
   in new pop3 directive module.
 - added CRON directive for cron checks in solaris.py directive module.
 - added IPF directive for ipfilter tests.
 - added support for custom Directive imports from new Directive directory.
 - added count of file descriptors for debugging.
 - fixed bug with PROC check, which only performed the check on the first
   process found with the name specified.  It now performs checks on every
   process with the specified name.
 - added a kstat_close() to fix a file-descriptor leak in solkstatmodule.so.
 - added a default class, DataStore, for storage subclasses to use, which
   automatically caches data.
 - added support for iostat data to STORE directive.
 - find OS-specific modules in multiple directories from most specific
   to least specific (e.g. OS/version/architecture, OS/version, then OS).
 - changed auto system type determination to internal code rather than
   calling separate 3rd-party 'systype' script.
 - added iostat objects for collecting iostat data under Solaris.  Uses
   a shared library, solkstatmodule.so, created by the Eddie developers
   and included.
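
   Note on the OS-specific module search above: the most-specific to
   least-specific ordering can be sketched as below.  The function names,
   the "lib" base directory and the example OS values are assumptions for
   illustration only, not Eddie's actual code or layout.

```python
import os


def candidate_dirs(base, osname, osver, osarch):
    """Return directories to search, most specific first (hypothetical helper)."""
    return [
        os.path.join(base, osname, osver, osarch),
        os.path.join(base, osname, osver),
        os.path.join(base, osname),
    ]


def find_module_file(filename, dirs, exists=os.path.exists):
    """Return the first existing path for filename, or None if not found."""
    for directory in dirs:
        path = os.path.join(directory, filename)
        if exists(path):
            return path
    return None


# Example values are illustrative only.
dirs = candidate_dirs("lib", "SunOS", "5.7", "sun4u")
```

   Searching in this order lets a version- or architecture-specific module
   override a generic one for the same OS.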


Eddie-0.22 (15-May-2000)
 - added defaults for out and err in COM directive to stop an exception
   when the executed command did not write stdout or stderr files.
 - fixed SIGALRM so it works properly under Linux.  (It works slightly
   differently from Solaris.)
 - COM did not use the return value of an os.system() call properly.  This
   has been fixed as per the wait(2) call.
 - email() action takes an optional 3rd argument which is the body of
   the message.  Otherwise the 2nd argument is used as both subject and body.
 - msg is copied to subject for simple email() call with only subject given.
 - changed import for new Elvin.py module.
 - now using Python 1.5.2
 - fixed process name hash keys.
 - added Solaris x86 support.
 - added Linux support (tested with RedHat 6).
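
   Note on the os.system() fix above: on Unix, os.system() returns a
   status encoded as for wait(2), not the plain exit code.  A minimal
   sketch of decoding it is shown below; run_command is a hypothetical
   helper, not the COM directive's actual implementation.

```python
import os


def run_command(cmd):
    """Run cmd via the shell and decode the wait(2)-style status (Unix only).

    Returns the exit code, a negative signal number if the command was
    killed by a signal, or None if the status is neither.
    """
    status = os.system(cmd)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    if os.WIFSIGNALED(status):
        return -os.WTERMSIG(status)
    return None
```

   Comparing the raw return value against an expected exit code gives wrong
   results, because the exit code sits in the high byte of the status.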


Eddie-0.21 (4-Oct-1999)
 - catch timeouts while trying to stat config files and skip the config file
   modification checks.
 - added '-n' to 'netstat -i' because resolving interface names on some
   hosts was taking a very long time.
 - fixed Elvin messaging with new ElvinConnection object.
 - added Elvin db data-storage consumer daemon.
 - separated Solaris 2.5 and 2.7-specific lib areas.
 - added caching functionality to data gathering code.
 - added functions to return hashes of network information.
 - changed eddie-elvin interface to maintain single shared connection to Elvin
   server.
 - added double-checking for SP directive.
 - added STORE directive to enable configuration of data to be sent via
   elvindb().
 - added elvindb() functionality to send database objects over Elvin to
   database consumer.
 - added support for Solaris 2.7.
 - created sample config files in config.sample.
 - fixed quote problem with address specified in SP check.
 - PORT directive handles multiple lines sent to destination.
 - added estored development to contrib area.
 - added creation of system objects for collecting system stats.
 - added SYS directive to allow detailed checks to be performed on system
   data such as load-average, memory/swap usage, cpu idle %, etc.
 - parses email address strings for variable definitions.
 - fixed int overflow errors in netstat data collection.
 - elvin() will use subject as message if message is blank for Tickertape.
 - fixed small bug with actions parsing '%' at end of line.
 - added directive 'NET' to allow detailed checks on current network
   statistics.
 - netstat object now obtains all current network statistics from host
   (i.e. 'netstat -s') under Solaris.
 - added 'IF' directive to enable detailed network interface checks.
 - added new rule for process checking to allow complex checks to be performed
   on running processes.
 - automatically re-load config files if any have changed.
 - during process (and pid) checks, if process isn't found, sleep a bit then
   double check.
 - now only pulls in process info when called, and caches that info for a
   set time before fetching it again.
 - now pulls in every bit of process info that ps can provide.
 - added a wrapper for eddie to capture major exceptions and auto restart.
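
   Note on the process-info caching above: fetching data only on demand
   and reusing it for a set time can be sketched as below.  The class
   name, the max_age parameter and the injectable clock are assumptions
   for illustration, not Eddie's actual code.

```python
import time


class CachedFetcher:
    """Fetch a value lazily and reuse it until it is max_age seconds old."""

    def __init__(self, fetch, max_age, clock=time.time):
        self._fetch = fetch        # callable that gathers fresh data
        self._max_age = max_age    # seconds a cached value stays valid
        self._clock = clock        # injectable so tests can control time
        self._value = None
        self._stamp = None         # time of the last fetch, None if never

    def get(self):
        now = self._clock()
        if self._stamp is None or now - self._stamp >= self._max_age:
            self._value = self._fetch()
            self._stamp = now
        return self._value
```

   Nothing is fetched until get() is first called, so checks that never
   need the data never pay for collecting it.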