Eddie CHANGES
(reverse chronological order)

Eddie-0.36 (04-Dec-2007)
- Eddie will now throw an error and exit if a config file cannot be read.
- Added persist_cookies option to HTTP directive. It is used to
  specify whether to persist server-defined cookies on the client
  side. If enabled, Eddie HTTP checks will send back any cookies
  defined by the server, doing its best to obey expire times.
  Disabled by default.
- Added "server" option to HTTP directive, used to specify the server
  name to connect to. This will be used instead of the server name
  from the URL. The server name from the URL will still be used for
  the HTTP host header.
- SunOS: Changed mem_free and mem_swapfree to return values in bytes
  (although they are rounded up to the nearest kbyte).
- Added Solaris SMF method/manifest files to contrib.
- Full find & replace of all evil tabs to spaces.
- Added some tools to contrib/spread/ to use for testing elvinrrd message
  passing over Spread. These tools send & receive elvinrrd messages the
  same way that Eddie and ElvinRRD do.
- Added support for Spread messaging as an alternative to Elvin.
- Bugfix: make sure body is initialised so MSG parsing doesn't fail if a
  HTTP check fails before assigning anything to the body.
- Bugfix: reason was not defined before actions were called, causing
  an exception in some cases.
- Bugfix: make sure status is initialised before generating any alerts.
- Changes to the Elvin code to make re-connections more reliable. Use
  elvin.SyncLoop instead of elvin.ThreadedLoop. Disabled auto-discovery.
- Implemented the DiskStatistics data collector for Linux.
  This uses a new linux_diskio module which has been added to the Eddie
  distribution.
- Corrected tcp/udp port bug in SP class: searching for "port=123" was
  matching a bound port of 1234 because of the use of string.find().
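The substring pitfall behind the port fix can be reproduced in a few lines.
This is a minimal sketch, not Eddie's SP code: the record format and both
helper names are assumptions made for illustration.

```python
def port_matches_substring(conn, port):
    """Buggy approach: a substring search, as string.find() performs."""
    return conn.find("port=%d" % port) != -1


def port_matches_exact(conn, port):
    """Safer approach: split into key=value fields and compare the whole port value."""
    fields = dict(item.split("=", 1) for item in conn.split() if "=" in item)
    return fields.get("port") == str(port)


bound = "proto=tcp addr=0.0.0.0 port=1234"   # hypothetical record for a bound port
print(port_matches_substring(bound, 123))    # True: a false positive
print(port_matches_exact(bound, 123))        # False
print(port_matches_exact(bound, 1234))       # True
```

Comparing the complete field value avoids any prefix match, whatever the
surrounding record looks like.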
- For any var name that contains "_pages_", create a "_bytes_" version.
- Added vars: ctr_swap_pages_inactive, ctr_bytes_per_page
- New var for "COM" directive: outfields
- Added "DBI" directive, for database query checking.
  Based heavily on the (undocumented) mysql directive.
- Solve startup race condition for "checkdependson": initial state cannot be "ok".
  Create state "unknown", and change "Directive.checkDependencies" to consider
  all non-"ok" statuses to be failures (this includes "failinitial").
- Two important enhancements to Directive.tokenparser:
  1) When parsing the config file, for every argument in the directive, if its
     value is a STRING type, then use utils.typeFromString() to set its value,
     so we get a decent data type for it (int, float, string). This reduces the
     typecasting in evaluated expressions.
  2) When parsing the config file, for every scalar (int, float, string) argument
     in the directive, put it into the defaultVarDict. This allows for setting
     "variables" in the directive, and then using them in the rule. For example,
     if the directive (or template) has "maxcpu=30", then the rule can address
     this like "rule='pcpu > _maxcpu'".
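The two tokenparser enhancements can be pictured with a short sketch. The
implementation of utils.typeFromString() is not shown in this file, so
type_from_string() below is a hypothetical stand-in, and the argument names
and sample values are invented for the example.

```python
def type_from_string(value):
    """Return value as an int or float if it parses as one, else the original string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value


# Hypothetical directive arguments as read from a config file.
args = {"maxcpu": "30", "ratio": "0.75", "name": "httpd"}

# Scalars become rule variables; the "_" prefix mirrors the "_maxcpu" example.
default_vars = {"_" + key: type_from_string(val) for key, val in args.items()}

# Evaluate a rule against collected data (pcpu is a made-up sample value).
rule = "pcpu > _maxcpu"
print(eval(rule, {}, dict(default_vars, pcpu=42.5)))   # True
```

Because "_maxcpu" is already an int, the rule needs no int() cast of its own.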
- Added "--daemon" command-line option, and supporting "utils.create_child"
  routine. Also created brief documentation for all command-line switches.
- Changed logscanning.py to detect inode number changes: if a watched file's
  inode number changes, then read from the start of the file.
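A minimal sketch of inode-based rotation detection, assuming a POSIX
filesystem. The FileWatcher class is hypothetical and is not the code in
logscanning.py.

```python
import os


class FileWatcher:
    """Remember the inode and read offset of a watched file between scans."""

    def __init__(self, path):
        self.path = path
        self.inode = None
        self.offset = 0

    def read_new(self):
        """Return text added since the last call; restart if the inode changed."""
        inode = os.stat(self.path).st_ino
        if inode != self.inode:
            # A different file now sits behind this name (e.g. after rotation).
            self.inode = inode
            self.offset = 0
        with open(self.path, "rb") as handle:
            handle.seek(self.offset)
            data = handle.read()
            self.offset = handle.tell()
        return data.decode("utf-8", "replace")
```

Each call to read_new() returns only the new text, and a rotated log is read
again from its first byte.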
- For the "email" action, convert "\n" strings in the body text into newline
  characters. This allows for:
    email('foo@bar.com', 'host: %(h)s', 'Host: %(h)s\nAge: %(problemage)s')
  instead of having odd-looking multi-line strings in the config file.
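The conversion amounts to replacing the two characters backslash and "n" with
a real newline before substitution. A small sketch follows; expand_newlines()
and the substitution values are assumptions, not Eddie's action code.

```python
def expand_newlines(body):
    """Turn the two-character sequence backslash + n into a real newline."""
    return body.replace("\\n", "\n")


# Body text as it would appear inside a config file string.
template = r"Host: %(h)s\nAge: %(problemage)s"
values = {"h": "web01", "problemage": "5m"}   # made-up substitution values
print(expand_newlines(template) % values)
```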
- Added "RESCANCONFIGS" config option. Defaults to original behavior.
  This option allows the disabling of Eddie's constant scanning and reloading
  of its config files.
- Fixed very minor bug where action variables were updated multiple times
  for no good reason. Reported by Mark Taylor.
- Added "log" action. Use it to append to a log file, log via syslog, or
  print on the eddie tty.
- Log the ImportError message if a requested data collector module fails
  to import. Helps users debug why the module won't load.
- Replaced references to whrandom module with random instead. whrandom is
  being deprecated.
- Changed option parsing to use optparse/optik (ticket #5) and added
  support for specifying an alternate config file from the command line
  (ticket #6).

Eddie-0.35 (31-Oct-2005)
- Linux: Added a dummy diskdevice module for Linux. The implementation of
  this is yet to be done.
- Fixed compatibility issue with FILE directive and Python pre 2.3. Those
  versions do not have os.path.sep.
- Added regfile to LOGSCAN directive, which points to a file containing
  multiple regular expressions to match against. Patch submitted by
  Dougal Scott.
- Linux: Fix to handle /proc/stat changes on Linux kernel 2.6.11+.
- Enhancements to PRTDIAG directive:
  * Report details of any hardware failures on U280R.
  * Added support for U480R hardware.
  Patch submitted by Dougal Scott.
- Improvement to HTTP directive handling if the Python installation does not
  support SSL connections. Patch submitted by Dougal Scott.
- Added SMTP directive which provides a simple facility to measure the response
  time of an SMTP connection to a server. Submitted by Dougal Scott.
- Fixed minor bug where length of time of thread count over threshold was
  not being shown in minutes when it was expected to be.
  Patch submitted by Dougal Scott.
- System specific Directives are now automatically loaded from a Directives
  subdirectory beneath the system lib directory if it exists.
  Example: Linux-specific directive modules will be loaded from:
    lib/Linux/Directives/
  Patch submitted by Dougal Scott.
- SP directive now supports a bindaddr value of "any". This will cause the
  directive to ignore the bind address when testing (ie: compare port only).
  Patch submitted by Dougal Scott.
- Use Python True/False instead of 1/0 for booleans in common directives.
- Added 'expectrexp' option to PORT directive. This allows regular expression
  matching against the response of a PORT connection.
  Patch submitted by Dougal Scott.
- Added a 'missing' flag to FILE directive which indicates when an existing
  file has disappeared.
  Also added a 'lastexists' variable for use in FILE rules.
- Improvements to the keepdiff option of the FILE directive.
  * Keep copies of files being monitored in WORKDIR/FILEprevs/ where
    WORKDIR is the new option defined in eddie.cf.
  * If the copy of a file in FILEprevs disappears then set an appropriate
    message for action output.
  * If the copy of a file in FILEprevs disappears then make sure another
    copy is saved.
  * Use semi-readable unique filenames for the saved copies.
- Added get_work_dir() and set_sub_work_dir() functions to utils.py for
  directive code to call to retrieve the WORKDIR location. set_sub_work_dir()
  is used to create a subdirectory within WORKDIR. It will raise WorkdirError
  if it fails. Otherwise it returns the full directory path.
- Added config option WORKDIR which defines a location where Eddie can
  store temporary files. This can be used by directives that need to
  save some information or state to the filesystem. The directory can
  be safely removed when Eddie is not running. Eddie does not clean
  up the directory itself (it may clean up some files before shutting
  down). The whole directory tree will be created on startup if it
  doesn't already exist. Eddie may create subdirectories within this
  WORKDIR directory. Example:
    WORKDIR="/var/tmp/eddieworkdir"
- Win32: Catch an exception that is occasionally raised by
  win32pdh.GetFormattedCounterValue(). The returned error is unhelpful,
    (-2147481640, 'GetFormattedCounterValue', 'No error message is available')
  so just return None values instead of letting the thread die.
- Added capability for FILE directive to keep diffs of changes to a file.
  The diffs can then be sent in an email when a change is detected.
  New FILE arguments:
    keepdiff={true|false}
      - flag whether to keep a copy of the file to produce diffs
    context_lines=<integer>
      - how many context lines to show around the changed lines
    difftype={context|unified|full}
      - which diff method to use (see Python difflib module for more information)
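The difftype and context_lines arguments map naturally onto the Python difflib
module. The sketch below is illustrative only: make_diff() is a hypothetical
helper, and treating "full" as difflib.ndiff() is an assumption, since this
file does not say which difflib call Eddie uses for it.

```python
import difflib


def make_diff(old_lines, new_lines, difftype="unified", context_lines=3):
    """Return a diff of two lists of lines using the chosen difflib method."""
    if difftype == "unified":
        diff = difflib.unified_diff(old_lines, new_lines, "previous", "current",
                                    n=context_lines, lineterm="")
    elif difftype == "context":
        diff = difflib.context_diff(old_lines, new_lines, "previous", "current",
                                    n=context_lines, lineterm="")
    elif difftype == "full":
        # Assumption: "full" shows every line, which is what ndiff() produces.
        diff = difflib.ndiff(old_lines, new_lines)
    else:
        raise ValueError("unknown difftype: %r" % difftype)
    return list(diff)


old = ["alpha", "beta", "gamma"]
new = ["alpha", "BETA", "gamma"]
print("\n".join(make_diff(old, new, "unified", context_lines=1)))
```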
- Added README.win32.txt for Win32 platform install information.
- Added rules/win32_sample.rules - a sample set of Win32 rules.
- Win32 df collector: ignore A: and B: drives when collecting stats.
  Otherwise Windows prompts for the media to be inserted! (Unless a
  floppy is in the drive ... yeah right)
- Win32: Fix win32perf doctest for systems that have an A: drive.
- Win32: Added support for Win32 systems with datacollectors: df,
  diskdevice, netstat, proc and system. Most of them use the win32perf
  module which is a wrapper for Mark Hammond's win32all package.
- Added doctests for FILE directive.
- Fetch hostname from platform.node() if os.uname() is not available.
  (Fix for Win32 compatibility.)
- Added a doctest for timeQueue module.
- Fixed bug in timeQueue in Python 2.4+ support where head() call was
  actually performing a get().
- Use platform-independent method (ie: os.path) for constructing config
  paths, rather than assuming '/' is the path separator. (Fix for Win32
  compatibility.)
- Added support for systems that do not support os.uname() - try to use
  the platform module instead (ie: Win32). Check that the system handles
  each signal before trying to register signal handlers for them (Win32
  doesn't support some of the signals).
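Both portability checks come down to testing for an attribute before using
it. A minimal sketch follows; the function names and the default signal list
are assumptions, not Eddie's startup code.

```python
import os
import platform
import signal


def get_hostname():
    """Use os.uname() where it exists, otherwise fall back to platform.node()."""
    if hasattr(os, "uname"):
        return os.uname()[1]
    return platform.node()


def register_handlers(handler, names=("SIGHUP", "SIGINT", "SIGTERM", "SIGALRM")):
    """Register handler for each named signal that this platform defines."""
    registered = []
    for name in names:
        signum = getattr(signal, name, None)
        if signum is None:
            continue          # e.g. SIGHUP is not defined on Win32
        signal.signal(signum, handler)
        registered.append(name)
    return registered
```

Note that signal.signal() may only be called from the main thread, so a
routine like register_handlers() belongs in startup code.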
- Solaris: Catch some more possible errors when parsing 'ps' output for
  Solaris. The %CPU field can be a '-' instead of a decimal number (seems
  to be that way for zombie processes).
- Solaris: Handle parsing netstat output for Solaris 10.
- Fixed small bug with eddie_wrapper when EDDIE_ADMIN was not defined.
- Big improvements to the Redhat init.d script in the contrib directory,
  making it much more compatible with all new versions of Redhat Linux.
- Added chkconfig lines to sample init.d script for Redhat Linux.
- Linux: Detecting interpreters in Linux process lists was broken.
- Linux: added support for new netstat formats in newer kernels.
- Linux: Get VM statistics from /proc/vmstat (on newer kernels).
- Added support for Python 2.4 Queue class, which Eddie's timeQueue class is
  derived from. The implementation of Queue changed slightly in Python 2.4.
- Log the version of Python in use at startup, along with systype.
- Added optional definition of EDDIE_ADMIN environment variable in the rc
  startup scripts to receive Eddie restart/exception notifications from
  eddie_wrapper.
- Eddie now prints no output to stdout by default. Any global exceptions
  are printed to stderr on exiting.
- eddie_wrapper improvements: eddie output on exit is only emailed to
  $EDDIE_ADMIN if the Eddie return-code is non-zero. By default no
  $EDDIE_ADMIN is set (so no email is sent by default) and $EDDIE_ADMIN
  can now be defined outside the eddie_wrapper script (ie: in a startup
  script).
- Bugfix: console now shows groups that match special hostnames, those that
  contain '.' or '-' characters. A shortcut hack that will be replaced in
  the future.
- FreeBSD: Added fetching of more system counters from '/sbin/sysctl -a'.
- FreeBSD: process list parsing was broken.
- FreeBSD: proc module needed to import sys so that exceptions could
  be logged.
- Added a bit of a hack (sorry) which allows hostnames containing '-' to be
  used as group names. The '-' must be replaced with '_' for the match to
  work. This is because group names in the config cannot contain characters
  like '-'. This will be resolved in the future when proper matching options
  are implemented fully.
- Solaris: Better handling of Solaris process date/time parsing errors.
  Patch submitted by Dougal Scott.
- Solaris: PRTDIAG directive: added support for Sun Blade servers
  (SUNW,Serverblade1). Patch submitted by Dougal Scott.
- When sending email by the SMTP method and multiple SMTP servers are
  available, only log failure if all SMTP servers are unavailable to
  send the message. Patch submitted by Dougal Scott.
- FreeBSD: Added collecting swap usage stats from '/usr/sbin/pstat -sk'.
- Bugfix: Elvin ElvinConnectMaxRetries exceptions were not being caught
  properly.
- Solaris: SunOS df data collector would fail when a CD was inserted, as
  total files is reported as -1. Patch submitted by Dougal Scott.
- FreeBSD raises a socket exception ('Host is down') when a host is
  unreachable, which can be safely ignored by the ping code.
- Improved the sample config for N COMMONFIXED.
- FreeBSD: Added support for FreeBSD system, proc, netstat, df modules.
- A quick fix to the config parser which means that Eddie will run on systems
  that do not yet have system-specific modules. Non system-specific
  directives will still work on these systems, such as all the network
  directives (PING, SNMP, etc) and others like FILE.
- Solaris: Fixed DataFailure exception when kstat command cannot be found.
- Catch an exception properly in FS directive when filesystem was not
  found.
- Fixed fstpl directive in common.rules example file.
- Modified eddie_wrapper to use a Python call to fetch the current time
  rather than relying on GNU date. This has improved compatibility with
  more types of systems, as it can be assumed that Python will be available
  to run EDDIE!
- Handle Elvin connection problems more gracefully, backing off before
  retrying.
- Disabled counting of file descriptors in use, which is only needed for
  debugging on rare occasions.
- Bugfix in HTTP when trying to determine error string for some types of
  exceptions.
- Improved PING multi-threaded reliability on platforms that were causing
  problems because they simply used the current pid as the icmp_id.
  On platforms where all threads share the same process id this was causing
  unreliable ping results as the wrong threads would accept the wrong icmp
  replies. It now uses the current thread object's memory address for the
  icmp_id to make them as unique as possible and avoid such confusion.
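A per-thread identifier of this kind can be sketched as follows. This assumes
CPython, where id() returns an object's memory address. Masking to 16 bits is
also an assumption (it is the width of the ICMP echo identifier field), so
two threads can still collide; the helper name is hypothetical.

```python
import threading


def icmp_id_for_current_thread():
    """Derive a 16-bit ICMP identifier from the current thread object."""
    # The ICMP echo identifier field is 16 bits wide, so mask the value.
    return id(threading.current_thread()) & 0xFFFF


results = {}


def worker(name):
    """Record the identifier seen from inside this worker thread."""
    results[name] = icmp_id_for_current_thread()


threads = [threading.Thread(target=worker, args=("t%d" % i,)) for i in range(3)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(results)
```

Unlike the pid, the value differs between live threads of one process, which
lets each thread recognise its own replies.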
- New directive: TAPE - functions almost exactly like the DISK directive
  but fetches stats from the TapeStatistics class from the diskdevice
  module (which is currently only available for Solaris).
  Example:
    TAPE st52_thruput:
        device='st52'
        scanperiod='5m'
        rule='1' # always perform action
        action='elvinrrd("tape-%(h)s_%(device)s", "rbytes=%(nread)s", "wbytes=%(nwritten)s")'
- New directive, DISK. This uses the new DiskStatistics data collector from
  a diskdevice module (available for Solaris-only so far) to enable rules
  to be created using disk device activity stats.
  Example: a directive which collects bytes read/written to the disk device
  md20 and sends these counters to elvinrrd
    DISK md20_thruput:
        device='md20'
        scanperiod='5m'
        rule='1' # always perform action
        action='elvinrrd("disk-%(h)s_%(device)s", "rbytes=%(nread)s", "wbytes=%(nwritten)s")'
- Solaris: added a new Data Collector, DiskStatistics, in module diskdevice.py
  (for Solaris only so far). On Solaris this collects disk activity statistics
  from a call to kstat, ie, '/usr/bin/kstat -p -c disk'. All stats generated
  by that command are collected for each disk and made available to directives.
- Solaris: enhanced the network interface statistics collection to fetch
  more detailed stats from 'netstat -k' for each physical interface.
  An example of the statistics now available for an interface (hme0 on 5.7)
  is:
    ipackets 65360226 ierrors 25 opackets 77502512 oerrors 0 collisions 0
    defer 0 framing 0 crc 0 sqe 0 code_violations 0 len_errors 0
    ifspeed 100 buff 0 oflo 0 uflo 0 missed 25 tx_late_collisions 0
    retry_error 0 first_collisions 0 nocarrier 0 inits 7 nocanput 440
    allocbfail 0 runt 0 jabber 0 babble 0 tmd_error 0 tx_late_error 0
    rx_late_error 0 slv_parity_error 0 tx_parity_error 0 rx_parity_error 0
    slv_error_ack 0 tx_error_ack 0 rx_error_ack 0 tx_tag_error 0
    rx_tag_error 0 eop_error 0 no_tmds 0 no_tbufs 0 no_rbufs 0
    rx_late_collisions 0 rbytes 1726897560 obytes 834302609 multircv 7535 multixmt 0
    brdcstrcv 248816 brdcstxmt 1667 norcvbuf 440 noxmtbuf 0 phy_failures 0
  as well as info from 'netstat -in' such as mtu, network, etc.
- Solaris: now collects more detailed filesystem information in SunOS/df.py,
  including inode usage, filesystem type, flags, and blocks as well as kBytes
  used. The full list of variables now available to directives is:
    fsname      - filesystem name (string)
    mountpt     - mount point (string)
    size        - size of filesystem in kBytes (int)
    used        - kBytes used (int)
    avail       - kBytes free (int)
    pctused     - percentage of filesystem used (float)
    totalblocks - total amount of physical blocks (512 Bytes/block) (int)
    usedblocks  - number of physical blocks used (int)
    availblocks - number of physical blocks available for unprivileged users (int)
    freeblocks  - number of physical blocks free (int)
    blocksize   - filesystem (logical) block size (int)
    fragsize    - filesystem fragmentation size (int)
    totalinodes - total inodes on filesystem (int)
    usedinodes  - number of inodes used (int)
    availinodes - number of inodes left available (int)
    pctinodes   - percentage of inodes used (float)
    filesysid   - filesystem id (int)
    fstype      - type of filesystem (string)
    flag        - filesystem flags (string)
    filelen     - max filename length (int)
  Thanks to Dougal Scott for submitting this patch.
- When matching hostnames to group names, ignore any domain parts of the
  hostname if it is fully-qualified. Group names cannot contain
  non-alphanumeric characters, so will only match the host part of a FQDN.
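The host-to-group matching rules in this release (drop the domain part, and
replace '-' with '_' as described in the earlier hack entry) can be sketched
like this. Both helper names are hypothetical, and the real matching code may
differ.

```python
def group_name_for_host(hostname):
    """Reduce a hostname to the form that can match a group name."""
    host = hostname.split(".", 1)[0]     # drop any domain part of an FQDN
    return host.replace("-", "_")        # '-' is not allowed in group names


def host_in_groups(hostname, groups):
    """Return True if the reduced hostname is one of the defined group names."""
    return group_name_for_host(hostname) in groups


print(group_name_for_host("web-01.example.com"))   # web_01
```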
- Bugfix: clear checkdependson if it is assigned an empty string.
- Solaris: improvement to uptime/loadavg stats collection where it is
  possible for the "day(s)" section of /usr/bin/uptime output to be
  missing (usually if wtmpx rotated more often than the system boot,
  thus losing the last 'reboot' entry) so SunOS/system.py now handles
  this exceptional case.

Eddie-0.34 (13-Sep-2004)
- OpenBSD: collect in/out byte counters for network interfaces, which
  requires an extra netstat call.
- OpenBSD: added drops counter to network interface stats.
- OpenBSD: fixed some bugs preventing network interface statistics collection
  from working properly.
- Improved handling of exceptions when counting file descriptors in use.
  Instead of raising a global exception (and causing EDDIE to die) just log
  the exception and carry on.
- Perform global housekeeping duties more often. They now run every
  1 minute instead of every 10 minutes. This means that changes to
  config and rules files will be picked up much faster.
- Added pysnmp module to Extra dir, which EDDIE uses for making SNMP queries.
- Extra 3rd-party modules are now being distributed with EDDIE. They will
  live in lib/common/Extra/ and are provided to make installation simpler
  for commonly-used modules.
- HTTP: Make sure 'ip' message variable is initialized in HTTP directives.
- HTTP: Some HTTP response exceptions were not being caught properly.
- HTTP: Some socket.timeout checks weren't checking for the correct version
  of Python (which was causing AttributeError exceptions).
- HTTP: Changed the logging of response body read() exceptions which were not
  working for some types of exceptions.
- Made eddie_wrapper smarter about finding a date or gdate command to use.
- Darwin: Fixed a bug parsing vmstat statistics. These counters were
  being truncated (and hence wrong) before.
- Darwin: Better handling of parsing errors in the proc data collector.
- The COM directive now shares the utils.systemcall_semaphore semaphore
  rather than relying on its own. This prevents conflicts between any
  threads that need to perform a system() (or os.popen() or
  commands.getstatusoutput()) simultaneously.
  Thanks to Denis Menshikov for verifying this issue.
- Bugfix for SP directive determining the right protocol (Dougal Scott).
- Bugfix for a problem where the get TCPtable call occasionally returns no
  entries for no obvious reason. This meant that all the SP style checks
  would start complaining that no one is listening (Dougal Scott).
- If ELVINURL and ELVINSCOPE are both undefined in eddie.cf then disable
  Elvin functionality.
- Update to MYSQL directive adding "result#" variable (Dougal Scott).
- Converted mysql.py from DOS line endings to UNIX.
- Fixed 'daemon' call in contrib init script so it works properly on newer
  versions of Redhat.
- Added new exception DataFailure.
  Changed exceptions to be subclasses of Exception.
  Catch DataFailure exceptions from collectData(). These are raised if the
  Data Collector encounters a major problem collecting the data.
- Added support for Redhat Enterprise Linux (or perhaps newer kernels 2.4.21+)
  which has extra stats added to the cpu fields in /proc/stat. The cpu counters
  now available with these kernels are:
    ctr_cpu_user
    ctr_cpu_nice
    ctr_cpu_system
    ctr_cpu_idle
    ctr_cpu_iowait
    ctr_cpu_hardirq
    ctr_cpu_softirq

Eddie-0.33 (15-Jul-2004)
- Handle socket timeout exceptions properly when HTTP response read() fails.
- Handle socket.settimeout() not being available on Python pre-2.3 versions.
- A new HTTP rule/action variable 'timedout' has been added which will be set
  to 1 if a socket timeout exception has occurred, otherwise it will be 0.
- Added HTTP directive option 'request_timeout' which specifies how long a
  HTTP(S) connection should wait for a response before timing out with an
  error. This makes use of a new Python 2.3 feature where socket timeouts
  can be configured, hence this option is only available when Eddie is
  running on Python 2.3+.
- Better defaults for SENDMAIL and ELVIN settings in sample eddie.cf.
- Added better logging of HTTP directive actions.
- Enhancements to HTTP directive:
  Supports URLs with non-standard ports, eg: http://localhost:8080/
  Added finer grained timing of four parts of the HTTP connection:
    time_resolve  - elapsed time to resolve hostname to IP
    time_connect  - elapsed time to connect to server
    time_request  - elapsed time to send HTTP/S request to server
    time_response - elapsed time to retrieve the server response (and close connection)
    time          - elapsed total time (sum of above)
- Added system-specific sample rules for Linux & Solaris.
- Added testing ruleset for OpenBSD in development/testing/.
- Added initial OpenBSD support, thanks to John McInnes.
- DataCollect now logs what module is being requested for import.
- Fixed act2ok bug in FILE test.
- Remove accidental accented character from nice() comments.
  It was causing a DeprecationWarning in Python 2.3.3+.
- Created a full directive test suite for Darwin (OS X) to provide standard
  testing of all possible directives (or as many as possible).
  These live in development/testing/.
- PING: PING directive was logging pktloss as decimal when it should have been
  a percentage.
- SP: Local address IP for SP directives (using netstat data-collector) can now
  be specified as '*' or '0.0.0.0' for Solaris. '*' is automatically
  converted to '0.0.0.0' for consistency.
- First version of OS-specific modules ported to Mac OS X (Darwin).
  Tested on OS X 10.3.3 (Darwin 7.3.0). Needs plenty more testing.
- HTTP: Initialize HTTP directive exception data so variable substitution in
  messages doesn't fail.
- Added new directive argument: checktime
  Used to restrict directive execution to specified times. The value
  is a Python expression which can use various variables representing
  the current time and day:
    day ('mon', 'tue', etc); time (HHMM); hour (0-23); minute (0-59); second (0-59).
  And for shorthands, the fixed lists:
    weekdays ('mon' - 'fri'), weekend ('sat', 'sun').
  Examples:
    checktime='day=="mon" or day=="tue"'
    checktime='day in weekdays and hour>18'
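A checktime expression can be thought of as a Python expression evaluated
against a small dictionary of time variables. The sketch below is not Eddie's
implementation: the helper names are invented, and representing "time" as an
HHMM integer is an assumption.

```python
import time

WEEKDAYS = ("mon", "tue", "wed", "thu", "fri")
WEEKEND = ("sat", "sun")


def checktime_vars(now=None):
    """Build the variables a checktime expression can refer to."""
    t = time.localtime(now)
    return {
        "day": (WEEKDAYS + WEEKEND)[t.tm_wday],   # tm_wday: Monday is 0
        "time": t.tm_hour * 100 + t.tm_min,       # HHMM as an integer (assumed)
        "hour": t.tm_hour,
        "minute": t.tm_min,
        "second": t.tm_sec,
        "weekdays": WEEKDAYS,
        "weekend": WEEKEND,
    }


def check_allowed(expression, now=None):
    """Evaluate a checktime expression; True means the directive may run."""
    return bool(eval(expression, {}, checktime_vars(now)))


print(check_allowed("day in weekdays and hour>18"))
```

Since the expression is passed to eval(), it should only ever come from a
trusted config file.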
- Only perform act2ok action(s) if some actions were already called.
  In cases where the check fails but actiondependson causes actions to
  be skipped, we don't need the act2ok actions to be called.
- Added MYSQL directive submitted by Dougal Scott.
- PING: Fixed a socket exception for gethostbyname failures.
- Added option to disable a directive. Specify 'disabled=1' in a directive
  to force it to be disabled.
- SNMP directive now supports 64-bit counters split into high/low OIDs. Specify
  these as "OIDhigh:OIDlow".
  Example:
    oid='1.3.6.1.2.1.2.2.1.10.2:1.3.6.1.2.1.2.2.1.10.3'
  Where the first OID is the high 32 bits and the second OID is the low 32 bits.
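Recombining the two halves is a shift and an OR. A minimal sketch follows;
both function names are hypothetical, and the counter values would come from
two separate SNMP queries.

```python
def split_oid_pair(oid):
    """Split 'OIDhigh:OIDlow' into its two OIDs; a plain OID has no low part."""
    high, sep, low = oid.partition(":")
    return (high, low) if sep else (high, None)


def combine_counter(high, low):
    """Join two 32-bit halves into one 64-bit counter value."""
    return ((high & 0xFFFFFFFF) << 32) | (low & 0xFFFFFFFF)


print(split_oid_pair("1.3.6.1.2.1.2.2.1.10.2:1.3.6.1.2.1.2.2.1.10.3"))
print(combine_counter(2, 5))   # 8589934597
```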
- Added an FS template, fstpl, to sample common.rules.

Eddie-0.32 (21-Apr-2003)
- Added an exception handler for httplib read() where it can fail in
  some circumstances.
- Fixed HTTP timing so that the whole HTTP session was timed, not just the
  connect part. This was misleading before.
- If no output from COM directive, set outfield1 anyway so rule
  strings don't break. Suggested by Arcady Genkin.
- Changed some sample rules to use ALERT_EMAIL alias rather than "alert"
  fixed email address. Thanks to Zac Stevens <zts@itga.com.au> for
  pointing them out.
- Added restart option to redhat init.d script in contrib.
- Added new directive parameter: actionmaxcalls - defines the maximum number
  of times actions will be called for a particular failure.
- Minor bugfix: sendmail_smtp() was returning wrong return codes; successful
  posts were showing as failures, etc.
- Added new directive parameter: excludehosts
  Directive will be skipped on any hosts specified by excludehosts.
  Specified as a string containing a comma-separated list of hostnames.
- If groups of the same name are defined, merge them together rather than
  throwing an error. This allows for more custom rule configurations.
  Requested by Arcady Genkin <agenkin@cdf.toronto.edu>

Eddie-0.31 (11-Dec-2002)
- Increased Linux system counters from int to long.
- Fixed bug with isfile/isdir/etc shorthands not working properly.
- Console displays "<directive not ready>" for directives which have not
  yet been initialised, rather than throwing KeyError exception.
- Added option to send emails via SMTP servers, rather than relying on
  a local sendmail binary. Either option can now be used.
  Set SMTP_SERVERS in config to use SMTP server option. This option
  is now the default, and server defaults to 'localhost'.
  Based on a submission by Dougal Scott <dwagon@connect.com.au>
- Fixed FILE example rule when performing cron test.
  Noted by Dougal Scott <dwagon@connect.com.au>.
- Convert the weird time format that Solaris ps returns for etime and time
  into plain seconds, which is a lot more useful for rules than
  checking lengths or doing an integer conversion of a subslice of the
  result and then a comparison based on that.
  Patched by Dougal Scott <dwagon@connect.com.au>.
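The conversion described above can be sketched as follows. This is a Python 3 illustration rather than the submitted patch, and it assumes the usual ps layout of `[[dd-]hh:]mm:ss`; the function name `etime_to_seconds` is hypothetical.

```python
def etime_to_seconds(value):
    """Convert a ps-style "[[dd-]hh:]mm:ss" string into plain seconds."""
    days = 0
    if "-" in value:
        # A leading "dd-" carries the number of whole days.
        day_part, value = value.split("-", 1)
        days = int(day_part)
    fields = [int(field) for field in value.split(":")]
    # Pad missing leading fields (hours, or hours and minutes) with zeros.
    while len(fields) < 3:
        fields.insert(0, 0)
    hours, minutes, seconds = fields
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds
```

With the value in seconds, a rule can compare it directly against a number instead of inspecting the string's length or slicing it.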
- Improved error output when parsing rules.
- Fixed bug when using Python pre-2.2 versions.
- Added some more sample directives.
- Added support for remembering historical data in directives. Rules can
  reference data from previous samples.
- Changed actionperiod slightly, so first actionperiod defaults to scanperiod,
  then actionperiod expression is used thereafter.
- Shift sticky and type bits of mode across, right justified.
- Improved handling of tokenization errors.
- Directive is cancelled (not re-queued) if there are too many
  SNMP query failures (usually host not responding or some other
  network or transport failure).
- Added shorthand booleans to FILE directive for checking file types in rules:
    issocket
    issymlink
    isfile
    isblockdevice
    isdir
    ischardevice
    isfifo
- Updated docs with version 0.30 changes (forgot to do this at release time,
  oops).
- Improved handling of socket errors for console.
- Fixed issue with templates not being handled before rest of directive arguments.
- Added perm, sticky and type rule variables to the FILE directive. They are
  shorthands for the permissions, sticky/setuid/setgid and file type bits
  of a file's mode.
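One way to picture the perm, sticky and type shorthands is as three masked slices of st_mode, with the upper two shifted right so each is a small right-justified number. The sketch below is Python 3 and rests on an assumption: the masks and shift widths are the standard Unix layout, and the changelog does not confirm that Eddie uses exactly these. The name `split_mode` is hypothetical.

```python
import stat


def split_mode(mode):
    """Return (perm, sticky, ftype) slices of a st_mode value."""
    perm = mode & 0o777               # rwxrwxrwx permission bits
    sticky = (mode & 0o7000) >> 9     # setuid/setgid/sticky, right justified
    ftype = stat.S_IFMT(mode) >> 12   # file type bits, right justified
    return perm, sticky, ftype
```

A setuid regular file with mode 04755 would give perm 0o755, sticky 4 and type 8 under this layout.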
- Improved config syntax error handling of bad directive names.
- Implemented check and action dependency definitions. Two new directive
  options are: actiondependson and checkdependson. These can be set to a
  string containing a list of directives (comma-separated) that this directive
  is dependent on. If any of the listed directives has failed when this
  directive comes to perform its check or action (depending on which option
  was used) then that check or action will be skipped.
- Added new directive option actionperiod. This is a string containing an
  expression which, when evaluated, sets the current period between actions
  being performed. This allows for periods between actions to be different from
  the period between checks. It also allows for the period to be defined by
  a mathematical expression, so the action period could exponentially increase
  for example (for actions called during a single failure - the action period
  will be reset when the failure is fixed).
- Enforced unique group and directive names at same group level.
- Improved error handling of console connections from bad clients.
- Fixed syntax error in sample config.
- Changed Linux ctr_interrupts system counter from int to long.
- Improved error handling of snmp directive.
- Improved handling of group configuration errors.
- Finally removed dependency on user-compiled 'top' command for collecting
  some system stats on Solaris. All current stats are collected from uptime
  and vmstat commands now, which should be standard on any Solaris system.
- Fetch Linux memory statistics from /proc/meminfo.

Eddie-0.30 (31-May-2002)
- Prevented failed calls to 'top' (which will soon be made redundant anyway)
  from causing system stats collection to fail on Solaris.
- Removed fetching WCHAN field from process information on Linux, as this
  sometimes caused kernel warnings to be output or logged. The field doesn't
  appear particularly useful.
- Changed Linux context switch counter from an int to a long.
- Fixed bug where an error parsing top output locks the system call semaphore
  on Solaris.
- Fixed small bug when parsing string variables and catching exceptions in
  actions.
- Added SENDMAIL config option to specify location of the sendmail binary
  which EDDIE uses to send all email.
- Fixed bug when templates not in same group as directive referencing them.
- Changed PID directive argument 'pid' to 'pidfile'.
- Better handling of missing pysnmp module in snmp.py.
- Added basic SNMP directive based on a module by Dougal Scott
  <dwagon@connect.com.au>. Requires pysnmp.
- Changed Linux 'df' call to 'df -l' which lists all local filesystems.
  Much friendlier now that there are many alternative filesystems available
  for Linux.
- Added patch by Kees Bakker <kees.bakker@altium.nl> to handle Linux df
  when it sometimes outputs filesystem information over multiple lines.
- Added outfield variables to the COM directive. The out variable is split
  by whitespace and the fields are stored in outfieldn variables, e.g.,
  outfield1, outfield2, etc. This is to assist rule creation.
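The outfield splitting described above amounts to numbering the whitespace-separated fields of the command output from 1. A minimal Python 3 sketch, assuming nothing about Eddie's internals beyond that description; the helper name `outfields` is hypothetical.

```python
def outfields(out):
    """Map whitespace-separated fields of `out` to outfield1, outfield2, ..."""
    return {
        "outfield%d" % index: field
        for index, field in enumerate(out.split(), start=1)
    }


# Example: fields can then be substituted into a %(name)s style string.
fields = outfields("0.42 0.35  0.30\n")
message = "1-minute load is %(outfield1)s" % fields
```

Because `str.split()` with no argument collapses runs of whitespace, repeated spaces and the trailing newline do not produce empty fields.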
- Added netsaint action and Elvin notification method, submitted by
  Dougal Scott <dwagon@connect.com.au>.
- Added minor bug-fixes, thanks to pre-release testing by Dougal Scott
  <dwagon@connect.com.au>.
- Linux ctr_cpu_idle variables need to be longs (instead of ints) as the
  counters are larger than expected.
- Created a HTTP directive for performing HTTP (and HTTPS) tests.
- Fixed minor bug when displaying config lines that have parsing errors.
- Fixed bug in METASTAT directive.
- Removed the CRON directive. It is redundant now that the FILE directive
  can perform the same test.
- Added a new data variable to FILE directive: now, which contains the
  current time for use in tests with atime/mtime/ctime.
- LOGSCAN directive now initializes data variables on first check, which is
  only for finding the end of the logfile in question. This prevents an
  exception when variables are needed for console strings before second
  check has run.
- Removed optional actionList from being logged by directives also.
- Fixed bug with directives trying to log the action list, which is optional
  now and may not exist.
- Moved sample M/MSG definitions to message.rules file.
- Added some more sample rules.
- Cleaned up sample rules and updated for the latest directive changes.
  Added some elvinrrd sample rules.
- Minor cleanup of base directory path; just found os.path.norm() :)
- Fixed small problem with arg parsing handling None values.
- Fixed small bug in PORT directive: when a check fails due to a connection
  timeout, the recv string that wasn't set was still being searched.
- Cleaned up config formatting some more so that actions do not need to be
  inside strings, they can be entered directly in a function call-like
  format, e.g.,
    action=ticker("Load on %(h)s is %(out)s", timeout=1)
  or for a notification object,
    action=COMMONALERT(commonmsg.fs,1)
- Changed PROC argument 'procname' to 'name' and action variable
  'proc_check_name' to 'name' also, for consistency.
- Fixed minor bug with lack of expect argument for PORT directive.
- Removed data collection modules which are not required.
- Cleaned up all data collection modules and classes to simplify their
  definitions. Data collectors should be derived from the DataCollect
  base class which handles all the data caching and thread-locking.
- Changes to parseConfig to simplify directive definitions.
- Removed old datastore module.
- Fixed up console code to handle errors better.
- Changed Directive base-class to simplify directive definitions.
- New datacollect module which defines DataModules class to handle dynamic
  importing of architecture-dependent data collection modules, and
  DataCollect class to provide a base-class for data collectors.
- Fixed PING directive to handle un-resolvable addresses. Also returns ping
  round-trip-times in seconds as a floating-point number.
- Simplified directive definitions by moving most of the common code to
  Directive base-class. New directives only need to define __init__,
  tokenparser and getData methods.
- Removed requirement for action variables to be prefixed by directive name.
  Action variables now have the same name as the rule variables, for
  consistency. Changed a few more variable names so they make more sense.
- Moved common directive definitions from directive.py to
  Directives/common.py.
- OS-dependent modules are now imported dynamically when needed, not in the
  main eddie.py anymore. All data collection modules are handled by the
  new datacollect module.
- Removed old method of determining systype with external script (wasn't used
  anymore anyway).
- Fixed bug with Pinger where it would throw an exception when pinging
  addresses that did not resolve.
- Added extra console argument variables:
  . lastchecktime - date/time of last directive execution
  . problemfirstdetect - date/time of current failure first detected (only if
    state is failed)
  . problemlastfail - date/time of current failure last detected (only if state
    is failed)
- Cleaned up description of ADMINLEVEL in sample config so it makes more sense.
- Added console argument to directives to specify how the console output should
  look for that directive. console=None can be specified to hide that directive
  from console output.
- Added support for EXT3 filesystems in Linux filesystem checking code.
  Patch submitted by Kees Bakker <kees.bakker@altium.nl>
- Fixed a minor bug where directives using the eval() function and catching
  an exception would log a very ugly looking message. This was due to the Python
  eval() function modifying the user-supplied environment dictionary by adding
  the __builtins__ dictionary. When this is printed it looks horrible.
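The eval() side effect mentioned above is easy to reproduce. The sketch below uses Python 3, where eval() inserts a `__builtins__` key into the globals dictionary it is given if one is not already present. Evaluating against a copy is one way to keep the original dictionary printable; it is shown here as an illustration, not as the fix Eddie applied. The names `eval_rule` and `pctused` are hypothetical.

```python
def eval_rule(rule, env):
    """Evaluate a rule string against a copy of env, leaving env untouched."""
    return eval(rule, dict(env))


# Passing the dictionary directly: eval() adds a "__builtins__" entry to it.
env = {"pctused": 97}
eval("pctused > 95", env)
polluted = "__builtins__" in env

# Passing a copy: the caller's dictionary keeps only its own variables.
clean_env = {"pctused": 97}
result = eval_rule("pctused > 95", clean_env)
```

After the first call, printing `env` would dump the whole builtins namespace, which matches the ugly log output the entry describes.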
- Added 'actelse' directive argument to perform actions if directive state is
  ok and has not changed since the last check.
  Based on patches submitted by Dougal Scott <dwagon@connect.com.au>
- Changed Linux counter variables to have 'ctr_' at start of name, to be
  consistent with Solaris and HP-UX variables.
- Fixed minor bug in HP-UX and Solaris system data collection.
- Fixed bug in uptime parsing in HP-UX system.py.
- Added a timeout argument to the ticker action.
- Re-implemented Elvin connection and notification code using the Elvin
  ThreadedLoop client and a dedicated Elvin thread which should prevent
  other threads from blocking on Elvin problems.
- Specify full path for Solaris 'ps' command to prevent calling wrong version of
  'ps'.
- Started work on a basic Developer's Guide: doc/dev_guide.txt.
- Standardised logging levels and tidied up all logging.
- Added system performance data collecting from 'uptime' and 'vmstat -s'
  commands on Solaris.
- Improved network interface statistics on Linux by retrieving data from
  /proc/net/dev.

Eddie-0.29 (non-public release)

Eddie-0.28 (9-Mar-2002)
- Cleaned up df code, added data caching and made thread-safe, like other
  data collectors.
- Fixed up eddie_wrapper locating GNU date on Solaris.
- Fixed memory-leak in disk-usage code (reported by Dougal Scott
  <dwagon@connect.com.au>).
- Exit with error if all threads are locked (cannot kill threads in current
  Python implementation).
  Make eddie_wrapper a little smarter when restarting eddie process.
- Added example init.d scripts to contrib for Solaris and Redhat Linux.
- Added another vmstat parser to get free memory/swap information for
  Solaris.
- Added a common semaphore for utils.safe_popen()/safe_pclose() and
  utils.safe_getstatusoutput() to use between them. It appears that
  system calls, of any sort - system() calls, popen(), commands module,
  etc - are not thread-safe and cannot be performed simultaneously by
  multiple threads. This should prevent such race-conditions as
  all EDDIE system calls use these functions.
- Cleaned up access to the system stats cache so that only one thread at a
  time will be refreshing the data.
- Added some more smarts to eddie_wrapper:
  - don't start Eddie if one is already running.
  - don't restart Eddie more than a set number of times in a short period of
    time (requires GNU date command).
- Put semaphore lock around Elvin notify to ensure thread-safe notifications
  are being sent. Suspect duplicates were being sent before.
- Now logs the current thread name for each log entry for improved debugging.
- A lot of cleaning up of system.py for Solaris.
  Added all counter stats from 'vmstat -s'.
  Changed gathering of loadavg/uptime stats to use '/usr/bin/uptime' rather than
  '/opt/local/bin/top' - trying to phase out use of 'top'.
  Improved documentation at top of class, with listing of every stats variable
  available from the system class.
- Added prtdiag parsing for Enterprise class servers (E3500,E6500,etc)
  for temperature.
- Added support for prtdiag for Sun U280Rs.
- Added list of paths to find metastat command for Solaris METASTAT directive.
- Added PRTDIAG directive to provide an interface to the system-specific
  data provided by prtdiag on Sun machines.
  Currently only system temperatures are extracted for U450s and U250s.
- Added support for VxFS filesystems in df.py for Solaris.
- Updated docs to require Python versions 1.6+

Eddie-0.27 (12-Nov-2001)
- Put semaphore lock around Elvin connect calls to prevent multiple threads
  trying to connect at once.
- Fixed bug with ELVINURL and ELVINSCOPE config options not being set
  properly.
- Socket errors in Console code are matched with errno error names, rather
  than assuming the error numbers are the same across platforms.
  [Bug reported by: Ivar Zarans <iff@alcaron.ee>]
- Handle socket errors from PINGs nicely.
- Added a reconnect() function to force the elvin connection closed before
  reconnecting.
- Cleaned up eddieElvin4 code, including connecting and auto-reconnecting to
  Elvin server when connection is lost.
- Added better exception handling for "Connection Timed Out" error in PORT
  directive isalive() function.
- Fixed file descriptor leak in PORT directive isalive() function: when a
  Connection Refused exception was handled, the socket file descriptor was
  not being closed.
- Added more system statistics to the Linux system data collector module.
  Added most of the stats available from /proc/stat, including:
    cpu_user        - total cpu in user space
    cpu_nice        - total cpu in user nice space
    cpu_system      - total cpu in system space
    cpu_idle        - total cpu in idle thread
    cpu%d_user      - per cpu in user space (e.g., cpu0, cpu1, etc)
    cpu%d_nice      - per cpu in user nice space (e.g., cpu0, cpu1, etc)
    cpu%d_system    - per cpu in system space (e.g., cpu0, cpu1, etc)
    cpu%d_idle      - per cpu in idle thread (e.g., cpu0, cpu1, etc)
    pages_in        - pages read in
    pages_out       - pages written out
    pages_swapin    - swap pages read in
    pages_swapout   - swap pages written out
    interrupts      - number of interrupts received
    contextswitches - number of context switches
    boottime        - time of boot (epoch)
    processes       - number of processes started (I think?)
  These are now available to directives like SYS.
- Cleaned up eddie-adm email headers.

Eddie-0.26 (1-Oct-2001)
- Changed elvinrrd() action call arguments slightly. It is now:
    elvinrrd( 'rrdkey', 'arg1=val1', 'arg2=val2', ... )
  The first argument must be the RRD database name to store data into.
  All arguments following that (one or more) are "variable=data" strings
  where variable is the name of the variable in the RRD db and data is
  the data to store in that variable. RRD dbs can have multiple variables
  so this allows some or all of them to be updated in one action call.
- Wrapped the critical calls in safe_popen(), safe_pclose() and
  safe_getstatusoutput() in try/except clauses, so that any exceptions are
  intercepted and the semaphore locks are released (exceptions are then
  raised again to be handled as normal). This stops threads being blocked
  on semaphore acquires, which used up the thread pool quickly and was
  obviously bad.
- Added elvinrrd action which is used to send data samples over Elvin to a
  consumer which stores that data into an RRD database.
- Updated elvindb() action and elvindb() Elvin function to support Elvin4.
  elvindb actions are now working again.
- Directive states now transition from "ok" to "failinitial" to "fail".
    "ok" indicates the directive is fine;
    "failinitial" indicates the directive is currently transitioning to the "fail"
      state or is waiting on a re-check;
    "fail" indicates the directive has definitely failed.
- Fixed a small bug where a directive performing multiple checks (numchecks>1)
  which fails one of the first checks but passes a subsequent re-check still
  performs the act2ok action, which it should not do.
- Directive threads are named, for easier debugging. The name they are given
  is the ID of the directive they are executing.
- Cleaned up ALIAS code to support being passed in action calls properly.
- Cleaned up action calling code. Actions called from action and act2ok now
  use the same action evaluation function, whether actions are called
  directly as a function or from Notification objects. Thus actions can be
  called directly or Notification objects used from both action and act2ok
  arguments, and can even be combined.
- Added a rule argument to RADIUS directive so rules can be written to test
  radius auths. The variable passed is set in the rule environment and is
  set to either 0 for failed or 1 for passed.
- FILE directive now makes the file statistics from the previous check
  available so rules can compare the current statistics against the previous
  statistics to see if files or file metadata have changed over time.
  Variables are same but prepended by 'last', e.g.: rule='md5 != lastmd5'
- Fixed bug: Connection not being closed in all cases for PORT isalive()
  function.
- Added new directive, FILE, allowing tests to be made on a file based on
  standard file statistics (size, mode, ownerships, etc) and md5 hashes.
- Update lastfailtime in stateok function so any actions called by act2ok
  will know the full age of the problem.
- Added PING directive to provide network ping checking of hosts.
- Added initial HP-UX support.
- Fixed bug in PROC R() check.


Eddie-0.25 (6-Jul-2001)
- Changed where varDict action variables are set in some directives so that
  they are available for act2ok action calls.
- Improved error handling in directive.py
- Fixed problem with DF list not refreshing itself properly.
- Changed CONSPORT config option to CONSOLE_PORT.
  I find more verbose to be much user-friendlier than less.
- Added two new config settings:
    EMAIL_FROM='emailaddress'
    EMAIL_REPLYTO='emailaddress'
  so the From: and Reply-To: fields in the email action can be set.
  If these are not set, they default to the current USER for the From: field,
  and '' for the Reply-To: field.
- Cleaned up PORT directive isalive() handling Connection Refused exceptions.
- Created a QUICKSTART text document to give the impatient a quick way to
  get Eddie running.
- sockets.py: handle port already in use by exiting and signalling the other
  non-daemon threads to exit. If the port is in use the whole program should
  exit cleanly with an appropriate error message now.
  Similarly, exit cleanly (and signal other threads to exit) if too many
  socket errors.
- config.py: Improved error handling; if CONSPORT is not a positive integer a
  ParseFailure is raised.
- The console server thread will not be started if CONSPORT=0. This allows
  the console feature to be disabled if required.
- Main thread will now also exit if please_die Event is set. This allows
  other threads to signal that the program should exit.
- Added act2ok param - allows you to specify a Notification object
  to use when Check goes from bad to good
- Log accepted connections with remote IP:port, for security or whatever.
- directive.py: made directive string representation tidier.
- sockets.py: Handle "Interrupted system call" (from CTRL-C) nicely.
- Changed eddie.py - changes include cleaning up the way threads
  are started and stopped, there is now start_threads() and
  stop_threads(). I did this so that both the scheduler thread
  and the console socket thread can be started and stopped easily
  when the config changes.
- Added config var CONSPORT - this is the port to listen to
  console connections on. The default is 33343.
- Added sockets.py - A sockets interface to the current state of
  all eddie checks, this will be used for a console like interface.
- Removed DEFs and replaced by ALIASes which are now used to define string
  aliases to be substituted during config parsing, or during action argument
  parsing. '$' signs are not used anymore, giving a much nicer Python
  look-and-feel.
- Added %(problemage)s %(problemfirstdetect)s to sample MSGs to demonstrate
  usage. These are substituted for the age of the current directive being
  false and the time the first false was detected respectively; or empty
  strings ("") if the problem age is currently 0.
- Added more detailed logging of thread usage, making thread problems easier
  to track.
- Added a utils.safe_getstatusoutput() as a thread-safe wrapper around
  commands.getstatusoutput().
  The IPF directive now uses this to avoid deadlocks.
- Problem age and First time detected variables are now substitutable values
  within an email message body, %(problemage)s and %(problemfirstdetect)s,
  instead of automatically being appended to the bottom of every email.
  Note, these variables are empty ("") if the problem age is zero.
- Changed all os.popen() calls to use the thread-safe utils.safe_popen().
  This should prevent deadlocks when multiple directives are gathering info.
- Added 'negate' option to LOGSCAN - will match lines which do NOT match the
  regex.
- Added formatted exception traceback to safeCheck() logging.
- Fixed socket connect() call in pop3.py to support Python 2.1
- Email admin logs when exiting due to config parse failure.
- Added LOGSCAN examples.
- Updated sample rules to reflect new config layout and features.
- Log Eddie version and systype.
  Also log when configuration parsing complete.
- Cleaned up pop3.py imports.
- Added LOGSCAN directive for monitoring logfiles.
- Fixed PROC custom rules setting.
- Fixed directives setting their own ID only if none set in config.
- parseFailure() logs problem to logfile as well as printing to stdout.
- Cleaned up sample eddie.cf and added verbose comments.
- Catch any uncaught exceptions around main() so they are logged and displayed
  nicely, making it easier for the Eddie admin to see and act on them.
  Hence eddie doesn't have to be run from eddie_wrapper with stderr captured
  (which didn't really work properly anyway).
- Fixed socket connect() call in PORT directive to use tuple as argument
  rather than two arguments. This changed in Python-2.1 (but works with
  older versions).
- Removed the old snpp code which wasn't being used. This should be replaced
  with updated code.
- Elvin config parameters have changed from ELVINHOST and ELVINPORT to
  ELVINURL and ELVINSCOPE to support Elvin4 properly.
- The Elvin tickertape action is now called ticker() [it was just called
  elvin() before].
- Updated Elvin code to support Elvin4 and moved to new file eddieElvin4.py.
  Elvin3 will no longer be supported.
- Replaced any use of old regex module with new re module (using regex causes
  warnings with Python-2.1).
- Tested under Python-2.1. Had to modify some of the globals to avoid new
  warnings under 2.1.
- Updated system.py to handle 'top' under Solaris 8.
- Directive threads are started with safeCheck() which wraps up docheck()
  in try/except so all un-caught exceptions within that thread will be caught
  and the thread can exit cleanly.
- Cleaned up parsing of 'top' a bit more, so it works better under Solaris 8.
| 888 |
- Added support for directive templates. A directive can be created to be |
|---|
| 889 |
only used as a template for other directives, supplying default settings; |
|---|
| 890 |
as well as standard directives can also be used as templates for other |
|---|
| 891 |
directives. |
|---|
| 892 |
Directive template creation, eg: |
|---|
| 893 |
PROC 'template1': template=self scanperiod='5m' checks=2 checkwait=30 |
|---|
| 894 |
PROC 'cron': template='template1' procname='crond' action="..." |
|---|
| 895 |
special template=self means this directive is a template and not to |
|---|
| 896 |
schedule it. |
|---|
| 897 |
Can use other working directives as templates also. |
|---|
| 898 |
Template should be same directive type as directive using it - but this is |
|---|
| 899 |
not enforced because it shouldn't hurt.... directives ignore any arguments |
|---|
| 900 |
they don't need. |
|---|
| 901 |
- Added support for new Directive arguments: |
|---|
| 902 |
numchecks=<int> |
|---|
| 903 |
checkwait=<time> |
|---|
| 904 |
numchecks specifies how many checks a directive should perform before |
|---|
| 905 |
calling its actions. By default this will be 1. Setting this to 2 |
|---|
| 906 |
will force 2 checks before actions are called. It can be set to any |
|---|
| 907 |
non-negative integer, including 0. 0 is a special case which indicates that |
|---|
| 908 |
this directive will not perform any checks. This could be used to |
|---|
| 909 |
temporarily disable a directive, for example. |
|---|
| 910 |
checkwait specifies how long the directive will wait before performing |
|---|
| 911 |
its next re-check if numchecks>1. Its value is a standard time specification |
|---|
| 912 |
eg: '5' = 5 seconds; '5s' = 5 seconds; '2m' = 2 minutes; '5h' = 5 hours. |
|---|
| 913 |
By default checkwait is 0 which means the next re-check will run instantly. |
|---|
| 914 |
checkwait should normally be set to a meaningful value if numchecks>1. |
|---|
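The time specification used by checkwait could be parsed along these lines. Only the forms listed above (bare seconds, 's', 'm', 'h') are handled; parse_time() is a hypothetical helper and Eddie's own parser may accept more.

```python
def parse_time(spec):
    """Convert a time specification such as '5', '5s', '2m' or '5h' to seconds."""
    units = {"s": 1, "m": 60, "h": 3600}
    spec = str(spec).strip()
    if spec and spec[-1] in units:
        # Trailing unit letter: scale the numeric part accordingly.
        return int(spec[:-1]) * units[spec[-1]]
    # No unit suffix means the value is already in seconds.
    return int(spec)


wait_seconds = parse_time("2m")
```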
| 915 |
- Added ALIAS definition. Similar to DEFs but ALIASes are replaced inside |
|---|
| 916 |
action calls, etc. Whereas DEFs are only translated during config file |
|---|
| 917 |
parsing time. |
|---|
| 918 |
Note: DEFs break the Python-like look&feel of the config file and may |
|---|
| 919 |
disappear in the future if they can be replaced neatly. |
|---|
| 920 |
- Cleaned up logging in config.py. LOGFILE should be the first option |
|---|
| 921 |
in eddie.cf so logs end up in the right place. |
|---|
| 922 |
- Handle scanperiod argument in directives so scanperiod can be overridden |
|---|
| 923 |
for each directive. |
|---|
| 924 |
- Signals received during a time.sleep() under Linux cause an IOError |
|---|
| 925 |
exception so just catch these and move on. Main thread should be |
|---|
| 926 |
handling the shutdown cleanly anyway. |
|---|
| 927 |
- Cleaned up directive tokenparsing so base Directive class does as |
|---|
| 928 |
much of the work as possible and user-written directive objects |
|---|
| 929 |
only have to test existence of arguments and setup. |
|---|
| 930 |
- New config format, which is not compatible with old format. |
|---|
| 931 |
All arguments to a directive are now named arguments. |
|---|
| 932 |
- Max number of threads to use can be limited in eddie.cf with the |
|---|
| 933 |
NUMTHREADS variable now. Should be set > 5 for normal use. |
|---|
| 934 |
If set too low checks will never be allowed to run. |
|---|
| 935 |
- Created Radius auth checking directive. |
|---|
| 936 |
- Added clean exiting code to SIGINT, same as SIGTERM. |
|---|
| 937 |
- Cleaned up exiting on SIGTERM signal. The scheduler thread is signalled to |
|---|
| 938 |
die and the main thread will wait for the scheduler thread to receive the |
|---|
| 939 |
signal and exit before exiting cleanly itself. All "worker" threads are |
|---|
| 940 |
ignored and should die of their own accord. |
|---|
| 941 |
- Put semaphores around COM checks which do os.system() calls. |
|---|
| 942 |
Only one COM check will execute at a time. |
|---|
| 943 |
- Made proc.py thread-friendly. |
|---|
| 944 |
- timeQueue is the queueing class derived from Python's Queue class. It is as |
|---|
| 945 |
thread-friendly as Queue, the major difference being objects are inserted |
|---|
| 946 |
into the queue based on a given time. Objects with the lowest times are |
|---|
| 947 |
closest to the front of the queue. |
|---|
| 948 |
To support this, objects have to be added along with their time, so a |
|---|
| 949 |
2-tuple must be added, eg: q.put( (obj, time) ). Similarly q.get() |
|---|
| 950 |
returns the same 2-tuple. |
|---|
| 951 |
An extra public method has been added, over what Queue offers, q.head(). |
|---|
| 952 |
This method returns the item (and time) from the front of the queue, |
|---|
| 953 |
exactly as q.get(), but does not remove it from the queue. |
|---|
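The behaviour described for timeQueue can be approximated with a heap guarded by a condition variable. This is an independent sketch in modern Python, not the Eddie class: it does not derive from Queue, and having head() block on an empty queue is an assumption based on "exactly as q.get()".

```python
import heapq
import itertools
import threading


class TimeQueue:
    """Thread-safe queue that hands out the object with the lowest time first."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker so objects are never compared
        self._not_empty = threading.Condition()

    def put(self, item):
        obj, when = item  # expects the 2-tuple (obj, time)
        with self._not_empty:
            heapq.heappush(self._heap, (when, next(self._counter), obj))
            self._not_empty.notify()

    def get(self):
        """Remove and return the (obj, time) tuple at the front of the queue."""
        with self._not_empty:
            while not self._heap:
                self._not_empty.wait()
            when, _, obj = heapq.heappop(self._heap)
            return (obj, when)

    def head(self):
        """Return the front (obj, time) tuple without removing it."""
        with self._not_empty:
            while not self._heap:
                self._not_empty.wait()
            when, _, obj = self._heap[0]
            return (obj, when)


q = TimeQueue()
q.put(("disk check", 30.0))
q.put(("proc check", 10.0))
```

A scheduler thread could call q.head() to see when the next job is due and only call q.get() once that time has arrived.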
| 954 |
- To support the new queueing of jobs, all directives must end by submitting |
|---|
| 955 |
themselves back into the queue. A |
|---|
| 956 |
Config.q.put(self,time.time()+self.scanperiod) will submit itself back |
|---|
| 957 |
into the queue and schedule itself to be run in self.scanperiod seconds. |
|---|
| 958 |
If a directive does not put itself back into the queue it will not be |
|---|
| 959 |
called again (this can be useful if there is some sort of error and the |
|---|
| 960 |
directive should not be called again). |
|---|
| 961 |
- os.popen() appears to cause problems when used by multiple threads at once, |
|---|
| 962 |
so all such calls now use a wrapper, utils.safe_popen() which performs |
|---|
| 963 |
a semaphore lock around os.popen(). utils.safe_pclose() _MUST_ be called |
|---|
| 964 |
after the pipe has been finished with or the semaphore will not be released |
|---|
| 965 |
and all other calls will be blocked forever. |
|---|
| 966 |
- Core of Eddie is now multi-threaded using a scheduler thread to run each |
|---|
| 967 |
check in its own thread. Thread usage is limited so things don't get out |
|---|
| 968 |
of control. |
|---|
| 969 |
The scheduler tracks jobs with a derivative of Python's Queue class which |
|---|
| 970 |
orders items by time, so that the job to be started soonest will be at the |
|---|
| 971 |
front of the queue. This will now allow directives to specify their own |
|---|
| 972 |
scanperiod and execute as often or as little as desired, independently of |
|---|
| 973 |
other directives. |
|---|
| 974 |
Modified config files are still automatically detected (sometime within a 10 |
|---|
| 975 |
minute period by the "Housecleaning" thread (main process)) which causes the |
|---|
| 976 |
scheduler to be signalled to exit and then the configs are re-read and a new |
|---|
| 977 |
scheduler will be started up. |
|---|
| 978 |
|
|---|
| 979 |
|
|---|
| 980 |
Eddie-0.24 (1-Oct-2000) |
|---|
| 981 |
- Added custom disksuite check to alert if any metadevices require |
|---|
| 982 |
maintenance. Skips checking if /usr/opt/SUNWmd/sbin/metastat not found. |
|---|
| 983 |
<
|---|