| 1 | Eddie CHANGES |
|---|
| 2 | (reverse chronological order) |
|---|
| 3 | |
|---|
| 4 | Eddie-0.37.2 (04-Nov-2008) |
|---|
| 5 | - Updated the eddie-agent SMF manifest, removing the need for a |
|---|
| 6 | method script. |
|---|
| 7 | - Bugfix: Solaris filesystem information was not being output properly. |
|---|
| 8 | - Add the missing vim syntax colouring to the "regfile" statement. |
|---|
| 9 | Patch submitted by Peter Poeml. |
|---|
| 10 | - Improved the parsing of the LOGSCAN negate values. |
|---|
| 11 | - Updated documentation & comments for the DBI directive. |
|---|
| 12 | - Cleaned up part of the LOGSCAN code, for computing the number |
|---|
| 13 | of matched and unmatched lines. LOGSCAN now defaults to matching |
|---|
| 14 | all lines if neither regex nor regfile arguments are defined. |
|---|
| 15 | - Fixed the documentation for LOGSCAN, which was showing examples |
|---|
| 16 | using the linecount variable as the number of lines matched, |
|---|
| 17 | instead of matchedcount. This had led to some confusion. |
|---|
| 18 | - Moved eddie_wrapper shell script to contrib directory. |
|---|
| 19 | - Renamed "HP-UX" to "HP_UX" as the former was not a valid Python |
|---|
| 20 | module/package name. |
|---|
| 21 | - Moved all operating system specific modules under eddietool.arch. |
|---|
| 22 | - Replace characters in osname, osver, osarch that cannot be used |
|---|
| 23 | in Python module names. |
|---|
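A minimal sketch of this kind of replacement. The helper name `module_safe` and the choice of `_` as the substitute character are assumptions for illustration, not Eddie's actual code.

```python
import re

def module_safe(name):
    """Replace characters that are invalid in a Python module name
    with underscores (hypothetical helper)."""
    safe = re.sub(r"\W", "_", name)
    # A module name cannot start with a digit either.
    if safe and safe[0].isdigit():
        safe = "_" + safe
    return safe

print(module_safe("HP-UX"))  # HP_UX
print(module_safe("5.10"))   # _5_10
```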
| 24 | - Exceptions are now defined as sub-classes of Exception. |
|---|
| 25 | - Restructured source as an installable Python package (eddietool) with a |
|---|
| 26 | console script "eddie-agent". setuptools is used so Eddie can be |
|---|
| 27 | distributed as an egg. |
|---|
| 28 | - HTTP directive gracefully handles the case of cookielib module not |
|---|
| 29 | being available (i.e. in Python 2.3 and earlier). The persist_cookies |
|---|
| 30 | option will be disabled if cookielib cannot be imported. |
|---|
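The fallback described above can be sketched with a guarded import. The function name `persist_cookies_enabled` is hypothetical; only the module name `cookielib` and the `persist_cookies` option come from the entry.

```python
try:
    import cookielib  # missing in Python 2.3 and earlier
except ImportError:
    cookielib = None

def persist_cookies_enabled(requested):
    """Honour the persist_cookies option only if cookielib imported."""
    return bool(requested) and cookielib is not None

# Prints False wherever cookielib cannot be imported.
print(persist_cookies_enabled(True))
```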
| 31 | |
|---|
| 32 | Eddie-0.36 (04-Dec-2007) |
|---|
| 33 | - Eddie will now throw an error and exit if a config file cannot be read. |
|---|
| 34 | - Added persist_cookies option to HTTP directive. It is used to |
|---|
| 35 | specify whether to persist server-defined cookies on the client |
|---|
| 36 | side. If enabled, Eddie HTTP checks will send back any cookies |
|---|
| 37 | defined by the server, doing its best to obey expire times. |
|---|
| 38 | Disabled by default. |
|---|
| 39 | - Added "server" option to HTTP directive, used to specify the server |
|---|
| 40 | name to connect to. This will be used instead of the server name |
|---|
| 41 | from the URL. The server name from the URL will still be used for |
|---|
| 42 | the HTTP host header. |
|---|
| 43 | - SunOS: Changed mem_free and mem_swapfree to return as bytes (although |
|---|
| 44 | they are rounded up to the nearest kbyte). |
|---|
| 45 | - Added Solaris SMF method/manifest files to contrib. |
|---|
| 46 | - Full find & replace of all evil tabs to spaces. |
|---|
| 47 | - Added some tools to contrib/spread/ to use for testing elvinrrd message |
|---|
| 48 | passing over Spread. These tools send & receive elvinrrd messages the |
|---|
| 49 | same way that Eddie and ElvinRRD do. |
|---|
| 50 | - Added support for Spread messaging as an alternative to Elvin. |
|---|
| 51 | - Bugfix: make sure body is initialised so MSG parsing doesn't fail if a |
|---|
| 52 | HTTP check fails before assigning anything to the body. |
|---|
| 53 | - Bugfix: reason was not defined before actions were called, causing |
|---|
| 54 | exception in some cases. |
|---|
| 55 | - Bugfix: make sure status is initialised before generating any alerts. |
|---|
| 56 | - Changes to the Elvin code to make re-connections more reliable. Use |
|---|
| 57 | elvin.SyncLoop instead of elvin.ThreadedLoop. Disabled auto-discovery. |
|---|
| 58 | - Implemented the DiskStatistics data collector for Linux. |
|---|
| 59 | This uses a new linux_diskio module which has been added to the Eddie |
|---|
| 60 | distribution. |
|---|
| 61 | - Correct tcp/udp port bug in SP class: searching for "port=123" was matching |
|---|
| 62 | to a bound port of 1234 because of use of string.find(). |
|---|
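To illustrate why a substring search gives a false match here, the sketch below contrasts it with an integer comparison. Both function names are made up for the example, and the actual fix in the SP class may differ.

```python
def find_match(entry, port):
    """Substring search: port 123 also matches a bound port of 1234."""
    return entry.find("port=%d" % port) != -1

def exact_match(bound_port, port):
    """Compare the port numbers as integers instead of searching text."""
    return int(bound_port) == int(port)

print(find_match("port=1234", 123))  # True (the false positive)
print(exact_match("1234", 123))      # False
```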
| 63 | - For any var name that contains "_pages_", create a "_bytes_" version. |
|---|
| 64 | - Added vars: ctr_swap_pages_inactive, ctr_bytes_per_page |
|---|
| 65 | - New var for "COM" directive: outfields |
|---|
| 66 | - Added "DBI" directive, for database query checking. |
|---|
| 67 | Based heavily on the (undocumented) mysql directive. |
|---|
| 68 | - Solve startup race condition for "checkdependson": initial state cannot be "ok". |
|---|
| 69 | Create state "unknown", and change "Directive.checkDependencies" to consider |
|---|
| 70 | all non-"ok" status to be failure (this includes "failinitial"). |
|---|
| 71 | - Two important enhancements to Directive.tokenparser: |
|---|
| 72 | 1) When parsing the config file, for every argument in the directive, if its |
|---|
| 73 | value is a STRING type, then use utils.typeFromString() to set its value, |
|---|
| 74 | so we get a decent data type for it (int, float, string). This reduces the |
|---|
| 75 | typecasting in evaluated expressions. |
|---|
| 76 | 2) When parsing the config file, for every scalar (int, float, string) argument |
|---|
| 77 | in the directive, put it into the defaultVarDict. This allows for setting |
|---|
| 78 | "variables" in the directive, and then using that in the rule. For example, |
|---|
| 79 | if the directive (or template) has "maxcpu=30", then the rule can address |
|---|
| 80 | this like "rule='pcpu > _maxcpu'". |
|---|
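A rough sketch of the two tokenparser enhancements working together. `type_from_string` is a stand-in for utils.typeFromString, whose real behaviour may differ, and the sample argument values are invented.

```python
def type_from_string(value):
    """Convert a config string to int or float, else leave it a string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value

# Directive arguments as they might arrive from the config parser.
args = {"maxcpu": "30", "ratio": "0.5", "name": "httpd"}

# Scalars are exposed to the rule with a leading underscore.
rule_vars = dict(("_" + k, type_from_string(v)) for k, v in args.items())
rule_vars["pcpu"] = 42.0  # value a data collector might supply

print(eval("pcpu > _maxcpu", {}, rule_vars))  # True
```

Because `_maxcpu` is already an int, the rule needs no explicit typecast.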
| 81 | - Added "--daemon" command-line option, and supporting "utils.create_child" |
|---|
| 82 | routine. Also created brief documentation for all command-line switches. |
|---|
| 83 | - Changed in logscanning.py: Detect inode number change: if watched file's |
|---|
| 84 | inode number changes, then read from start of the file. |
|---|
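One way to detect an inode change, shown as a sketch. The `FileWatcher` class is hypothetical and is not the code in logscanning.py.

```python
import os

class FileWatcher(object):
    """Track the read position in a file, restarting on inode change."""

    def __init__(self, path):
        self.path = path
        self.inode = None
        self.offset = 0

    def read_new(self):
        st = os.stat(self.path)
        if st.st_ino != self.inode:
            # A different file now has this name (e.g. after rotation):
            # read it from the start.
            self.inode = st.st_ino
            self.offset = 0
        with open(self.path) as f:
            f.seek(self.offset)
            data = f.read()
            self.offset = f.tell()
        return data
```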
| 85 | - For the "email" action, convert "\n" strings in the body text into newline |
|---|
| 86 | characters. This allows for: |
|---|
| 87 | email('foo@bar.com', 'host: %(h)s', 'Host: %(h)s\nAge: %(problemage)s') |
|---|
| 88 | instead of having odd-looking multi-line strings in the config file. |
|---|
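The conversion amounts to a string replacement before variable substitution. This is a sketch only; `expand_newlines` and the sample values are invented for the example.

```python
def expand_newlines(body):
    """Turn the two-character sequence backslash-n into a real newline."""
    return body.replace("\\n", "\n")

template = r"Host: %(h)s\nAge: %(problemage)s"
values = {"h": "web01", "problemage": "5m"}  # sample values only
print(expand_newlines(template) % values)
```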
| 89 | - Added "RESCANCONFIGS" config option. Defaults to original behavior. |
|---|
| 90 | This option allows the disabling of Eddie's constant scanning and reloading |
|---|
| 91 | of its config files. |
|---|
| 92 | - Fixed very minor bug where action variables were updated multiple times |
|---|
| 93 | for no good reason. Reported by Mark Taylor. |
|---|
| 94 | - Added "log" action. Use it to append to a log file, log via syslog, or |
|---|
| 95 | print on the eddie tty. |
|---|
| 96 | - Log the ImportError message if a requested data collector module fails |
|---|
| 97 | to import. Helps users debug why the module won't load. |
|---|
| 98 | - Replaced references to whrandom module with random instead. whrandom is |
|---|
| 99 | being deprecated. |
|---|
| 100 | - Changed option parsing to use optparse/optik (ticket #5) and added |
|---|
| 101 | support for specifying an alternate config file from the command line |
|---|
| 102 | (ticket #6). |
|---|
| 103 | |
|---|
| 104 | Eddie-0.35 (31-Oct-2005) |
|---|
| 105 | - Linux: Added a dummy diskdevice module for Linux. The implementation of |
|---|
| 106 | this is still yet to be done. |
|---|
| 107 | - Fixed compatibility issue with FILE directive and Python pre 2.3. Those |
|---|
| 108 | versions do not have os.path.sep. |
|---|
| 109 | - Added regfile to LOGSCAN directive, which points to a file containing |
|---|
| 110 | multiple regular expressions to match against. Patch submitted by |
|---|
| 111 | Dougal Scott. |
|---|
| 112 | - Linux: Fix to handle /proc/stat changes on Linux kernel 2.6.11+. |
|---|
| 113 | - Enhancements to PRTDIAG directive: |
|---|
| 114 | * Report details of any hardware failures on U280R. |
|---|
| 115 | * Added support for U480R hardware. |
|---|
| 116 | Patch submitted by Dougal Scott. |
|---|
| 117 | - Improved HTTP directive handling when Python does not support SSL |
|---|
| 118 | connections. Patch submitted by Dougal Scott. |
|---|
| 119 | - Added SMTP directive which provides a simple facility to measure the response |
|---|
| 120 | time of an SMTP connection to a server. Submitted by Dougal Scott. |
|---|
| 121 | - Fixed minor bug where length of time of thread count over threshold was |
|---|
| 122 | not being shown in minutes when it was expected to be. |
|---|
| 123 | Patch submitted by Dougal Scott. |
|---|
| 124 | - System specific Directives are now automatically loaded from a Directives |
|---|
| 125 | subdirectory beneath the system lib directory if it exists. |
|---|
| 126 | Example: Linux-specific directive modules will be loaded from: |
|---|
| 127 | lib/Linux/Directives/ |
|---|
| 128 | Patch submitted by Dougal Scott. |
|---|
| 129 | - SP directive now supports a bindaddr value of "any". This will cause the |
|---|
| 130 | directive to ignore the bind address when testing (ie: compare port only). |
|---|
| 131 | Patch submitted by Dougal Scott. |
|---|
| 132 | - Use Python True/False instead of 1/0 for booleans in common directives. |
|---|
| 133 | - Added 'expectrexp' option to PORT directive. This allows regular expression |
|---|
| 134 | matching against the response of a PORT connection. |
|---|
| 135 | Patch submitted by Dougal Scott. |
|---|
| 136 | - Added a 'missing' flag to FILE directive which indicates when an existing |
|---|
| 137 | file has disappeared. |
|---|
| 138 | Also added a 'lastexists' variable for use in FILE rules. |
|---|
| 139 | - Improvements to the keepdiff option of the FILE directive. |
|---|
| 140 | * Keep copies of files being monitored in WORKDIR/FILEprevs/ where |
|---|
| 141 | WORKDIR is the new option defined in eddie.cf. |
|---|
| 142 | * If the copy of a file in FILEprevs disappears then set an appropriate |
|---|
| 143 | message for action output. |
|---|
| 144 | * If the copy of a file in FILEprevs disappears then make sure another |
|---|
| 145 | copy is saved. |
|---|
| 146 | * Use semi-readable unique filenames for the saved copies. |
|---|
| 147 | - Added get_work_dir() and set_sub_work_dir() functions to utils.py for |
|---|
| 148 | directive code to call to retrieve the WORKDIR location. set_sub_work_dir() |
|---|
| 149 | is used to create a subdirectory within WORKDIR. It will raise WorkdirError |
|---|
| 150 | if it fails. Otherwise it returns the full directory path. |
|---|
| 151 | - Added config option WORKDIR which defines a location where Eddie can |
|---|
| 152 | store temporary files. This can be used by directives that need to |
|---|
| 153 | save some information or state to the filesystem. The directory can |
|---|
| 154 | be safely removed when Eddie is not running. Eddie does not clean |
|---|
| 155 | up the directory itself (it may clean up some files before shutting |
|---|
| 156 | down). The whole directory tree will be created on startup if it |
|---|
| 157 | doesn't already exist. Eddie may create subdirectories within this |
|---|
| 158 | WORKDIR directory. Example: |
|---|
| 159 | WORKDIR="/var/tmp/eddieworkdir" |
|---|
| 160 | - Win32: Catch an exception that is randomly generated by |
|---|
| 161 | win32pdh.GetFormattedCounterValue() sometimes. The returned error is |
|---|
| 162 | unhelpful, |
|---|
| 163 | (-2147481640, 'GetFormattedCounterValue', 'No error message is available') |
|---|
| 164 | so just return None values instead of letting the thread die. |
|---|
| 165 | - Added capability for FILE directive to keep diffs of changes to a file. |
|---|
| 166 | The diffs can then be sent in an email when a change is detected. |
|---|
| 167 | New FILE arguments: |
|---|
| 168 | keepdiff={true|false} |
|---|
| 169 | - flag whether to keep a copy of the file to produce diffs |
|---|
| 170 | context_lines=<integer> |
|---|
| 171 | - how many context lines to show around the changed lines |
|---|
| 172 | difftype={context|unified|full} |
|---|
| 173 | - which diff method to use (see Python difflib module for more information) |
|---|
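A sketch of how the three difftype values could map onto the difflib module. Using `ndiff` for "full" is an assumption, and `make_diff` is a hypothetical helper, not the FILE directive's code.

```python
import difflib

def make_diff(old_lines, new_lines, difftype="unified", context_lines=3):
    """Return a diff as a list of lines using the requested method."""
    if difftype == "unified":
        diff = difflib.unified_diff(old_lines, new_lines,
                                    "previous", "current",
                                    n=context_lines, lineterm="")
    elif difftype == "context":
        diff = difflib.context_diff(old_lines, new_lines,
                                    "previous", "current",
                                    n=context_lines, lineterm="")
    else:
        # "full": show every line; ndiff is one way to do that.
        diff = difflib.ndiff(old_lines, new_lines)
    return list(diff)

for line in make_diff(["a", "b", "c"], ["a", "x", "c"], "unified", 1):
    print(line)
```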
| 174 | - Added README.win32.txt for Win32 platform install information. |
|---|
| 175 | - Added rules/win32_sample.rules - a sample set of Win32 rules. |
|---|
| 176 | - Win32 df collector: ignore A: and B: drives when collecting stats. |
|---|
| 177 | Otherwise Windows prompts for the media to be inserted! (Unless a |
|---|
| 178 | floppy is in the drive ... yeah right) |
|---|
| 179 | - Win32: Fix win32perf doctest for systems that have an A: drive. |
|---|
| 180 | - Win32: Added support for Win32 systems with datacollectors: df, |
|---|
| 181 | diskdevice, netstat, proc and system. Most of them use win32perf |
|---|
| 182 | module which is a wrapper for Mark Hammond's win32all package. |
|---|
| 183 | - Added doctests for FILE directive. |
|---|
| 184 | - Fetch hostname from platform.node() if os.uname() is not available. |
|---|
| 185 | (Fix for Win32 compatibility.) |
|---|
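The fallback can be sketched as follows. `get_hostname` is an illustrative name, not necessarily the function Eddie uses.

```python
import os
import platform

def get_hostname():
    """Use os.uname() where it exists, otherwise platform.node()."""
    if hasattr(os, "uname"):
        return os.uname()[1]  # second field is the node name
    return platform.node()

print(get_hostname())
```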
| 186 | - Added a doctest for timeQueue module. |
|---|
| 187 | - Fixed bug in timeQueue in Python 2.4+ support where head() call was |
|---|
| 188 | actually performing a get(). |
|---|
| 189 | - Use platform-independent method (ie: os.path) for constructing config |
|---|
| 190 | paths, rather than assuming '/' is path separator. (Fix for Win32 |
|---|
| 191 | compatibility.) |
|---|
| 192 | - Added support for systems that do not support os.uname() - try to use |
|---|
| 193 | the platform module instead (ie: Win32). Check that the system handles |
|---|
| 194 | each signal before trying to register signal handlers for them (Win32 |
|---|
| 195 | doesn't support some of the signals). |
|---|
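A sketch of the signal check, assuming it is enough to test whether the signal module defines each name. The function name and the default list of signals are invented for the example.

```python
import signal

def register_handlers(handler,
                      names=("SIGHUP", "SIGINT", "SIGTERM", "SIGALRM")):
    """Register handler for each named signal the platform defines.
    Returns the list of names actually registered."""
    registered = []
    for name in names:
        signum = getattr(signal, name, None)
        if signum is None:
            continue  # e.g. SIGHUP does not exist on Win32
        signal.signal(signum, handler)
        registered.append(name)
    return registered
```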
| 196 | - Solaris: Catch some more possible errors when parsing 'ps' output for |
|---|
| 197 | Solaris. The %CPU field can be a '-' instead of a decimal number (seems |
|---|
| 198 | to be that way for zombie processes). |
|---|
| 199 | - Solaris: Handle parsing netstat output for Solaris 10. |
|---|
| 200 | - Fixed small bug with eddie_wrapper when EDDIE_ADMIN was not defined. |
|---|
| 201 | - Big improvements to the Redhat init.d script in the contrib directory, |
|---|
| 202 | making it much more compatible with all new versions of Redhat Linux. |
|---|
| 203 | - Added chkconfig lines to sample init.d script for Redhat Linux. |
|---|
| 204 | - Linux: Detecting interpreters in Linux process lists was broken. |
|---|
| 205 | - Linux: added support for new netstat formats in newer kernels. |
|---|
| 206 | - Linux: Get VM statistics from /proc/vmstat (on newer kernels). |
|---|
| 207 | - Added support for Python 2.4 Queue class, which Eddie's timeQueue class is |
|---|
| 208 | derived from. The implementation of Queue changed slightly in Python 2.4. |
|---|
| 209 | - Log the version of Python in use at startup, along with systype. |
|---|
| 210 | - Added optional definition of EDDIE_ADMIN environment variable in the rc |
|---|
| 211 | startup scripts to receive Eddie restart/exception notifications from |
|---|
| 212 | eddie_wrapper. |
|---|
| 213 | - Eddie now prints no output to stdout by default. Any global exceptions |
|---|
| 214 | are printed to stderr on exiting. |
|---|
| 215 | - eddie_wrapper improvements: eddie output on exit is only emailed to |
|---|
| 216 | $EDDIE_ADMIN if the Eddie return-code is non-zero. By default no |
|---|
| 217 | $EDDIE_ADMIN is set (so no email is sent by default) and $EDDIE_ADMIN |
|---|
| 218 | can now be defined outside the eddie_wrapper script (ie: in a startup |
|---|
| 219 | script). |
|---|
| 220 | - Bugfix: console now shows groups that match special hostnames, those that |
|---|
| 221 | contain '.' or '-' characters. A shortcut hack that will be replaced in |
|---|
| 222 | the future. |
|---|
| 223 | - FreeBSD: Added fetching of more system counters from '/sbin/sysctl -a'. |
|---|
| 224 | - FreeBSD: process list parsing was broken. |
|---|
| 225 | - FreeBSD: proc module needed to import sys so that exceptions could |
|---|
| 226 | be logged. |
|---|
| 227 | - Added a bit of a hack (sorry) which allows hostnames containing '-' to be |
|---|
| 228 | used as group names. The '-' must be replaced with '_' for the match to |
|---|
| 229 | work. This is because group names in the config cannot contain characters |
|---|
| 230 | like '-'. This will be resolved in the future when proper matching options |
|---|
| 231 | are implemented fully. |
|---|
| 232 | - Solaris: Better handling of Solaris process date/time parsing errors. |
|---|
| 233 | Patch submitted by Dougal Scott. |
|---|
| 234 | - Solaris: PRTDIAG directive: added support for Sun Blade servers |
|---|
| 235 | (SUNW,Serverblade1). Patch submitted by Dougal Scott. |
|---|
| 236 | - When sending email by the SMTP method and multiple SMTP servers are |
|---|
| 237 | available, only log failure if all SMTP servers are unavailable to |
|---|
| 238 | send the message. Patch submitted by Dougal Scott. |
|---|
| 239 | - FreeBSD: Added collecting swap usage stats from '/usr/sbin/pstat -sk'. |
|---|
| 240 | - Bugfix: Elvin ElvinConnectMaxRetries exceptions were not being caught |
|---|
| 241 | properly. |
|---|
| 242 | - Solaris: SunOS df data collector would fail when a CD was inserted, as |
|---|
| 243 | total files is reported as -1. Patch submitted by Dougal Scott. |
|---|
| 244 | - FreeBSD raises a socket exception ('Host is down') when a host is |
|---|
| 245 | unreachable, which can be safely ignored by the ping code. |
|---|
| 246 | - Improved the sample config for N COMMONFIXED. |
|---|
| 247 | - FreeBSD: Added support for FreeBSD system, proc, netstat, df modules. |
|---|
| 248 | - A quick fix to the config parser which means that Eddie will run on systems |
|---|
| 249 | that do not yet have system-specific modules. Non system-specific |
|---|
| 250 | directives will still work on these systems, such as all the network |
|---|
| 251 | directives (PING, SNMP, etc) and others like FILE. |
|---|
| 252 | - Solaris: Fixed DataFailure exception when kstat command cannot be found. |
|---|
| 253 | - Catch an exception properly in FS directive when filesystem was not |
|---|
| 254 | found. |
|---|
| 255 | - Fixed fstpl directive in common.rules example file. |
|---|
| 256 | - Modified eddie_wrapper to use a Python call to fetch the current time |
|---|
| 257 | rather than relying on GNU date. This has improved compatibility with |
|---|
| 258 | more types of systems, as it can be assumed that Python will be available |
|---|
| 259 | to run EDDIE ! |
|---|
| 260 | - Handle Elvin connection problems more gracefully, backing off before |
|---|
| 261 | retrying. |
|---|
| 262 | - Disabled counting of file descriptors in use, which is only needed for |
|---|
| 263 | debugging on rare occasions. |
|---|
| 264 | - Bugfix in HTTP when trying to determine error string for some types of |
|---|
| 265 | exceptions. |
|---|
| 266 | - Improved PING multi-threaded reliability on platforms that were causing |
|---|
| 267 | problems because they simply used the current pid as the icmp_id. |
|---|
| 268 | On platforms where all threads share the same process id this was causing |
|---|
| 269 | unreliable ping results as the wrong threads would accept the wrong icmp |
|---|
| 270 | replies. It now uses the current thread object's memory address for the |
|---|
| 271 | icmp_id to make them as unique as possible and avoid such confusion. |
|---|
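A sketch of deriving a per-thread identifier. The ICMP echo identifier field is 16 bits wide, so the address is masked here. The masking step is an assumption about how the value is reduced, not taken from Eddie's source.

```python
import threading

def icmp_id():
    """Derive a 16-bit ICMP identifier from the current thread object.
    In CPython, id() returns the object's memory address."""
    return id(threading.current_thread()) & 0xFFFF

print(icmp_id())
```

Two threads in one process get different thread objects, so their identifiers will usually differ even though they share a pid.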
| 272 | - New directive: TAPE - functions almost exactly like the DISK directive |
|---|
| 273 | but fetches stats from the TapeStatistics class from the diskdevice |
|---|
| 274 | module (which is currently only available for Solaris). |
|---|
| 275 | Example: |
|---|
| 276 | TAPE st52_thruput: |
|---|
| 277 | device='st52' |
|---|
| 278 | scanperiod='5m' |
|---|
| 279 | rule='1' # always perform action |
|---|
| 280 | action='elvinrrd("tape-%(h)s_%(device)s", "rbytes=%(nread)s", "wbytes=%(nwritten)s")' |
|---|
| 281 | - New directive, DISK. This uses the new DiskStatistics data collector from |
|---|
| 282 | a diskdevice module (available for Solaris-only so far) to enable rules |
|---|
| 283 | to be created using disk device activity stats. |
|---|
| 284 | Example: a directive which collects bytes read/written to the disk device |
|---|
| 285 | md20 and sends these counters to elvinrrd |
|---|
| 286 | DISK md20_thruput: |
|---|
| 287 | device='md20' |
|---|
| 288 | scanperiod='5m' |
|---|
| 289 | rule='1' # always perform action |
|---|
| 290 | action='elvinrrd("disk-%(h)s_%(device)s", "rbytes=%(nread)s", "wbytes=%(nwritten)s")' |
|---|
| 291 | - Solaris: added a new Data Collector, DiskStatistics, in module diskdevice.py |
|---|
| 292 | (for Solaris only so far). On Solaris this collects disk activity statistics |
|---|
| 293 | from a call to kstat, ie, '/usr/bin/kstat -p -c disk'. All stats generated |
|---|
| 294 | by that command are collected for each disk and made available to directives. |
|---|
| 295 | - Solaris: enhanced the network interface statistics collection to fetch |
|---|
| 296 | more detailed stats from 'netstat -k' for each physical interface. |
|---|
| 297 | An example of the statistics now available for an interface (hme0 on 5.7) |
|---|
| 298 | is: |
|---|
| 299 | ipackets 65360226 ierrors 25 opackets 77502512 oerrors 0 collisions 0 |
|---|
| 300 | defer 0 framing 0 crc 0 sqe 0 code_violations 0 len_errors 0 |
|---|
| 301 | ifspeed 100 buff 0 oflo 0 uflo 0 missed 25 tx_late_collisions 0 |
|---|
| 302 | retry_error 0 first_collisions 0 nocarrier 0 inits 7 nocanput 440 |
|---|
| 303 | allocbfail 0 runt 0 jabber 0 babble 0 tmd_error 0 tx_late_error 0 |
|---|
| 304 | rx_late_error 0 slv_parity_error 0 tx_parity_error 0 rx_parity_error 0 |
|---|
| 305 | slv_error_ack 0 tx_error_ack 0 rx_error_ack 0 tx_tag_error 0 |
|---|
| 306 | rx_tag_error 0 eop_error 0 no_tmds 0 no_tbufs 0 no_rbufs 0 |
|---|
| 307 | rx_late_collisions 0 rbytes 1726897560 obytes 834302609 multircv 7535 multixmt 0 |
|---|
| 308 | brdcstrcv 248816 brdcstxmt 1667 norcvbuf 440 noxmtbuf 0 phy_failures 0 |
|---|
| 309 | as well as info from 'netstat -in' such as mtu, network, etc. |
|---|
| 310 | - Solaris: now collects more detailed filesystem information in SunOS/df.py, |
|---|
| 311 | including inode usage, filesystem type, flags, and blocks as well as kBytes |
|---|
| 312 | used. The full list of variables now available to directives is: |
|---|
| 313 | fsname - filesystem name (string) |
|---|
| 314 | mountpt - mount point (string) |
|---|
| 315 | size - size of filesystem in kBytes (int) |
|---|
| 316 | used - kBytes used (int) |
|---|
| 317 | avail - kBytes free (int) |
|---|
| 318 | pctused - percentage of filesystem used (float) |
|---|
| 319 | totalblocks - total amount of physical blocks (512 Bytes/block) (int) |
|---|
| 320 | usedblocks - number of physical blocks used (int) |
|---|
| 321 | availblocks - number of physical blocks available for unprivileged users (int) |
|---|
| 322 | freeblocks - number of physical blocks free (int) |
|---|
| 323 | blocksize - filesystem (logical) block size (int) |
|---|
| 324 | fragsize - filesystem fragmentation size (int) |
|---|
| 325 | totalinodes - total inodes on filesystem (int) |
|---|
| 326 | usedinodes - number of inodes used (int) |
|---|
| 327 | availinodes - number of inodes left available (int) |
|---|
| 328 | pctinodes - percentage of inodes used (float) |
|---|
| 329 | filesysid - filesystem id (int) |
|---|
| 330 | fstype - type of filesystem (string) |
|---|
| 331 | flag - filesystem flags (string) |
|---|
| 332 | filelen - max filename length (int) |
|---|
| 333 | Thanks to Dougal Scott for submitting this patch. |
|---|
| 334 | - When matching hostnames to group names, ignore any domain parts of the |
|---|
| 335 | hostname if it is fully-qualified. Group names cannot contain |
|---|
| 336 | non-alphanumeric characters, so will only match the host part of a FQDN. |
|---|
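A sketch of the hostname reduction. It combines this entry with the earlier '-' to '_' workaround, and `group_key` is a hypothetical helper name.

```python
def group_key(hostname):
    """Reduce a hostname to the form compared against group names."""
    host = hostname.split(".", 1)[0]  # drop any domain part of an FQDN
    return host.replace("-", "_")     # '-' is not allowed in group names

print(group_key("web-01.example.com"))  # web_01
```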
| 337 | - Bugfix: clear checkdependson if it is assigned an empty string. |
|---|
| 338 | - Solaris: improvement to uptime/loadavg stats collection where it is |
|---|
| 339 | possible for the "day(s)" section of /usr/bin/uptime output to be |
|---|
| 340 | missing (usually if wtmpx is rotated more often than the system boots, |
|---|
| 341 | thus losing the last 'reboot' entry) so SunOS/system.py now handles |
|---|
| 342 | this exceptional case. |
|---|
| 343 | |
|---|
| 344 | Eddie-0.34 (13-Sep-2004) |
|---|
| 345 | - OpenBSD: collect in/out byte counters for network interfaces, which |
|---|
| 346 | requires an extra netstat call. |
|---|
| 347 | - OpenBSD: added drops counter to network interface stats. |
|---|
| 348 | - OpenBSD: fixed some bugs preventing network interface statistics collection |
|---|
| 349 | from working properly. |
|---|
| 350 | - Improved handling of exceptions when counting file descriptors in use. |
|---|
| 351 | Instead of raising a global exception (and causing EDDIE to die) just log |
|---|
| 352 | the exception and carry on. |
|---|
| 353 | - Perform global housekeeping duties more often. Now they are every |
|---|
| 354 | 1 minute instead of every 10 minutes. This means that changes to |
|---|
| 355 | config and rules files will be picked up much faster. |
|---|
| 356 | - Added pysnmp module to Extra dir, which EDDIE uses for making SNMP queries. |
|---|
| 357 | - Extra 3rd-party modules are now being distributed with EDDIE. They will |
|---|
| 358 | live in lib/common/Extra/ and are provided to make installation simpler |
|---|
| 359 | for commonly-used modules. |
|---|
| 360 | - HTTP: Make sure 'ip' message variable is initialized in HTTP directives. |
|---|
| 361 | - HTTP: Some HTTP response exceptions were not being caught properly. |
|---|
| 362 | - HTTP: Some socket.timeout checks weren't checking for the correct version |
|---|
| 363 | of Python (which was causing AttributeError exceptions). |
|---|
| 364 | - HTTP: Changed the logging of response body read() exceptions which were not |
|---|
| 365 | working for some types of exceptions. |
|---|
| 366 | - Made eddie_wrapper smarter about finding a date or gdate command to use. |
|---|
| 367 | - Darwin: Fixed a bug parsing vmstat statistics. These counters were |
|---|
| 368 | being truncated (and hence wrong) before. |
|---|
| 369 | - Darwin: Better handling of parsing errors in the proc data collector. |
|---|
| 370 | - The COM directive now shares the utils.systemcall_semaphore semaphore |
|---|
| 371 | rather than relying on its own. This prevents conflicts between any |
|---|
| 372 | threads that need to perform a system() (or os.popen() or |
|---|
| 373 | commands.getstatusoutput()) simultaneously. |
|---|
| 374 | Thanks to Denis Menshikov for verifying this issue. |
|---|
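The shared-semaphore idea in a minimal sketch. It uses subprocess as a modern stand-in for system(), os.popen() and commands.getstatusoutput(). `safe_run` is a hypothetical wrapper, and the semaphore here only mimics the role of utils.systemcall_semaphore.

```python
import subprocess
import sys
import threading

# One module-level semaphore shared by every caller.
systemcall_semaphore = threading.Semaphore(1)

def safe_run(argv):
    """Run an external command while holding the shared semaphore."""
    with systemcall_semaphore:
        return subprocess.check_output(argv)

print(safe_run([sys.executable, "-c", "print('ok')"]))
```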
| 375 | - Bugfix for SP directive determining the right protocol (Dougal Scott). |
|---|
| 376 | - Bugfix for a problem where the get TCPtable occasionally returns no entries |
|---|
| 377 | for no obvious reason. This means that all the SP style checks would |
|---|
| 378 | start complaining that no one is listening (Dougal Scott). |
|---|
| 379 | - If ELVINURL and ELVINSCOPE are both undefined in eddie.cf then disable |
|---|
| 380 | Elvin functionality. |
|---|
| 381 | - Update to MYSQL directive adding "result#" variable (Dougal Scott). |
|---|
| 382 | - Converted mysql.py from DOS line endings to UNIX. |
|---|
| 383 | - Fixed 'daemon' call in contrib init script so it works properly on newer |
|---|
| 384 | versions of Redhat. |
|---|
| 385 | - Added new exception DataFailure. |
|---|
| 386 | Changed exceptions to be subclasses of Exception. |
|---|
| 387 | Catch DataFailure exceptions from collectData(). These are raised if the |
|---|
| 388 | Data Collector encounters a major problem collecting the data. |
|---|
| 389 | - Added support for Redhat Enterprise Linux (or perhaps newer kernels 2.4.21+) |
|---|
| 390 | which has extra stats added to the cpu fields in /proc/stat. The cpu counters |
|---|
| 391 | now available with these kernels are: |
|---|
| 392 | ctr_cpu_user |
|---|
| 393 | ctr_cpu_nice |
|---|
| 394 | ctr_cpu_system |
|---|
| 395 | ctr_cpu_idle |
|---|
| 396 | ctr_cpu_iowait |
|---|
| 397 | ctr_cpu_hardirq |
|---|
| 398 | ctr_cpu_softirq |
|---|
| 399 | |
|---|
| 400 | Eddie-0.33 (15-Jul-2004) |
|---|
| 401 | - Handle socket timeout exceptions properly when HTTP response read() fails. |
|---|
| 402 | - Handle socket.settimeout() not being available on Python pre-2.3 versions. |
|---|
| 403 | - A new HTTP rule/action variable 'timedout' has been added which will be set |
|---|
| 404 | to 1 if a socket timeout exception has occurred, otherwise it will be 0. |
|---|
| 405 | - Added HTTP directive option 'request_timeout' which specifies how long a |
|---|
| 406 | HTTP(S) connection should wait for a response before timing out with an |
|---|
| 407 | error. This makes use of a new Python 2.3 feature where socket timeouts |
|---|
| 408 | can be configured, hence this option is only available when Eddie is |
|---|
| 409 | running on Python 2.3+. |
|---|
| 410 | - Better defaults for SENDMAIL and ELVIN settings in sample eddie.cf. |
|---|
| 411 | - Added better logging of HTTP directive actions. |
|---|
| 412 | - Enhancements to HTTP directive: |
|---|
| 413 | Supports URLs with non-standard ports, eg: http://localhost:8080/ |
|---|
| 414 | Added finer grained timing of four parts of the HTTP connection: |
|---|
| 415 | time_resolve - elapsed time to resolve hostname to IP |
|---|
| 416 | time_connect - elapsed time to connect to server |
|---|
| 417 | time_request - elapsed time to send HTTP/S request to server |
|---|
| 418 | time_response - elapsed time to retrieve the server response (and close connection) |
|---|
| 419 | time - elapsed total time (sum of above) |
|---|
| 420 | - Added system-specific sample rules for Linux & Solaris. |
|---|
| 421 | - Added testing ruleset for OpenBSD in development/testing/. |
|---|
| 422 | - Added initial OpenBSD support, thanks to John McInnes. |
|---|
| 423 | - DataCollect now logs what module is being requested for import. |
|---|
| 424 | - Fixed act2ok bug in FILE test. |
|---|
| 425 | - Remove accidental accented character from nice() comments. |
|---|
| 426 | It was causing a DeprecationWarning in Python 2.3.3+. |
|---|
| 427 | - Created a full directive test suite for Darwin (OS X) to provide standard |
|---|
| 428 | testing of all possible directives (or as many as possible). |
|---|
| 429 | These live in development/testing/. |
|---|
| 430 | - PING: PING directive was logging pktloss as decimal when it should have been |
|---|
| 431 | a percentage. |
|---|
| 432 | - SP: Local address IP for SP directives (using netstat data-collector) can now |
|---|
| 433 | be specified as '*' or '0.0.0.0' for Solaris. '*' is automatically |
|---|
| 434 | converted to '0.0.0.0' for consistency. |
|---|
| 435 | - First version of OS-specific modules ported to Mac OS X (Darwin). |
|---|
| 436 | Tested on OS X 10.3.3 (Darwin 7.3.0). Needs plenty more testing. |
|---|
| 437 | - HTTP: Initialize HTTP directive exception data so variable substitution in |
|---|
| 438 | messages doesn't fail. |
|---|
| 439 | - Added new directive argument: checktime |
|---|
| 440 | Used to restrict directive execution to specified times. The value |
|---|
| 441 | is a Python expression which can use various variables representing |
|---|
| 442 | the current time and day: |
|---|
| 443 | day ('mon', 'tue', etc); time (HHMM); hour (0-23); minute (0-59); second (0-59). |
|---|
| 444 | And for shorthands, the fixed lists: |
|---|
| 445 | weekdays ('mon' - 'fri'), weekend ('sat', 'sun'). |
|---|
| 446 | Examples: |
|---|
| 447 | checktime='day=="mon" or day=="tue"' |
|---|
| 448 | checktime='day in weekdays and hour>18' |
|---|
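A sketch of how a checktime expression could be evaluated. Representing `time` as an HHMM integer is an assumption, and `checktime_ok` is not Eddie's own function.

```python
import time

WEEKDAYS = ("mon", "tue", "wed", "thu", "fri")
WEEKEND = ("sat", "sun")

def checktime_ok(expr, now=None):
    """Evaluate a checktime expression against a struct_time."""
    t = now if now is not None else time.localtime()
    env = {
        "day": (WEEKDAYS + WEEKEND)[t.tm_wday],  # tm_wday: Monday is 0
        "time": t.tm_hour * 100 + t.tm_min,      # HHMM as an integer
        "hour": t.tm_hour,
        "minute": t.tm_min,
        "second": t.tm_sec,
        "weekdays": WEEKDAYS,
        "weekend": WEEKEND,
    }
    return bool(eval(expr, {"__builtins__": {}}, env))

# Monday 1 March 2004, 19:30:00
monday_evening = time.strptime("2004-03-01 19:30:00", "%Y-%m-%d %H:%M:%S")
print(checktime_ok("day in weekdays and hour>18", monday_evening))  # True
```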
| 449 | - Only perform act2ok action(s) if some actions were already called. |
|---|
| 450 | In cases where the check fails but actiondependson causes actions to |
|---|
| 451 | be skipped, we don't need the act2ok actions to be called. |
|---|
| 452 | - Added MYSQL directive submitted by Dougal Scott. |
|---|
| 453 | - PING: Fixed a socket exception for gethostbyname failures. |
|---|
| 454 | - Added option to disable a directive. Specify 'disabled=1' in a directive |
|---|
| 455 | to force it to be disabled. |
|---|
| 456 | - SNMP directive now supports 64-bit counters split into high/low OIDs. Specify |
|---|
| 457 | these as "OIDhigh:OIDlow". |
|---|
| 458 | Example: |
|---|
| 459 | oid='1.3.6.1.2.1.2.2.1.10.2:1.3.6.1.2.1.2.2.1.10.3' |
|---|
| 460 | Where the first OID is the High 32 bits and the second OID is the lower 32 bits. |
|---|
| 461 | - Added an FS template, fstpl, to sample common.rules. |
|---|
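As an illustration of the checktime argument described above, the sketch below evaluates such an expression against the documented variables (day, time, hour, minute, second, weekdays, weekend). The function name checktime_allows, the integer form of the HHMM time value, and the use of eval() with an empty builtins table are assumptions made for this example, not Eddie's actual implementation.

```python
import datetime

DAYS = ('mon', 'tue', 'wed', 'thu', 'fri', 'sat', 'sun')

def checktime_allows(expr, now):
    """Return True if the checktime expression permits a run at `now`."""
    env = {
        'day': DAYS[now.weekday()],           # 'mon', 'tue', ...
        'time': now.hour * 100 + now.minute,  # HHMM, assumed to be an integer
        'hour': now.hour,
        'minute': now.minute,
        'second': now.second,
        'weekdays': DAYS[:5],
        'weekend': DAYS[5:],
    }
    # Hypothetical evaluation step: builtins are hidden so the expression
    # can only see the time variables defined above.
    return bool(eval(expr, {'__builtins__': {}}, env))

monday_evening = datetime.datetime(2004, 3, 1, 19, 30, 0)
print(checktime_allows('day in weekdays and hour>18', monday_evening))
```

A directive whose expression evaluates to false at the scheduled time would simply be skipped for that run.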
| 462 | |
|---|
| 463 | Eddie-0.32 (21-Apr-2003) |
|---|
| 464 | - Added an exception handler for httplib read() where it can fail in |
|---|
| 465 | some circumstances. |
|---|
| 466 | - Fixed HTTP timing so that the whole HTTP session was timed, not just the |
|---|
| 467 | connect part. This was misleading before. |
|---|
| 468 | - If no output from COM directive, set outfield1 anyway so rule |
|---|
| 469 | strings don't break. Suggested by Arcady Genkin. |
|---|
| 470 | - Changed some sample rules to use ALERT_EMAIL alias rather than "alert" |
|---|
| 471 | fixed email address. Thanks to Zac Stevens <zts@itga.com.au> for |
|---|
| 472 | pointing them out. |
|---|
| 473 | - Added restart option to redhat init.d script in contrib. |
|---|
| 474 | - Added new directive parameter: actionmaxcalls - defines the maximum number |
|---|
| 475 | of times actions will be called for a particular failure. |
|---|
| 476 | - Minor bugfix: sendmail_smtp() was returning wrong return codes; successful |
|---|
| 477 | posts were showing as failures, etc. |
|---|
| 478 | - Added new directive parameter: excludehosts |
|---|
| 479 | Directive will be skipped on any hosts specified by excludehosts. |
|---|
| 480 | Specified as a string containing a comma-separated list of hostnames. |
|---|
| 481 | - If groups of the same name are defined, merge them together rather than |
|---|
| 482 | throwing an error. This allows for more custom rule configurations. |
|---|
| 483 | Requested by Arcady Genkin <agenkin@cdf.toronto.edu> |
|---|
| 484 | |
|---|
| 485 | Eddie-0.31 (11-Dec-2002) |
|---|
| 486 | - Increased Linux system counters from int to long. |
|---|
| 487 | - Fixed bug with isfile/isdir/etc shorthands not working properly. |
|---|
| 488 | - Console displays "<directive not ready>" for directives which have not |
|---|
| 489 | yet been initialised, rather than throwing KeyError exception. |
|---|
| 490 | - Added option to send emails via SMTP servers, rather than relying on |
|---|
| 491 | a local sendmail binary. Either option can now be used. |
|---|
| 492 | Set SMTP_SERVERS in config to use SMTP server option. This option |
|---|
| 493 | is now the default, and server defaults to 'localhost'. |
|---|
| 494 | Based on a submission by Dougal Scott <dwagon@connect.com.au> |
|---|
| 495 | - Fixed FILE example rule when performing cron test. |
|---|
| 496 | Noted by Dougal Scott <dwagon@connect.com.au>. |
|---|
| 497 | - Convert the weird time format that Solaris ps returns for etime and time |
|---|
| 498 | into plain seconds, which is a lot more useful for rules rather than |
|---|
| 499 | checking lengths or doing an integer conversion of a subslice of the |
|---|
| 500 | result and then a comparison based on that. |
|---|
| 501 | Patched by Dougal Scott <dwagon@connect.com.au>. |
|---|
| 502 | - Improved error output when parsing rules. |
|---|
| 503 | - Fixed bug when using Python pre-2.2 versions. |
|---|
| 504 | - Added some more sample directives. |
|---|
| 505 | - Added support for remembering historical data in directives. Rules can |
|---|
| 506 | reference data from previous samples. |
|---|
| 507 | - Changed actionperiod slightly, so first actionperiod defaults to scanperiod, |
|---|
| 508 | then actionperiod expression is used thereafter. |
|---|
| 509 | - Shift sticky and type bits of mode across, right justified. |
|---|
| 510 | - Improved handling of tokenization errors. |
|---|
| 511 | - Directive is cancelled (not re-queued) if there are too many |
|---|
| 512 | SNMP query failures (usually host not responding or some other |
|---|
| 513 | network or transport failure). |
|---|
| 514 | - Added shorthand booleans to FILE directive for checking file types in rules: |
|---|
| 515 | issocket |
|---|
| 516 | issymlink |
|---|
| 517 | isfile |
|---|
| 518 | isblockdevice |
|---|
| 519 | isdir |
|---|
| 520 | ischardevice |
|---|
| 521 | isfifo |
|---|
| 522 | - Updated docs with version 0.30 changes (forgot to do this at release time, |
|---|
| 523 | oops). |
|---|
| 524 | - Improved handling of socket errors for console. |
|---|
| 525 | - Fixed issue with templates not being handled before rest of directive arguments. |
|---|
| 526 | - Added perm, sticky and type rule variables to the FILE directive. They are |
|---|
| 527 | shorthands for the permissions, sticky/setuid/setgid and file type bits |
|---|
| 528 | of a file's mode. |
|---|
| 529 | - Improved config syntax error handling of bad directive names. |
|---|
| 530 | - Implemented check and action dependency definitions. Two new directive |
|---|
| 531 | options are: actiondependson and checkdependson. These can be set to a |
|---|
| 532 | string containing a list of directives (comma-separated) that this directive |
|---|
| 533 | is dependent on. If any of the dependent directives has failed when this |
|---|
| 534 | directive comes to perform its check or action (depending on which option |
|---|
| 535 | was used) then that check or action will be skipped. |
|---|
| 536 | - Added new directive option actionperiod. This is a string containing an |
|---|
| 537 | expression which, when evaluated, sets the current period between actions |
|---|
| 538 | being performed. This allows for periods between actions to be different to |
|---|
| 539 | the period between checks. It also allows for the period to be defined by |
|---|
| 540 | a mathematical expression, so the action period could exponentially increase |
|---|
| 541 | for example (for actions called during a single failure - the action period |
|---|
| 542 | will be reset when the failure is fixed). |
|---|
| 543 | - Enforced unique group and directive names at same group level. |
|---|
| 544 | - Improved error handling of console connections from bad clients. |
|---|
| 545 | - Fixed syntax error in sample config. |
|---|
| 546 | - Changed Linux ctr_interrupts system counter from int to long. |
|---|
| 547 | - Improved error handling of snmp directive. |
|---|
| 548 | - Improved handling of group configuration errors. |
|---|
| 549 | - Finally removed dependency on user-compiled 'top' command for collecting |
|---|
| 550 | some system stats on Solaris. All current stats are collected from uptime |
|---|
| 551 | and vmstat commands now, which should be standard on any Solaris system. |
|---|
| 552 | - Fetch Linux memory statistics from /proc/meminfo. |
|---|
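The Solaris ps time conversion noted above can be sketched as follows. It assumes the [[dd-]hh:]mm:ss layout that ps uses for etime and time; the function name ps_time_to_seconds is hypothetical and not taken from Eddie's source.

```python
def ps_time_to_seconds(value):
    """Convert a ps time string such as '2-03:04:05' into plain seconds."""
    value = value.strip()
    days = 0
    if '-' in value:
        # An optional leading "dd-" carries the number of whole days.
        day_part, value = value.split('-', 1)
        days = int(day_part)
    # Remaining fields are [hh:]mm:ss; pad missing leading fields with zero.
    fields = [int(f) for f in value.split(':')]
    while len(fields) < 3:
        fields.insert(0, 0)
    hours, minutes, seconds = fields
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds

print(ps_time_to_seconds('2-03:04:05'))
```

With the value in seconds, a rule can compare it directly against a number instead of slicing the string.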
| 553 | |
|---|
| 554 | Eddie-0.30 (31-May-2002) |
|---|
| 555 | - Prevented failed calls to 'top' (which will soon be made redundant anyway) |
|---|
| 556 | from causing system stats collection to fail on Solaris. |
|---|
| 557 | - Removed fetching WCHAN field from process information on Linux, as this |
|---|
| 558 | sometimes caused kernel warnings to be output or logged. The field doesn't |
|---|
| 559 | appear particularly useful. |
|---|
| 560 | - Changed Linux Context switch counter from an int to a long. |
|---|
| 561 | - Fixed bug when an error parsing top output locks the system call semaphore |
|---|
| 562 | on Solaris. |
|---|
| 563 | - Fixed small bug when parsing string variables and catching exceptions in |
|---|
| 564 | actions. |
|---|
| 565 | - Added SENDMAIL config option to specify location of the sendmail binary |
|---|
| 566 | which EDDIE uses to send all email. |
|---|
| 567 | - Fixed bug when templates not in same group as directive referencing them. |
|---|
| 568 | - Changed PID directive argument 'pid' to 'pidfile'. |
|---|
| 569 | - Better handling of missing pysnmp module in snmp.py. |
|---|
| 570 | - Added basic SNMP directive based on a module by Dougal Scott |
|---|
| 571 | <dwagon@connect.com.au>. Requires pysnmp. |
|---|
| 572 | - Changed Linux 'df' call to 'df -l' which lists all local filesystems. |
|---|
| 573 | Much friendlier now that there are many alternative filesystems available |
|---|
| 574 | for Linux. |
|---|
| 575 | - Added patch by Kees Bakker <kees.bakker@altium.nl> to handle Linux df |
|---|
| 576 | when it sometimes outputs filesystem information over multiple lines. |
|---|
| 577 | - Added outfield variables to the COM directive. The out variable is split |
|---|
| 578 | by whitespace and the fields are stored in outfieldn variables, e.g., |
|---|
| 579 | outfield1, outfield2, etc. This is to assist rule creation. |
|---|
| 580 | - Added netsaint action and Elvin notification method, submitted by |
|---|
| 581 | Dougal Scott <dwagon@connect.com.au>. |
|---|
| 582 | - Added minor bug-fixes, thanks to pre-release testing by Dougal Scott |
|---|
| 583 | <dwagon@connect.com.au>. |
|---|
| 584 | - Linux ctr_cpu_idle variables need to be longs (instead of ints) as the |
|---|
| 585 | counters are larger than expected. |
|---|
| 586 | - Created a HTTP directive for performing HTTP (and HTTPS) tests. |
|---|
| 587 | - Fixed minor bug when displaying config lines that have parsing errors. |
|---|
| 588 | - Fixed bug in METASTAT directive. |
|---|
| 589 | - Removed the CRON directive. It is redundant now that the FILE directive |
|---|
| 590 | can perform the same test. |
|---|
| 591 | - Added a new data variable to FILE directive: now, which contains the |
|---|
| 592 | current time for use in tests with atime/mtime/ctime. |
|---|
| 593 | - LOGSCAN directive now initializes data variables on first check, which is |
|---|
| 594 | only for finding the end of the logfile in question. This prevents an |
|---|
| 595 | exception when variables are needed for console strings before second |
|---|
| 596 | check has run. |
|---|
| 597 | - Removed optional actionList from being logged by directives also. |
|---|
| 598 | - Fixed bug with directives trying to log the action list, which is optional |
|---|
| 599 | now and may not exist. |
|---|
| 600 | - Moved sample M/MSG definitions to message.rules file. |
|---|
| 601 | - Added some more sample rules. |
|---|
| 602 | - Cleaned up sample rules and updated for the latest directive changes. |
|---|
| 603 | Added some elvinrrd sample rules. |
|---|
| 604 | - Minor cleanup of base directory path; just found os.path.normpath() :) |
|---|
| 605 | - Fixed small problem with arg parsing handling None values. |
|---|
| 606 | - Fixed small bug in PORT directive: when a check fails due to a connection |
|---|
| 607 | timeout, the recv string that wasn't set was still being searched. |
|---|
| 608 | - Cleaned up config formatting some more so that actions do not need to be |
|---|
| 609 | inside strings, they can be entered directly in a function call-like |
|---|
| 610 | format, e.g., |
|---|
| 611 | action=ticker("Load on %(h)s is %(out)s", timeout=1) |
|---|
| 612 | or for a notification object, |
|---|
| 613 | action=COMMONALERT(commonmsg.fs,1) |
|---|
| 614 | - Changed PROC argument 'procname' to 'name' and action variable |
|---|
| 615 | 'proc_check_name' to 'name' also, for consistency. |
|---|
| 616 | - Fixed minor bug with lack of expect argument for PORT directive. |
|---|
| 617 | - Removed data collection modules which are not required. |
|---|
| 618 | - Cleaned up all data collection modules and classes to simplify their |
|---|
| 619 | definitions. Data collectors should be derived from the DataCollect |
|---|
| 620 | base class which handles all the data caching and thread-locking. |
|---|
| 621 | - Changes to parseConfig to simplify directive definitions. |
|---|
| 622 | - Removed old datastore module. |
|---|
| 623 | - Fixed up console code to handle errors better. |
|---|
| 624 | - Changed Directive base-class to simplify directive definitions. |
|---|
| 625 | - New datacollect module which defines DataModules class to handle dynamic |
|---|
| 626 | importing of architecture-dependent data collection modules, and |
|---|
| 627 | DataCollect class to provide a base-class for data collectors. |
|---|
| 628 | - Fixed PING directive to handle un-resolvable addresses. Also returns ping |
|---|
| 629 | round-trip-times in seconds as a floating-point number. |
|---|
| 630 | - Simplified directive definitions by moving most of the common code to |
|---|
| 631 | Directive base-class. New directives only need to define __init__, |
|---|
| 632 | tokenparser and getData methods. |
|---|
| 633 | - Removed requirement for action variables to be prefixed by directive name. |
|---|
| 634 | Action variables now have the same name as the rule variables, for |
|---|
| 635 | consistency. Changed a few more variable names so they make more sense. |
|---|
| 636 | - Moved common directive definitions from directive.py to |
|---|
| 637 | Directives/common.py. |
|---|
| 638 | - OS-dependent modules are now imported dynamically when needed, not in the |
|---|
| 639 | main eddie.py anymore. All data collection modules are handled by the |
|---|
| 640 | new datacollect module. |
|---|
| 641 | - Removed old method of determining systype with external script (wasn't used |
|---|
| 642 | anymore anyway). |
|---|
| 643 | - Fixed bug with Pinger where it would throw an exception when pinging |
|---|
| 644 | addresses that did not resolve. |
|---|
| 645 | - Added extra console argument variables: |
|---|
| 646 | . lastchecktime - date/time of last directive execution |
|---|
| 647 | . problemfirstdetect - date/time of current failure first detected (only if |
|---|
| 648 | state is failed) |
|---|
| 649 | . problemlastfail - date/time of current failure last detected (only if state |
|---|
| 650 | is failed) |
|---|
| 651 | - Cleaned up description of ADMINLEVEL in sample config so it makes more sense. |
|---|
| 652 | - Added console argument to directives to specify how the console output should |
|---|
| 653 | look for that directive. console=None can be specified to hide that directive |
|---|
| 654 | from console output. |
|---|
| 655 | - Added support for EXT3 filesystems in Linux filesystem checking code. |
|---|
| 656 | Patch submitted by Kees Bakker <kees.bakker@altium.nl> |
|---|
| 657 | - Fixed a minor bug where directives using the eval() function and catching |
|---|
| 658 | an exception would log a very ugly looking message. This was due to the Python |
|---|
| 659 | eval() function modifying the user-supplied environment dictionary by adding |
|---|
| 660 | the __builtin__ dictionary. When this is printed it looks horrible. |
|---|
| 661 | - Added 'actelse' directive argument to perform actions if directive state is |
|---|
| 662 | ok and has not changed since the last check. |
|---|
| 663 | Based on patches submitted by Dougal Scott <dwagon@connect.com.au> |
|---|
| 664 | - Changed Linux counter variables to have 'ctr_' at start of name, to be |
|---|
| 665 | consistent with Solaris and HP-UX variables. |
|---|
| 666 | - Fixed minor bug in HP-UX and Solaris system data collection. |
|---|
| 667 | - Fixed bug in uptime parsing in HP-UX system.py. |
|---|
| 668 | - Added a timeout argument to the ticker action. |
|---|
| 669 | - Re-implemented Elvin connection and notification code using the Elvin |
|---|
| 670 | ThreadedLoop client and a dedicated Elvin thread which should prevent |
|---|
| 671 | other threads from blocking on Elvin problems. |
|---|
| 672 | - Specify full path for solaris 'ps' command to prevent calling wrong version of |
|---|
| 673 | 'ps'. |
|---|
| 674 | - Started work on a basic Developer's Guide: doc/dev_guide.txt. |
|---|
| 675 | - Standardised logging levels and tidied up all logging. |
|---|
| 676 | - Added system performance data collecting from 'uptime' and 'vmstat -s' |
|---|
| 677 | commands on Solaris. |
|---|
| 678 | - Improved network interface statistics on Linux by retrieving data from |
|---|
| 679 | /proc/net/dev. |
|---|
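The eval() logging issue described above can be reproduced in a few lines: eval() inserts a __builtins__ entry into the globals dictionary it is given, so printing that dictionary afterwards is very noisy. One common remedy, shown here as a sketch rather than as the fix Eddie applied, is to evaluate against a copy. The names safe_eval and rule_env are hypothetical.

```python
def safe_eval(expr, env):
    """Evaluate expr without letting eval() modify the caller's dictionary."""
    return eval(expr, dict(env))  # eval() adds '__builtins__' to the copy only

rule_env = {'pctused': 97}

eval('pctused > 95', rule_env)        # direct call mutates rule_env
print('__builtins__' in rule_env)     # the noisy entry is now present

rule_env = {'pctused': 97}
print(safe_eval('pctused > 95', rule_env), sorted(rule_env))
```

Because only the copy is modified, the original dictionary can still be logged or substituted into messages cleanly.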
| 680 | |
|---|
| 681 | Eddie-0.29 (non-public release) |
|---|
| 682 | |
|---|
| 683 | Eddie-0.28 (9-Mar-2002) |
|---|
| 684 | - Cleaned up df code, added data caching and made thread-safe, like other |
|---|
| 685 | data collectors. |
|---|
| 686 | - Fixed up eddie_wrapper locating GNU date on Solaris. |
|---|
| 687 | - Fixed memory-leak in disk-usage code (reported by Dougal Scott |
|---|
| 688 | <dwagon@connect.com.au>). |
|---|
| 689 | - Exit with error if all threads are locked (cannot kill threads in current |
|---|
| 690 | Python implementation). |
|---|
| 691 | Make eddie_wrapper a little smarter when restarting eddie process. |
|---|
| 692 | - Added example init.d scripts to contrib for Solaris and Redhat Linux. |
|---|
| 693 | - Added another vmstat parser to get free memory/swap information for |
|---|
| 694 | Solaris. |
|---|
| 695 | - Added a common semaphore for utils.safe_popen()/safe_pclose() and |
|---|
| 696 | utils.safe_getstatusoutput() to use between them. It appears that |
|---|
| 697 | system calls, of any sort - system() calls, popen(), commands module, |
|---|
| 698 | etc - are not thread-safe and cannot be performed simultaneously by |
|---|
| 699 | multiple threads at once. This should prevent such race-conditions as |
|---|
| 700 | all EDDIE system calls use these functions. |
|---|
| 701 | - Cleaned up access to the system stats cache so that only one thread at a |
|---|
| 702 | time will be refreshing the data. |
|---|
| 703 | - Added some more smarts to eddie_wrapper: |
|---|
| 704 | - don't start Eddie if one is already running. |
|---|
| 705 | - don't restart Eddie more than a set number of times in a short period of |
|---|
| 706 | time (requires GNU date command). |
|---|
| 707 | - Put semaphore lock around Elvin notify to ensure thread-safe notifications |
|---|
| 708 | are being sent. Suspect duplicates were being sent before. |
|---|
| 709 | - Now logs the current thread name for each log entry for improved debugging. |
|---|
| 710 | - A lot of cleaning up of system.py for Solaris. |
|---|
| 711 | Added all counter stats from 'vmstat -s'. |
|---|
| 712 | Changed gathering of loadavg/uptime stats from '/usr/bin/uptime' rather than |
|---|
| 713 | '/opt/local/bin/top' - trying to phase out use of 'top'. |
|---|
| 714 | Improved documentation at top of class, with listing of every stats variable |
|---|
| 715 | available from the system class. |
|---|
| 716 | - Added prtdiag parsing for Enterprise class servers (E3500,E6500,etc) |
|---|
| 717 | for temperature. |
|---|
| 718 | - Added support for prtdiag for Sun U280R's. |
|---|
| 719 | - Added list of paths to find metastat command for Solaris METASTAT directive. |
|---|
| 720 | - Added PRTDIAG directive to provide an interface to the system-specific |
|---|
| 721 | data provided by prtdiag on Sun machines. |
|---|
| 722 | Currently only system temperatures are extracted for U450s and U250s. |
|---|
| 723 | - Added support for VxFS filesystems in df.py for Solaris. |
|---|
| 724 | - Updated docs to require Python versions 1.6+ |
|---|
| 725 | |
|---|
| 726 | Eddie-0.27 (12-Nov-2001) |
|---|
| 727 | - Put semaphore lock around Elvin connect calls to prevent multiple threads |
|---|
| 728 | trying to connect at once. |
|---|
| 729 | - Fixed bug with ELVINURL and ELVINSCOPE config options not being set |
|---|
| 730 | properly. |
|---|
| 731 | - Socket errors in Console code are matched with errno error names, rather |
|---|
| 732 | than assuming the error numbers are the same across platforms. |
|---|
| 733 | [Bug reported by: Ivar Zarans <iff@alcaron.ee>] |
|---|
| 734 | - Handle socket errors from PINGs nicely. |
|---|
| 735 | - Added a reconnect() function to force the elvin connection closed before |
|---|
| 736 | reconnecting. |
|---|
| 737 | - Cleaned up eddieElvin4 code, including connecting and auto-reconnecting to |
|---|
| 738 | Elvin server when connection is lost. |
|---|
| 739 | - Added better exception handling for "Connection Timed Out" error in PORT |
|---|
| 740 | directive isalive() function. |
|---|
| 741 | - Fixed file descriptor leak in PORT directive isalive() function: when a |
|---|
| 742 | Connection Refused exception was handled, the socket file descriptor was |
|---|
| 743 | not being closed. |
|---|
| 744 | - Added more system statistics to the Linux system data collector module. |
|---|
| 745 | Added most of the stats available from /proc/stat, including: |
|---|
| 746 | cpu_user - total cpu in user space |
|---|
| 747 | cpu_nice - total cpu in user nice space |
|---|
| 748 | cpu_system - total cpu in system space |
|---|
| 749 | cpu_idle - total cpu in idle thread |
|---|
| 750 | cpu%d_user - per cpu in user space (e.g., cpu0, cpu1, etc) |
|---|
| 751 | cpu%d_nice - per cpu in user nice space (e.g., cpu0, cpu1, etc) |
|---|
| 752 | cpu%d_system - per cpu in system space (e.g., cpu0, cpu1, etc) |
|---|
| 753 | cpu%d_idle - per cpu in idle thread (e.g., cpu0, cpu1, etc) |
|---|
| 754 | pages_in - pages read in |
|---|
| 755 | pages_out - pages written out |
|---|
| 756 | pages_swapin - swap pages read in |
|---|
| 757 | pages_swapout - swap pages written out |
|---|
| 758 | interrupts - number of interrupts received |
|---|
| 759 | contextswitches - number of context switches |
|---|
| 760 | boottime - time of boot (epoch) |
|---|
| 761 | processes - number of processes started (I think?) |
|---|
| 762 | These are now available to directives like SYS. |
|---|
| 763 | - Cleaned up eddie-adm email headers. |
|---|
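The portability fix for console socket errors can be illustrated with Python's errno module, which maps symbolic names to the platform's own numbers. This is a minimal sketch; is_connection_refused is a hypothetical helper, not a function from Eddie.

```python
import errno

def is_connection_refused(exc):
    """Compare against the symbolic errno name, not a hard-coded number."""
    return getattr(exc, 'errno', None) == errno.ECONNREFUSED

err = OSError(errno.ECONNREFUSED, 'Connection refused')
print(is_connection_refused(err), errno.errorcode[errno.ECONNREFUSED])
```

The numeric value behind ECONNREFUSED differs between operating systems, which is why matching on the name is the safer test.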
| 764 | |
|---|
| 765 | Eddie-0.26 (1-Oct-2001) |
|---|
| 766 | - Changed elvinrrd() action call arguments slightly. It is now: |
|---|
| 767 | elvinrrd( 'rrdkey', 'arg1=val1', 'arg2=val2', ... ) |
|---|
| 768 | The first argument must be the RRD database name to store data into. |
|---|
| 769 | All arguments following that (one or more) are "variable=data" strings |
|---|
| 770 | where variable is the name of the variable in the RRD db and data is |
|---|
| 771 | the data to store in that variable. RRD dbs can have multiple variables |
|---|
| 772 | so this allows some or all of them to be updated in one action call. |
|---|
| 773 | - Wrapped the critical calls in safe_popen(), safe_pclose() and |
|---|
| 774 | safe_getstatusoutput() in try/except clauses, so that any exceptions are |
|---|
| 775 | intercepted and the semaphore locks are released (exceptions are then |
|---|
| 776 | raised again to be handled as normal). This stops threads being blocked |
|---|
| 777 | on semaphore acquires which used up the thread pool quickly and was |
|---|
| 778 | obviously bad. |
|---|
| 779 | - Added elvinrrd action which is used to send data samples over Elvin to a |
|---|
| 780 | consumer which stores that data into an RRD database. |
|---|
| 781 | - Updated elvindb() action and elvindb() Elvin function to support Elvin4. |
|---|
| 782 | elvindb actions are now working again. |
|---|
| 783 | - Directive states now transition from "ok" to "failinitial" to "fail". |
|---|
| 784 | "ok" indicates the directive is fine; |
|---|
| 785 | "failinitial" indicates the directive is currently transitioning to the "fail" |
|---|
| 786 | state or is waiting on a re-check; |
|---|
| 787 | "fail" indicates the directive has definitely failed. |
|---|
| 788 | - Fixed a small bug where a directive performing multiple checks (numchecks>1) |
|---|
| 789 | which fails one of the first checks but passes a subsequent re-check still |
|---|
| 790 | performs the act2ok action, which it should not do. |
|---|
| 791 | - Directive threads are named, for easier debugging. The name they are given |
|---|
| 792 | is the ID of the directive they are executing. |
|---|
| 793 | - Cleaned up ALIAS code to support being passed in action calls properly. |
|---|
| 794 | - Cleaned up action calling code. Actions called from action and act2ok now |
|---|
| 795 | use the same action evaluation function, whether actions are called |
|---|
| 796 | directly as a function or from Notification objects. Thus actions can be |
|---|
| 797 | called directly or Notification objects used from both action and act2ok |
|---|
| 798 | arguments, and can even be combined. |
|---|
| 799 | - Added a rule argument to RADIUS directive so rules can be written to test |
|---|
| 800 | radius auths. The variable passed is set in the rule environment and is |
|---|
| 801 | set to either 0 for failed or 1 for passed. |
|---|
| 802 | - FILE directive now makes the file statistics from the previous check |
|---|
| 803 | available so rules can compare the current statistics against the previous |
|---|
| 804 | statistics to see if files or file metadata have changed over time. |
|---|
| 805 | Variables are same but prepended by 'last', e.g.: rule='md5 != lastmd5' |
|---|
| 806 | - Fixed bug: Connection not being closed in all cases for PORT isalive() |
|---|
| 807 | function. |
|---|
| 808 | - Added new directive, FILE, allowing tests to be made on a file based on |
|---|
| 809 | standard file statistics (size, mode, ownerships, etc) and md5 hashes. |
|---|
| 810 | - Update lastfailtime in stateok function so any actions called by act2ok |
|---|
| 811 | will know the full age of the problem. |
|---|
| 812 | - Added PING directive to provide network ping checking of hosts. |
|---|
| 813 | - Added initial HP-UX support. |
|---|
| 814 | - Fixed bug in PROC R() check. |
|---|
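The locking pattern behind safe_popen(), safe_pclose() and safe_getstatusoutput() can be sketched as below: one shared semaphore serialises the calls, and it is always released even when the wrapped call raises. The names system_call_lock and run_serialised are hypothetical, and try/finally stands in for the try/except-and-re-raise approach described above.

```python
import threading

system_call_lock = threading.Semaphore(1)  # shared by every wrapped call

def run_serialised(func, *args):
    """Run func while holding the lock; release it even on failure."""
    system_call_lock.acquire()
    try:
        return func(*args)
    finally:
        # Without this release, one exception would block all later callers.
        system_call_lock.release()

def failing_call():
    raise OSError('simulated popen failure')

try:
    run_serialised(failing_call)
except OSError:
    pass  # the exception still reaches the caller

# The lock is free again, so a non-blocking acquire succeeds.
print(system_call_lock.acquire(blocking=False))
system_call_lock.release()
```

If the release were skipped on error, every later thread would wait on the semaphore forever, which matches the thread-pool exhaustion the entry describes.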
| 815 | |
|---|
| 816 | |
|---|
| 817 | Eddie-0.25 (6-Jul-2001) |
|---|
| 818 | - Changed where varDict action variables are set in some directives so that |
|---|
| 819 | they are available for act2ok action calls. |
|---|
| 820 | - Improved error handling in directive.py |
|---|
| 821 | - Fixed problem with DF list not refreshing itself properly. |
|---|
| 822 | - Changed CONSPORT config option to CONSOLE_PORT. |
|---|
| 823 | I find more verbose to be much user-friendlier than less. |
|---|
| 824 | - Added two new config settings: |
|---|
| 825 | EMAIL_FROM='emailaddress' |
|---|
| 826 | EMAIL_REPLYTO='emailaddress' |
|---|
| 827 | so the From: and Reply-To: fields in the email action can be set. |
|---|
| 828 | If these are not set, they default to the current USER for the From: field, |
|---|
| 829 | and '' for the Reply-To: field. |
|---|
| 830 | - Cleaned up PORT directive isalive() handling Connection Refused exceptions. |
|---|
| 831 | - Create a QUICKSTART text document to give the impatient a quick way to |
|---|
| 832 | get Eddie running. |
|---|
| 833 | - sockets.py: handle port already in use by exiting and signalling the other |
|---|
| 834 | non-daemon threads to exit. If the port is in use the whole program should |
|---|
| 835 | exit cleanly with an appropriate error message now. |
|---|
| 836 | Similarly, exit cleanly (and signal other threads to exit) if too many |
|---|
| 837 | socket errors. |
|---|
| 838 | - config.py: Improved error handling; if CONSPORT is not a positive integer a |
|---|
| 839 | ParseFailure is raised. |
|---|
| 840 | - The console server thread will not be started if CONSPORT=0. This allows |
|---|
| 841 | the console feature to be disabled if required. |
|---|
| 842 | - Main thread will now also exit if please_die Event is set. This allows |
|---|
| 843 | other threads to signal that the program should exit. |
|---|
| 844 | - Added act2ok param - allows you to specify a Notification object |
|---|
| 845 | to use when Check goes from bad to good |
|---|
| 846 | - Log accepted connections with remote IP:port, for security or whatever. |
|---|
| 847 | - directive.py: made directive string representation tidier. |
|---|
| 848 | - sockets.py: Handle "Interrupted system call" (from CTRL-C) nicely. |
|---|
| 849 | - Changed eddie.py - changes include cleaning up the way threads |
|---|
| 850 | are started and stopped, there is now start_threads() and |
|---|
| 851 | stop_threads(). I did this so that both the scheduler thread |
|---|
| 852 | and the console socket thread can be started and stopped easily |
|---|
| 853 | when the config changes. |
|---|
| 854 | - Added config var CONSPORT - this is the port to listen to |
|---|
| 855 | console connections on. The default is 33343. |
|---|
| 856 | - Added sockets.py - A sockets interface to the current state of |
|---|
| 857 | all eddie checks, this will be used for a console like interface. |
|---|
| 858 | - Removed DEFs and replaced by ALIASes which are now used to define string |
|---|
| 859 | aliases to be substituted during config parsing, or during action argument |
|---|
| 860 | parsing. '$' signs are not used anymore, giving a much nicer Python |
|---|
| 861 | look-and-feel. |
|---|
| 862 | - Added %(problemage)s %(problemfirstdetect)s to sample MSGs to demonstrate |
|---|
| 863 | usage. These are substituted for the age of the current directive being |
|---|
| 864 | false and the time the first false was detected respectively; or empty |
|---|
| 865 | strings ("") if the problem age is currently 0. |
|---|
| 866 | - Added more detailed logging of thread usage, making thread problems easier |
|---|
| 867 | to track. |
|---|
| 868 | - Added a utils.safe_getstatusoutput() as a thread-safe wrapper around |
|---|
| 869 | commands.getstatusoutput(). |
|---|
| 870 | The IPF directive now uses this to avoid deadlocks. |
|---|
| 871 | - Problem age and First time detected variables are now substitutable values |
|---|
| 872 | within an email message body, %(problemage)s and %(problemfirstdetect)s, |
|---|
| 873 | instead of automatically being appended to the bottom of every email. |
|---|
| 874 | Note, these variables are empty ("") if the problem age is zero. |
|---|
| 875 | - Changed all os.popen() calls to use the thread-safe utils.safe_popen(). |
|---|
| 876 | This should prevent deadlocks when multiple directives are gathering info. |
|---|
| 877 | - Added 'negate' option to LOGSCAN - will match lines which do NOT match the |
|---|
| 878 | regex. |
|---|
| 879 | - Added formatted exception traceback to safeCheck() logging. |
|---|
| 880 | - Fixed socket connect() call in pop3.py to support Python 2.1 |
|---|
| 881 | - Email admin logs when exiting due to config parse failure. |
|---|
| 882 | - Added LOGSCAN examples. |
|---|
| 883 | - Updated sample rules to reflect new config layout and features. |
|---|
| 884 | - Log Eddie version and systype. |
|---|
| 885 | Also log when configuration parsing complete. |
|---|
| 886 | - Cleaned up pop3.py imports. |
|---|
| 887 | - Added LOGSCAN directive for monitoring logfiles. |
|---|
| 888 | - Fixed PROC custom rules setting. |
|---|
| 889 | - Fixed directives setting their own ID only if none set in config. |
|---|
| 890 | - parseFailure() logs problem to logfile as well as printing to stdout. |
|---|
| 891 | - Cleaned up sample eddie.cf and added verbose comments. |
|---|
| 892 | - Catch any uncaught exceptions around main() so they are logged and displayed |
|---|
| 893 | nicely, making it easier for the Eddie admin to see and act on them. |
|---|
| 894 | Hence eddie doesn't have to be run from eddie_wrapper with stderr captured |
|---|
| 895 | (which didn't really work properly anyway). |
|---|
| 896 | - Fixed socket connect() call in PORT directive to use tuple as argument |
|---|
| 897 | rather than two arguments. This changed in Python-2.1 (but works with |
|---|
| 898 | older versions). |
|---|
| 899 | - Removed the old snpp code which wasn't being used. This should be replaced |
|---|
| 900 | with updated code. |
|---|
| 901 | - Elvin config parameters have changed from ELVINHOST and ELVINPORT to |
|---|
| 902 | ELVINURL and ELVINSCOPE to support Elvin4 properly. |
|---|
| 903 | - The Elvin tickertape action is now called ticker() [it was just called |
|---|
| 904 | elvin() before]. |
|---|
| 905 | - Updated Elvin code to support Elvin4 and moved to new file eddieElvin4.py. |
|---|
| 906 | Elvin3 will no longer be supported. |
|---|
| 907 | - Replaced any use of old regex module with new re module (using regex causes |
|---|
| 908 | warnings with Python-2.1). |
|---|
| 909 | - Tested under Python-2.1. Had to modify some of the globals to avoid new |
|---|
| 910 | warnings under 2.1. |
|---|
| 911 | - Updated system.py to handle 'top' under Solaris 8. |
|---|
| 912 | - Directive threads are started with safeCheck() which wraps up docheck() |
|---|
| 913 | in try/except so all un-caught exceptions within that thread will be caught |
|---|
| 914 | and the thread can exit cleanly. |
|---|
| 915 | - Cleaned up parsing of 'top' a bit more, so it works better under Solaris 8. |
|---|
| 916 | - Added support for directive templates. A directive can be created to be |
|---|
| 917 | only used as a template for other directives, supplying default settings; |
|---|
| 918 | standard directives can also be used as templates for other |
|---|
| 919 | directives. |
|---|
| 920 | Directive template creation, eg: |
|---|
| 921 | PROC 'template1': template=self scanperiod='5m' checks=2 checkwait=30 |
|---|
| 922 | PROC 'cron': template='template1' procname='crond' action="..." |
|---|
| 923 | The special value template=self means this directive is a template and |
|---|
| 924 | should not be scheduled. |
|---|
| 925 | Can use other working directives as templates also. |
|---|
| 926 | Template should be same directive type as directive using it - but this is |
|---|
| 927 | not enforced because it shouldn't hurt: directives ignore any arguments |
|---|
| 928 | they don't need. |
|---|
| 929 | - Added support for new Directive arguments: |
|---|
| 930 | numchecks=<int> |
|---|
| 931 | checkwait=<time> |
|---|
| 932 | numchecks specifies how many checks a directive should perform before |
|---|
| 933 | calling its actions. By default this will be 1. Setting this to 2 |
|---|
| 934 | will force 2 checks before actions are called. It can be set to any |
|---|
| 935 | non-negative integer, including 0. 0 is a special case which indicates that |
|---|
| 936 | this directive will not perform any checks. This could be used to |
|---|
| 937 | temporarily disable a directive, for example. |
|---|
| 938 | checkwait specifies how long the directive will wait before performing |
|---|
| 939 | its next re-check if numchecks>1. Its value is a standard time specification |
|---|
| 940 | eg: '5' = 5 seconds; '5s' = 5 seconds; '2m' = 2 minutes; '5h' = 5 hours. |
|---|
| 941 | By default checkwait is 0 which means the next re-check will run instantly. |
|---|
| 942 | checkwait should normally be set to a meaningful value if numchecks>1. |
|---|
| 943 | - Added ALIAS definition. Similar to DEFs but ALIASes are replaced inside |
|---|
| 944 | action calls, etc., whereas DEFs are only translated at config file |
|---|
| 945 | parsing time. |
|---|
| 946 | Note: DEFs break the Python-like look&feel of the config file and may |
|---|
| 947 | disappear in the future if they can be replaced neatly. |
|---|
| 948 | - Cleaned up logging in config.py. LOGFILE should be the first option |
|---|
| 949 | in eddie.cf so logs end up in the right place. |
|---|
| 950 | - Handle scanperiod argument in directives so scanperiod can be overridden |
|---|
| 951 | for each directive. |
|---|
| 952 | - Signals received during a time.sleep() under Linux cause an IOError |
|---|
| 953 | exception so just catch these and move on. Main thread should be |
|---|
| 954 | handling the shutdown cleanly anyway. |
|---|
| 955 | - Cleaned up directive tokenparsing so base Directive class does as |
|---|
| 956 | much of the work as possible and user-written directive objects |
|---|
| 957 | only have to test existence of arguments and setup. |
|---|
| 958 | - New config format, which is not compatible with old format. |
|---|
| 959 | All arguments to a directive are now named arguments. |
|---|
| 960 | - Max number of threads to use can be limited in eddie.cf with the |
|---|
| 961 | NUMTHREADS variable now. Should be set > 5 for normal use. |
|---|
| 962 | If set too low checks will never be allowed to run. |
|---|
| 963 | - Created Radius auth checking directive. |
|---|
| 964 | - Added clean exiting code to SIGINT, same as SIGTERM. |
|---|
| 965 | - Cleaned up exiting on SIGTERM signal. The scheduler thread is signalled to |
|---|
| 966 | die and the main thread will wait for the scheduler thread to receive the |
|---|
| 967 | signal and exit before exiting cleanly itself. All "worker" threads are |
|---|
| 968 | ignored and should die of their own accord. |
|---|
| 969 | - Put semaphores around COM checks which do os.system() calls. |
|---|
| 970 | Only one COM check will execute at a time. |
|---|
| 971 | - Made proc.py thread-friendly. |
|---|
| 972 | - timeQueue is the queueing class derived from Python's Queue class. It is as |
|---|
| 973 | thread-friendly as Queue, the major difference being objects are inserted |
|---|
| 974 | into the queue based on a given time. Objects with the lowest times are |
|---|
| 975 | closest to the front of the queue. |
|---|
| 976 | To support this, objects have to be added along with their time, so a |
|---|
| 977 | 2-tuple must be added, eg: q.put( (obj, time) ). Similarly q.get() |
|---|
| 978 | returns the same 2-tuple. |
|---|
| 979 | An extra public method has been added, over what Queue offers, q.head(). |
|---|
| 980 | This method returns the item (and time) from the front of the queue, |
|---|
| 981 | exactly as q.get(), but does not remove it from the queue. |
|---|
| 982 | - To support the new queueing of jobs, all directives must end by submitting |
|---|
| 983 | themselves back into the queue. A |
|---|
| 984 | Config.q.put((self, time.time()+self.scanperiod)) will submit itself back |
|---|
| 985 | into the queue and schedule itself to be run in self.scanperiod seconds. |
|---|
| 986 | If a directive does not put itself back into the queue it will not be |
|---|
| 987 | called again (this can be useful if there is some sort of error and the |
|---|
| 988 | directive should not be called again). |
|---|
| 989 | - os.popen() appears to cause problems when used by multiple threads at once, |
|---|
| 990 | so all such calls now use a wrapper, utils.safe_popen() which performs |
|---|
| 991 | a semaphore lock around os.popen(). utils.safe_pclose() _MUST_ be called |
|---|
| 992 | after the pipe has been finished with or the semaphore will not be released |
|---|
| 993 | and all other calls will be blocked forever. |
|---|
| 994 | - Core of Eddie is now multi-threaded using a scheduler thread to run each |
|---|
| 995 | check in its own thread. Thread usage is limited so things don't get out |
|---|
| 996 | of control. |
|---|
| 997 | The scheduler tracks jobs with a derivative of Python's Queue class which |
|---|
| 998 | orders items by time, so that the job to be started soonest will be at the |
|---|
| 999 | front of the queue. This will now allow directives to specify their own |
|---|
| 1000 | scanperiod and execute as often or as little as desired, independently of |
|---|
| 1001 | other directives. |
|---|
| 1002 | Modified config files are still automatically detected (sometime within a 10 |
|---|
| 1003 | minute period by the "Housecleaning" thread (main process)) which causes the |
|---|
| 1004 | scheduler to be signalled to exit and then the configs are re-read and a new |
|---|
| 1005 | scheduler will be started up. |
|---|
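The time-ordered queue described in the entries above can be sketched as follows. This is only an illustration, not the Eddie implementation: the changelog says timeQueue derives from Python's Queue class, whereas this sketch swaps in heapq with a condition variable. The class name TimeQueue and its internals are assumptions; only put() with an (obj, time) 2-tuple, get() and head() come from the entries above.

```python
import heapq
import itertools
import threading


class TimeQueue:
    """Thread-safe queue that hands back items in order of their scheduled time."""

    def __init__(self):
        self._heap = []                    # entries are (time, seq, obj)
        self._seq = itertools.count()      # tie-breaker so objects are never compared
        self._cond = threading.Condition()

    def put(self, item):
        """Add an (obj, time) 2-tuple to the queue."""
        obj, when = item
        with self._cond:
            heapq.heappush(self._heap, (when, next(self._seq), obj))
            self._cond.notify()

    def get(self):
        """Remove and return the (obj, time) pair with the lowest time; blocks if empty."""
        with self._cond:
            while not self._heap:
                self._cond.wait()
            when, _, obj = heapq.heappop(self._heap)
            return obj, when

    def head(self):
        """Return the front (obj, time) pair without removing it; blocks if empty."""
        with self._cond:
            while not self._heap:
                self._cond.wait()
            when, _, obj = self._heap[0]
            return obj, when
```

A scheduler could call q.head() to see when the next job is due, sleep until then, and only then call q.get(); a directive would re-queue itself with q.put((self, time.time() + self.scanperiod)).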
| 1006 | |
|---|
| 1007 | |
|---|
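The checkwait time specification from this release ('5' = 5 seconds, '5s' = 5 seconds, '2m' = 2 minutes, '5h' = 5 hours) can be parsed along these lines. The function name parse_time_spec is hypothetical, and only the units listed in the entry are handled; Eddie's own parser may accept more.

```python
def parse_time_spec(spec):
    """Convert a time specification such as '5', '5s', '2m' or '5h' to seconds."""
    units = {"s": 1, "m": 60, "h": 3600}   # only the suffixes named in the changelog
    text = str(spec).strip()
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)                        # a bare number is taken as seconds
```

Any other input, such as '5x', raises ValueError from int(), which a config parser could report as a parse failure.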
| 1008 | Eddie-0.24 (1-Oct-2000) |
|---|
| 1009 | - Added custom disksuite check to alert if any metadevices require |
|---|
| 1010 | maintenance. Skips checking if /usr/opt/SUNWmd/sbin/metastat not found. |
|---|
| 1011 | - Separate system.py for Solaris 5.8 because of differences. |
|---|
| 1012 | - Better logging of problem states for debugging. Problem states track |
|---|
| 1013 | the current "state" of a problem (ok, failed, etc), and the time when |
|---|
| 1014 | the problem was first detected. |
|---|
| 1015 | - Email action now includes the age of the problem and when the problem was |
|---|
| 1016 | first detected (if not the first time) in the email. |
|---|
| 1017 | - Added handler for when pop3 connections are failing or not authing. |
|---|
| 1018 | - Fixed bug with parsing config when indentations are incorrect. |
|---|
| 1019 | Handles it better now by raising a ParseFailure exception and pointing |
|---|
| 1020 | out the error line in the config file. |
|---|
| 1021 | - Changed string variable substitution method from %variable to Python's |
|---|
| 1022 | format %(variable)s. This lets us use Python's built-in variable |
|---|
| 1023 | substitution on strings and makes the implementation much simpler. |
|---|
| 1024 | - Log parsing failures in parseVars() |
|---|
| 1025 | - Fixed small bug with pop3 error checking. |
|---|
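The switch to Python's %(variable)s format in this release means action strings can be expanded with the built-in % operator and a dictionary. A minimal sketch, with a hypothetical expand() helper and made-up variable names:

```python
def expand(template, variables):
    """Substitute %(name)s placeholders using Python's built-in string formatting."""
    return template % variables


# hostname and loadavg are illustrative names, not necessarily Eddie variables.
message = expand(
    "Load on %(hostname)s is %(loadavg)s",
    {"hostname": "web1", "loadavg": 3.2},
)
```

With this format a literal percent sign in a template has to be written as %%, and a placeholder with no matching key raises KeyError.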
| 1026 | |
|---|
| 1027 | |
|---|
| 1028 | Eddie-0.23 (19-Jun-2000) |
|---|
| 1029 | - cleaned up Solaris x86 support (still fairly untested). |
|---|
| 1030 | - changed Linux system.py to get statistics from /proc rather than |
|---|
| 1031 | parsing 'top' output. |
|---|
| 1032 | - added close() method to kstat object and removed kstat_close() call |
|---|
| 1033 | from kstat object initialization function which was possibly causing |
|---|
| 1034 | seg faults in solkstatmodule.so. |
|---|
| 1035 | - elvindb() action now takes optional string argument containing the |
|---|
| 1036 | column/value pairs to store in the database (via Elvin). |
|---|
| 1037 | - added POP3TIMING directive for checking and timing pop3 connections |
|---|
| 1038 | in new pop3 directive module. |
|---|
| 1039 | - added CRON directive for cron checks in solaris.py directive module. |
|---|
| 1040 | - added IPF directive for ipfilter tests. |
|---|
| 1041 | - added support for custom Directive imports from new Directive directory. |
|---|
| 1042 | - added count of filedescriptors for debugging. |
|---|
| 1043 | - fixed bug with PROC check which would only perform check on first |
|---|
| 1044 | process found with the name specified. It now performs checks on every |
|---|
| 1045 | process with the specified name. |
|---|
| 1046 | - added a kstat_close() to fix a file-descriptor leak in solkstatmodule.so. |
|---|
| 1047 | - added a default class, DataStore, for storage subclasses to use, which |
|---|
| 1048 | automatically caches data. |
|---|
| 1049 | - added support for iostat data to STORE directive. |
|---|
| 1050 | - find OS-specific modules in multiple directories from most specific |
|---|
| 1051 | to least specific (eg: OS/version/architecture, OS/version, then OS). |
|---|
| 1052 | - changed auto system type determination to internal code rather than |
|---|
| 1053 | calling separate 3rd-party 'systype' script. |
|---|
| 1054 | - added iostat objects for collecting iostat data under Solaris. Uses |
|---|
| 1055 | a shared library, solkstatmodule.so, created by the Eddie developers |
|---|
| 1056 | and included. |
|---|
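The most-specific-to-least-specific lookup for OS-specific modules described in this release could be sketched as below. The directory layout (base/OS/version/architecture), the function names and the ".py" suffix are assumptions for illustration only.

```python
import os


def candidate_dirs(base, osname, osver, osarch):
    """List lookup directories from most specific to least specific."""
    return [
        os.path.join(base, osname, osver, osarch),
        os.path.join(base, osname, osver),
        os.path.join(base, osname),
    ]


def find_module_file(name, base, osname, osver, osarch):
    """Return the first matching module file, or None if no directory has it."""
    for directory in candidate_dirs(base, osname, osver, osarch):
        path = os.path.join(directory, name + ".py")
        if os.path.isfile(path):
            return path
    return None
```

With this ordering a version-specific system.py overrides the generic one for that OS, while other versions fall back to the OS-level file.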
| 1057 | |
|---|
| 1058 | |
|---|
| 1059 | Eddie-0.22 (15-May-2000) |
|---|
| 1060 | - added defaults for out and err in COM directive to stop an exception |
|---|
| 1061 | when the executed command did not write stdout or stderr files. |
|---|
| 1062 | - fixed SIGALRM so it works properly under Linux. (Works slightly |
|---|
| 1063 | differently to Solaris). |
|---|
| 1064 | - COM did not use the return value of an os.system() call properly. This |
|---|
| 1065 | has been fixed as per the wait(2) call. |
|---|
| 1066 | - email() action takes an optional 3rd argument which is the body of |
|---|
| 1067 | the message. Otherwise the 2nd argument is used as both subject and body. |
|---|
| 1068 | - msg is copied to subject for simple email() call with only subject given. |
|---|
| 1069 | - changed import for new Elvin.py module. |
|---|
| 1070 | - now using Python 1.5.2 |
|---|
| 1071 | - fixed process name hash keys. |
|---|
| 1072 | - added Solaris x86 support. |
|---|
| 1073 | - added Linux support (tested with RedHat 6). |
|---|
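The COM fix in this release concerns the value returned by os.system(), which on POSIX systems is a wait(2)-style status rather than the plain exit code. A sketch of decoding it, assuming a POSIX platform; the helper name exit_code is hypothetical and this is not necessarily how Eddie does it:

```python
import os


def exit_code(status):
    """Decode a wait(2)-style status as returned by os.system() on POSIX."""
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)      # normal exit: the command's exit code
    if os.WIFSIGNALED(status):
        return -os.WTERMSIG(status)        # killed by a signal: negative signal number
    return None                            # stopped or otherwise undecodable


status = os.system("exit 3")               # raw status, not 3
code = exit_code(status)
```

Comparing the raw status directly against an expected exit code only works for 0, which is the kind of mistake the entry describes.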
| 1074 | |
|---|
| 1075 | |
|---|
| 1076 | Eddie-0.21 (4-Oct-1999) |
|---|
| 1077 | - catch timeouts while trying to stat config files and skip the config file |
|---|
| 1078 | modified checks. |
|---|
| 1079 | - added '-n' to 'netstat -i' because resolving interfaces on some hosts was |
|---|
| 1080 | taking forever. |
|---|
| 1081 | - fixed Elvin messaging with new ElvinConnection object. |
|---|
| 1082 | - added Elvin db data-storage consumer daemon. |
|---|
| 1083 | - separated Solaris 2.5 and 2.7-specific lib areas. |
|---|
| 1084 | - added caching functionality to data gathering code. |
|---|
| 1085 | - added functions to return hashes of network information. |
|---|
| 1086 | - changed eddie-elvin interface to maintain single shared connection to Elvin |
|---|
| 1087 | server. |
|---|
| 1088 | - added double-checking for SP directive. |
|---|
| 1089 | - added STORE directive to enable configuration of data to be sent via |
|---|
| 1090 | elvindb(). |
|---|
| 1091 | - added elvindb() functionality to send database objects over Elvin to |
|---|
| 1092 | database consumer. |
|---|
| 1093 | - added support for Solaris 2.7. |
|---|
| 1094 | - created sample config files in config.sample. |
|---|
| 1095 | - fixed quote problem with address specified in SP check. |
|---|
| 1096 | - PORT directive handles multiple lines sent to destination. |
|---|
| 1097 | - added estored development to contrib area. |
|---|
| 1098 | - added creation of system objects for collecting system stats. |
|---|
| 1099 | - added SYS directive to allow detailed checks to be performed on system |
|---|
| 1100 | data such as load-average, memory/swap usage, cpu idle %, etc. |
|---|
| 1101 | - parses email address strings for variable definitions. |
|---|
| 1102 | - fixed int overflow errors in netstat data collection. |
|---|
| 1103 | - elvin() will use subject as message if message is blank for Tickertape. |
|---|
| 1104 | - fixed small bug with actions parsing '%' at end of line. |
|---|
| 1105 | - added directive 'NET' to allow detailed checks on current network |
|---|
| 1106 | statistics. |
|---|
| 1107 | - netstat object now obtains all current network statistics from host |
|---|
| 1108 | (ie: 'netstat -s') under Solaris. |
|---|
| 1109 | - added 'IF' directive to enable detailed network interface checks. |
|---|
| 1110 | - added new rule for process checking to allow complex checks to be performed |
|---|
| 1111 | on running processes. |
|---|
| 1112 | - automatically re-load config files if any have changed. |
|---|
| 1113 | - during process (and pid) checks, if process isn't found, sleep a bit then |
|---|
| 1114 | double check. |
|---|
| 1115 | - now only pulls in process info when called, and caches that info for a |
|---|
| 1116 | set time before fetching it again. |
|---|
| 1117 | - now pulls in every bit of process info that ps can provide. |
|---|
| 1118 | - added a wrapper for eddie to capture major exceptions and auto restart. |
|---|