The daqAI program monitors the performance of the DZERO online system via the Level 3 Monitor system and tries to identify problems by looking for patterns in the monitor data. It produces a log file of what is going on, plus a shift report, and has a GUI frontend that lets you watch what is going on. Decisions are made by an embeddable expert system, CLIPS. daqAI can also issue commands to the online system to recover from errors automatically (though not by default).

Version History

Version Date Comments
v02-06-00 12/16/2002
  • Shift report is more compact
  • Looks for missing SLIC inputs and issues SCL init
  • Fixed muon crate "waiting for SCL INIT" bug
  • Logs missing L2 mdt inputs (but only in log text)
  • Rate-to-tape timer now runs only while in a store, not all the time
  • Fixed the annoying "5 Hz rate to tape" warning when coming back from a problem
  • And/Or terms are now available.
v02-05-00 11/24/2002
  • A rate into L3 with less than 5 Hz out of L3 prompts a verbal warning
  • GUI now shows which version of daqAI is currently running.
  • Minor email formatting changes
v02-04-00 11/23/2002
  • The l3_configured timer is no longer incremented unless we are in a store
  • Shift report emails contain the start/end time for the data within
  • Fixed memory leak
v02-03-00 11/22/2002
  • Says "no data flow" when no data flow is present in the muon readout but data flow is expected (i.e. the L3 RM is configured to take data)
  • Issues an SCL init when muon 0x30 or 0x31 goes out of sync
  • Issues an SCL init when a muon crate requests one
  • Shift reports reformatted; problem reports are now also sent to the online log book
  • Shift reports are mailed to d0_daqtf
  • RM, Super, and MS crashes can send email.
  • Gets ALARM/SES information and prints it in the problem reports, but does not yet use it as a reason for downtime (planned for the next version).
v02-01-00 11/12/2002
  • If an L2 crate goes out of sync more than 15 seconds after data flow has stopped, don't label that downtime as L2 Sync error.
  • Shift reports can now be sent to email addresses.
  • Added ability to send email for arbitrary conditions
    • RM, Super, or Monitor Server crash will now send email to interested parties
  • The daqAI_gui now correctly updates the status and facts pages every second, with data that is at most 1 second old (instead of 5 seconds).
  • More robust determination of FEB for crates. The monitor server's new direct connection to TCC meant this data was arriving much more quickly than before, which was causing some difficulties.
  • Added ability to have memory -- this allows one to compare the current state with the previous state (or anything arbitrary). This is, for example, how the system tells that the MS, RM, etc., have crashed.
  • Entries made in the online log book are now surrounded with the PRE tag in an effort to make them more readable.
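The L2 sync attribution rule in this release can be read as a simple time comparison. Below is a minimal Python sketch of that reading, not the actual implementation (the real logic lives in CLIPS rules). The function name, the argument names, and the use of timestamps in seconds are all assumptions made for illustration.

```python
# Hypothetical sketch of the "L2 Sync error" downtime attribution rule.
# Names and the timestamp representation are illustrative assumptions.
L2_SYNC_GRACE_S = 15.0  # "more than 15 seconds" from the release note

def attribute_to_l2_sync(flow_stopped_at, sync_lost_at):
    """Return True if the downtime should be labeled an L2 sync error.

    Both arguments are timestamps in seconds. A crate that loses sync
    more than 15 seconds after data flow stopped is not blamed.
    """
    return (sync_lost_at - flow_stopped_at) <= L2_SYNC_GRACE_S
```

Under this reading, a sync loss that precedes the stop in data flow is still attributed to L2 sync, since the difference is negative.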
v02-00-00 11/1/2002 Interim releases were test versions and were never run for a long period of time.
  • The default daqAI script now does the following:
    • Looks for single crates that go FEB
    • Looks for the RM asserting disables. A disable fraction over 20% is flagged as a problem, though it won't show up unless the rate also goes below 50 Hz.
    • Now checks that a crate is in the run before complaining there is a problem with it. A crate is in the run if it is being read out by the Routing Master.
    • Timers for in store/daq configured/data flowing are maintained.
  • Python D0Gui-based monitor program that can be run online to view the current state of daqAI.
  • Maintains an arbitrary set of timers that are included in the shift report. Fractions of the timers with respect to each other can also be taken, all under script control.
  • Input file is now XML based and supports timers and CLIPS rules.
  • Serves L3 monitor information describing its current state and even logs of the problems and start-of-shift-to-present shift reports.
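The RM disable check in the default script above amounts to a two-part condition. Below is a minimal Python sketch of it, assuming the 20% and 50 Hz figures are strict thresholds. The function and variable names are hypothetical; the actual rules are CLIPS rules in the XML input file.

```python
# Hypothetical sketch of the RM disable check; names are illustrative
# and not taken from the daqAI script.
DISABLE_FRACTION_LIMIT = 0.20  # "over 20%" from the release notes
RATE_LIMIT_HZ = 50.0           # "below 50Hz" from the release notes

def rm_disable_status(disable_fraction, rate_hz):
    """Return (flagged, shown) for the RM disable condition.

    flagged: the disable fraction alone exceeds the limit.
    shown:   flagged, and the rate has also dropped below the limit.
    """
    flagged = disable_fraction > DISABLE_FRACTION_LIMIT
    shown = flagged and rate_hz < RATE_LIMIT_HZ
    return flagged, shown
```

For example, a 25% disable fraction at 100 Hz would be flagged internally but not shown, while the same fraction at 40 Hz would be shown.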
v00-01-14 09/20/2002
  • Does not send problem reports for every problem to the online logbook (Denisov)
  • Integrated into ups; has a start and stop command for easy use. Works from all machines online (I hope)
  • Fixed a bug that left a TCP/IP connection open every time we logged something to the online log book (Haas)
v00-01-13 09/18/2002
  • Correctly sends SCL inits
  • Added component to detect muon readout errors
  • Added code to publish monitor values from Python code.