Operation of DAQ and Slow Controls - TWIST

Restarting Run Software

MHTTPD | ODB | DAQ | Slow Controls

If the DAQ stops taking data but everything seems to be running (and the beam is on), the first thing to try is to stop the run and start a new one. This will often get it going again.

Restarting MHTTPD

MHTTPD is the Midas HTTP Daemon, the program which provides the web interface to the Midas controls. It dies sometimes, meaning your web browser won't be able to connect to Midas, but the run continues uninterrupted. No other parts of the DAQ system care whether MHTTPD is running or not; it's just a front end.

If MHTTPD is not running (your browser has complained that it cannot connect, or similar), you can type (as user twistonl on machine midtwist):

start_mhttpd
which checks to see if MHTTPD is running, and if not it starts it.

If MHTTPD is responding but not doing what you expect (e.g. it's stuck in an infinite loop, or it's not showing all the messages when you request "last 8 hours"), you can type (as user twistonl on machine midtwist):

restart_mhttpd
which kills the presently running MHTTPD and starts a new one.
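
As a rough sketch of the choice between the two scripts, the shell function below prints which one applies once a problem has been noticed in the browser. The function name mhttpd_recovery_command is made up for illustration, and the process name "mhttpd" is an assumption; start_mhttpd and restart_mhttpd are the scripts described above.

```shell
# Print the site script to run once a problem has been seen in the browser.
# Argument: process name to look for (assumed here to be "mhttpd").
mhttpd_recovery_command() {
    name=${1:-mhttpd}
    if pgrep -x "$name" >/dev/null 2>&1; then
        # Daemon is alive but misbehaving: kill it and start a new one.
        echo restart_mhttpd
    else
        # Daemon is gone: just start it.
        echo start_mhttpd
    fi
}
```

The function only prints a name; actually running the script (as twistonl on midtwist) is left to the operator.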

ODB

Restoring the ODB Terminal Window

Normally, on the "DAQ" desktop, there is a terminal window open to the ODB. This window (with a prompt that usually looks something like [local:twist:Running]/> or similar) is where the system messages, errors, etc appear (including what's read over the speakers). This window is not the actual ODB, but rather an interface to it. Losing this window should not affect data taking; it just means you don't see the status or error messages.

To get the ODB window, open a terminal window, and as user twistonl on machine midtwist type

odbedit
The prompt should change, and you're now in the ODB. (You might want to make this window a little bigger.)

Recovering from an ODB Corruption

If the ODB is not responding, and you cannot reach Konstantin or Renee, a rather complicated recovery procedure is available.

Restarting TWIST DAQ

This section includes restarting the Lazy Logger.

All commands are run as user twistonl on machine midtwist unless otherwise noted.

General Problem

  1. Stop the run.
  2. Run start_daq from a command prompt. (This will check for all the processes that should be running, and restart any which aren't.)

PPC (PowerPC) Problem

  1. Stop the run.
  2. Reboot the PowerPC and restart the associated frontend (see below).
  3. If that doesn't help, try to contact an expert (Renee or Konstantin).
  4. If that doesn't help (i.e. you can't reach them), do a kill_daq followed by a start_daq (but only as a last resort). (You'll need to reboot the PPC machines at this point, too.)

To Reboot PPCs:

From the Midas Status page, push the "Programs" button. Then push the "stop" button for fbc1fe or fbc2fe, whichever PPC has the problem. The stopped frontend should go red on the status page. Then reboot the PPC machine. The last things you should see in the PPC window are some pedestal lines, followed by a few TDC lines. Watch that the error "Cannot init equipment record, probably other FE is using it" does not occur. If the error does show up, go to the Programs page and stop the frontend again, then reboot the PPC again. The error shouldn't come up this time.

If, when you press the "stop fbcxxx" button on the "Programs" page, it complains that it cannot stop the frontend, continue with rebooting the PPC.

Restarting TWIST Slow Controls

When the Launcher is working:

If the slow controls programs are not running, go to the main Midas page, and click on "SlowLauncher" (just below the row of buttons, near the top of the page). This screen lets you stop and start the various Slow Controls programs. ("test" does not need to be running. All others should be running.)

You can also access the SlowLauncher directly at this address:

http://midtwist.triumf.ca:8082

In either case, if you get "connection refused" errors instead of the "Slow Launcher" page, the launcher is probably not running. Start it by running start_daq as twistonl on midtwist.
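
The "connection refused" check can also be made from a terminal. This is a minimal sketch, assuming curl is installed on the machine you are working from; the function name launcher_responds is hypothetical, and the URL is the one given above.

```shell
# launcher_responds URL: succeed if something answers HTTP at URL within 5 s.
# A refused connection, a timeout, or an HTTP error status all count as failure.
launcher_responds() {
    curl --silent --fail --max-time 5 --output /dev/null "$1"
}

# Example use on the DAQ network (kept as a comment so the sketch has no side effects):
#   launcher_responds http://midtwist.triumf.ca:8082 || echo "launcher not answering: run start_daq"
```

A failure here only says that nothing answered on that port; it does not distinguish a stopped launcher from a hung one.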

If there are still some red "FE Nodes" on the Midas page after trying to start everything, try stopping the appropriate frontends and restarting them.

Which one needs restarting can be determined by looking for which frontend on the SlowLauncher page does not appear on the Midas page. (This snapshot of the Midas page may also be helpful.) The following shows some of the "StatusBar" items controlled by Slow Controls frontends:

Make an Elog entry and restart the front ends when you have trouble controlling them.

When the Launcher is hung:

Normally, the launcher is stopped by using the "shutdown" button on its web interface. If the launcher is not responding you will have to kill it by hand:

Possibly Special Case: fe3hp

If unable to restart fe3hp, try the following instructions from Konstantin:
fe3hp was not running because the Digi terminal server was refusing
connections. Here is the recovery procedure:
1) go to http://142.90.130.83/
2) login as "root", password "dbps"
3) go to the "admin"->"who" page
4) find the "tty03 | 142.90.101.76 | direct_tty01 | 0 | Kill Session" entry
5) go to the "kill session" link
6) go to the MIDAS Status->Programs page. "Stop fe3hp"
7) fe3hp should restart, "ubeam stale data" should clear within 2 minutes.
K.O.

After an e614slow reboot or crash

In general, a crash of e614slow does not prevent running the DAQ and taking data. However, a number of services will be disrupted and may have to be restarted manually:


Handling Software Alarms and Error Messages

Remember to make an ELog entry whenever there's an alarm.

Event Builder Alarm

If the Event Builder dies, you will see a large red bar on the Status Page.

Synch Error

Make sure the run has successfully stopped ("End run" lines in both PPC windows). If it hasn't, stop it manually from the Status screen.

Try to start a new run. If it dies, try again. If that dies too, call an expert.

Once a new run has started, reset the alarm: from the Status page, press button "Alarms", then press button "Reset" at right of line for "fbc1fe" or "fbc2fe" (whichever tripped). Press button Status to return to status page.

lazy_file_exists file runNNNNN.ybs doesn't exists

If the Messages screen (or the ODB window) is filling up with messages like the above, check to see if the mentioned run file does exist:
ls -l /twist/data_onl/current/runNNNNN.ybs
It probably exists, but has zero size. If that's the case, just delete the file (with rm), and see that the lazy logger stops complaining.
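
The check and the delete can be combined so that a run file containing data is never removed by accident. The sketch below uses only standard shell tests; the function name remove_if_empty is made up for illustration, and NNNNN stands for the run number from the error message.

```shell
# remove_if_empty FILE: delete FILE only if it exists and has zero size.
# Returns 0 if the file was removed, 1 if it is missing, 2 if it has data.
remove_if_empty() {
    f=$1
    if [ ! -e "$f" ]; then
        echo "missing: $f"
        return 1
    elif [ -s "$f" ]; then
        echo "non-empty, left alone: $f"
        return 2
    else
        rm -- "$f" && echo "removed empty file: $f"
    fi
}

# Example (as twistonl on midtwist), with the real run number in place of NNNNN:
#   remove_if_empty /twist/data_onl/current/runNNNNN.ybs
```

If the function reports a non-empty file, the lazy logger complaint has some other cause and the file should be left for an expert to look at.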

General DAQ/Midas Operation

The DAQ consists of two sets of components: the DAQ proper and Slow Controls.
Slow Controls is documented elsewhere on this page.

The DAQ proper components run on the host machine midtwist
and on the PowerPCs located in each of the Fastbus crates.

DAQ Computers

The TWIST DAQ uses several computers:

PPC tasks:

The tasks running on the PowerPCs are called "fbc1fe" and "fbc2fe" respectively.
Outputs from these tasks are seen in the "PPC windows" PPC1 and PPC2. These
windows are not essential to acquisition of data but they are the console of the PPCs
and give useful information.
If power is lost on the Fastbus crates, PPC1 in crate 1 reboots automatically when power returns.
PPC2 usually loses its boot parameters, which have to be reloaded from the PPC2 window.
If anything goes wrong in the PPCs, it is best to shut down the programs from Netscape
and to issue a reboot command in the PPCx window.
If the PPCx window does not respond to a <CR>, you have to go to the Fastbus crate
and press the "reset" button on the PPC. If you do not know where that is, turn power off
on the PPC power supply control and turn power back on.

MIDTWIST tasks:

The essential tasks running on midtwist are

The other tasks running on midtwist are

General Slow Controls Operation

The Slow Control frontend (SCfe) programs access slow control readout devices or Epics to set and/or monitor variables.

The SCfe programs report information in two ways:

  1. Data events are sent to the MIDAS event pool. If "logging" is enabled, these events are written in the run data files mixed with the Fastbus TDC events. Later the files are spooled to tape. The SC events contain the RAW data and the CALIBRATED data. These SC events are generated:

  2. Data is written to the ODB (Online DataBase), which is a snapshot of the status of the DAQ. When data is sent to the ODB, it replaces the previous values. There is much more information in the ODB than just the values of variables. At the end of each run, the ODB is written to an ASCII file called runxxxxxx.odb, kept on disk and backed up regularly. The intention is to keep all these files on disk forever. Only the CALIBRATED values are sent to the ODB. Also, there is a setting called "LOG_HISTORY" for each "equipment" type. If this value is >0, it is interpreted as the interval, in seconds, at which the variables from the given equipment are logged to the HISTORY files. This is done by the program logger all the time, regardless of the DAQ state (RUNNING or STOPPED). If the program logger is NOT RUNNING, no history file is created. We usually turn off this program during long shutdowns. At the moment, all our SC equipment has LOG_HISTORY set to 60 seconds, except for LAS, which reads out whenever it feels like it. (This should be fixed.) The values logged in the history files are taken directly from the ODB. There is no record of when each value was last read; it is just a snapshot of the values every minute.

Map of Slow Controls Channels

Shows, among other things, which Slow Controls frontends handle which variables.

B1, B2 NMR regulators

The B1 and B2 NMR regulators are designed to compensate for slow drifts of magnet currents and fields. The regulators adjust the B1 and B2 DACs to force the NMR values to be as close as possible to user-set NMR setpoint values.

The regulator code is in feepics and it uses the NMR measurements from fenmr. The regulator activity can be seen in the "history" panels "B1_B2_regulator" (NMR deviations from the setpoints) and "B1/B2_regulator" (raw NMR and averaged NMR vs DAC values).

During normal running, the B1 and B2 regulators should be turned on. If they are off, for example after feepics restart, the "B1/B2 regulator is OFF" alarms will trigger. These alarms can only be cleared by restarting the regulators or by disabling the alarms.

The regulators are started by entering the NMR setpoints. First, make sure the NMRs are locked. (If the NMRs lose the lock, the "NMR" status goes "yellow" and a Midas alarm should trigger, but it is not 100% reliable.) Also, B1 and B2 should be set to within 5 Gauss of the desired values. Then go to the Midas status page and click on the "Reg_B1" or "Reg_B2" aliases. This will open the ODB "alias" page. Click on the "Reg_B1" or "Reg_B2" NMR setpoints and enter the desired NMR values. (This activates the regulators.) Go back to the Midas status page. You should see "Start the B1/B2 regulator with NMR setpoint: xxx" Midas messages. Clear the "B1/B2 regulator" alarms by going from the Midas status page to the "alarms" page and clicking the "reset alarm" buttons. If the alarms refuse to clear, try clearing them again after 5 minutes. If they still do not clear, call the expert.

Once running, the regulators can fail. If they do, the "B1/B2 regulator FAILED" alarms should trigger. There are several failure reasons: changing DAC values from EPICS, NMRs losing their signal lock for more than 10 minutes and NMRs drifting too far away too fast.

To Restart a Regulator

  1. Make sure that the NMR is locked.
  2. Adjust the DACs (from EPICS) to bring the NMR readback to within 5 Gauss of the desired NMR setpoint.
  3. Restart the regulator by reentering the NMR setpoint.
  4. Clear the alarm from the "alarms" page. If the alarm does not clear, try clearing it again after 5 minutes. If it still does not clear, call the expert.
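
The 5 Gauss condition in step 2 is a simple absolute-difference check. As a small worked sketch, assuming the readback and the setpoint are both expressed in Gauss, the hypothetical helper below (not part of the TWIST software) succeeds when the readback is close enough for the regulator to be restarted. The numbers in the example are invented.

```shell
# within_tolerance READBACK SETPOINT [TOL]
# Succeeds if |READBACK - SETPOINT| <= TOL. TOL defaults to 5 (Gauss).
within_tolerance() {
    awk -v r="$1" -v s="$2" -v t="${3:-5}" \
        'BEGIN { d = r - s; if (d < 0) d = -d; exit !(d <= t) }'
}

# Example with made-up values: readback 20003.2, setpoint 20000.
if within_tolerance 20003.2 20000; then
    echo "readback within 5 Gauss of setpoint"
else
    echo "adjust the DAC before re-entering the setpoint"
fi
```

The readback and setpoint themselves still have to be read off the Midas and EPICS displays by hand; the helper only does the arithmetic.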

Using the StatusBar

The StatusBar is a button bar which is normally started by the start_daq script. Some useful information about the StatusBar:

Other DAQ-Related Software



Last updated 1 December 2005 by Robert MacDonald.