WESTGRID WALKTHROUGH: Monte Carlo
=================================
Robert MacDonald & Mina Nozar
Last Update: April 9, 2006

Generating and Analyzing a fresh MC Set (using Gen 999 as an example):

NOTE: If you want to analyze an existing MC gen which is still on the
local nodes (with symlinks to the data files under $SPOOLDIR/finished),
skip to the job submission section of this walkthrough.

If the MC data has been removed from the local nodes and archived at
SFU, it must first be retrieved into /global/scratch; see "Restoring MC
Data from SFU" for retrieving the MC from archives. The flags
"--mofia-only" and "--mofia_inpath=/global/scratch/twist/mc/genxxx"
(with the appropriate gen number) will need to be added to the tbsub.pl
command, in addition to the usual steps required for a new analysis.
Aside from that, skip to the job submission section of this walkthrough
and follow the instructions.

In either case, remember to add appropriate entries to the "MC Analysis"
web page for the new analysis!

- Run through the Westgrid Production Checklist:
    http://twist.triumf.ca/private/sysman/westgrid/WestgridCheckList.txt

- Logging info for tracking purposes:
    Make entries for gen999 on both the "MC Analysis" and "Black Box
    Parameters" web pages under the TWIST Analysis page. Keep these
    pages up to date as the production progresses, checking off stages
    as they are done.

- Disk space requirements:
    * Disk space requirements include the raw simulation output files
      and the resulting Mofia output files. Compute the disk space
      requirement for a given MC generation and production job. You can
      do this by running a few (multi-sample) test simulation runs on
      the TWIST Cluster.
    * Detailed information on the available disk space and e614 usage on
      both "local" and "global" areas can be found at:
        http://twist.triumf.ca/private/sysman/wg_disk_db/websummary.pl
      The information is updated nightly.
      From the info on this page, you need to assess whether there is
      enough space for generating a given MC gen. You may need to delete
      existing generated MC from the local nodes before proceeding with
      the generation of a new set. Consult with one of the TWIST WG
      coordinators (Mina, Dick, Renee).

    Generation:
      The generated raw MC resides on a local node's "/data" partition
      (10 GB reserved for e614).
      Output files per MC generation run, under a given local node,
      e.g. ice#_#:/data/twist/e614/tbsdata/gen999/run#:
        run#.dat
        run#.hbook
        tbslog.txt

    Processing:
      The processed MC (Mofia root + log files) resides on the
      /global/scratch area. The verbal agreement with the WG folks is
      that we will always leave at least 10% free and that we will
      restrict ourselves to < 3 TB. Consult one of the TWIST WG
      coordinators if this number needs to be exceeded.
      Output files per MC processed run, under
      /global/scratch/twist/systematics/mc/gen999/anal1/root/run#/:
        tree#.root
        r#.root
        mofialog.dat
        #log.txt

- Setting up the job submission:

  - Use the usual "certificate" method to sign onto the Westgrid e614
    account. (Talk to Mina or Renee if you don't know how to do this.)

  - Spool directory structure:
    - Create the SPOOLROOT directory ~e614/tbsroot/gen999, and the
      following subdirectories:
        common/  (e614.com, gb.sh, and .kcm files go here)
        queued/  (generated ffcards corresponding to individual geant
                  runs will go here)

      Note: The "gen#.ffcards" file (and the subsequently generated
      [run#].ffcards, g614.com, and gb.sh files) are required for MC
      generation, while the "gen#anal#.kcm" file is required for the
      processing of the generated MC data. The information and
      submission files are optional but highly recommended for
      referencing and bookkeeping purposes.

  - The information file:
    - Create a gen999.info file in the SPOOLROOT directory.
      This file should contain a short description of this geant job
      (generation and/or analysis) and other information such as whether
      it is a "single Michel sample/run" or a "multi Michel sample/run".

  - The ffcard files:
    - Put the basic gen999.ffcards file under the SPOOLROOT directory.
    - Change to the SPOOLROOT/queued/ directory and use one of the two
      scripts, "mkffcards.pl" or "mkmultsmplffcards.pl", to generate
      individual ffcard files for all the Geant runs. Use "mkffcards.pl"
      for generating "one Michel sample/geant run" or
      "mkmultsmplffcards.pl" for generating "multi Michel samples/geant
      run". Copy the command line you used into the
      .../gen999/gen999.info file for future reference.

      Note: As for any decent script, running the script either without
      any command line arguments or with the "-h" or "--help" option
      will give the usage info, the available options, and sometimes
      examples.

  - The e614_WG.com file:
    - This file is the WG version of the TWIST e614.com. It differs from
      the TWIST cluster version only in details related to job
      submission, such as the input and output directory settings, the
      location of the geant executable, and the way the ffcard file
      setting is used.
    - The e614_WG.com master file is in cvs under
      /home/e614mgr/cvs/westgrid/rundb-dev. A copy of this file should
      be put under the SPOOLROOT/common directory with appropriate
      modifications for the "job specific" settings. In tbsub.pl, what
      gets executed for a geant job submission is:
        "e614_WG.com run#.ffcards".
    - The e614_WG.com file contains environment variable settings
      specific to a geant run generation, such as:

        Mandatory files - always read
          CHGEOM     Detector geometry
          ISOMAP1    Map of drift times in DC cell (isochrones)
          ISOMAP2    Map of drift times in PC cell (isochrones)
          ISOMAP3    Map of drift times in TC cell (isochrones)
          TDCMAP1    Map of TDC channels (1st bank)
          TDCMAP2    Map of TDC channels (2nd bank)

        Optional files - depend on FFCARDS settings
          FIELDMAP   Field map file
          OPERAMAP   Opera field map file
          RAYFILE    REVMOC or M13GEANT rays
          DCEFF      Plane efficiency for each DC plane
          PCEFF      Plane efficiency for each PC plane
          MUBEAM     Muon beam properties
          EBEAM      Electron beam properties

      Note: Before an MC generation, all field settings in this file
      should be checked.

  - The gen999anal#.kcm file:
    - The gen999anal#.kcm file is used by Mofia for processing the
      generated MC file. All fields need to be checked before proceeding
      with the analysis. The kcm file should be placed in the
      $SPOOLROOT/common directory. The name must follow the
      "gen#anal#.kcm" format. (The analysis number will be extracted
      from this name and used for job names, output directory name
      construction, etc.)

  - The gen999-EnvFile and gen999anal1-EnvFile:
      The default template EnvFile used by ~e614/bin/tbsub.pl resides in
      ~/rundb-dev/e614_EnvTmpFile. This file contains a set of
      environment variables specific to an "MC generation/processing" or
      "data processing" job. Copy the e614_EnvTmpFile file into
      $SPOOLROOT/common and name it gen999-EnvFile. Edit this new file,
      checking all the settings to make sure they match what you want to
      do, both for MC generation and Mofia analysis. There are lots of
      comments in the EnvFile explaining what the options mean. You can
      also get information from "tbsub.pl -h". (Don't forget to change
      the gen### and anal#!)
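Since the gen and anal numbers appear in several file names, a quick
consistency check before submission can catch a forgotten rename. The
following is only a minimal sketch: the helper names parse_kcm_name and
env_file_name are hypothetical and are not part of tbsub.pl, and the
pattern simply encodes the "gen#anal#.kcm" format described above. It
does not reproduce how tbsub.pl itself extracts the analysis number.

```python
import re

# Hypothetical helpers (not part of tbsub.pl): check that a kcm file name
# follows the "gen#anal#.kcm" format and derive the matching EnvFile name.
KCM_NAME = re.compile(r"^gen(\d+)anal(\d+)\.kcm$")


def parse_kcm_name(filename):
    """Return (gen_number, anal_number) as ints, or raise ValueError."""
    match = KCM_NAME.match(filename)
    if match is None:
        raise ValueError(f"{filename!r} does not follow gen#anal#.kcm")
    return int(match.group(1)), int(match.group(2))


def env_file_name(gen_number):
    """EnvFile name used in this walkthrough, e.g. gen999-EnvFile."""
    return f"gen{gen_number}-EnvFile"


if __name__ == "__main__":
    gen, anal = parse_kcm_name("gen999anal1.kcm")
    print(f"gen={gen} anal={anal} EnvFile={env_file_name(gen)}")
```

A name such as "gen999.kcm" raises ValueError, which is the case this
check is meant to flag before any jobs are queued.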
      The file gen999anal1-EnvFile is generated automatically by
      tbsub.pl if analysis is to be done on the generated MC as well; it
      is set up for "mofia only" analysis without additional generation.
    - Pay particular attention to the "walltime" and "file" options in
      the PBSPARAM line. These should be your best estimate of how long
      the COMBINED geant & Mofia job will take to run (within an hour or
      so), and your best estimate of the raw MC output disk space
      requirement. Again, you may need to do a test run on the TWIST
      Cluster to find out.

  - The Geant and Mofia executables:
      By default these files should reside in the "~e614/rundb-dev/exe"
      area. A description of the executables should be put in the ReadMe
      file in that directory. The executable names can be specified by
      setting the appropriate arguments in "tbsub.pl".

  - The sub_gen999 file:
      In the SPOOLDIR, create a file named "sub_gen999". This file
      should contain the ~e614/bin/tbsub.pl command with appropriate
      submission options. Most options have been set in the EnvFile you
      prepared above, however, so the only command line options you'll
      need are the genNNN and the --kcm_path option. For example:
        /global/home/e614/bin/tbsub.pl --genName=gen999 --analNum=1
      To analyze existing generated MC, add the --mofia_only flag to the
      command line and choose an appropriate kcm file by using the
      --kcm_path option. To generate without analyzing, add the
      --geant_only flag. (These can also be specified in the EnvFile
      instead.)

- Job submission:
    It is recommended that you make the sub_gen999 file under the
    SPOOLROOT directory executable and submit jobs with:
      ./sub_gen999
    That way, there is a record of the command line options used for
    future reference.
    Check ~e614/logs/pbs/gen999 for pbs log files. They are of size 0 if
    all goes well with the scripts, etc.,
    and the files get deleted after completion of the jobs; however, in
    cases of immediate problems (errors/typos in the scripts) these log
    files will contain the relevant error messages.

- ELOG Posting (1):
    When you've submitted your job, post a note on the TWIST Westgrid
    ELOG letting people know what you've done, and what gen999 is.

- Checking the status of jobs:
    You can watch the progress of your jobs on the "WG Job Status" page:
      http://tw04.triumf.ca/private/sysman/wg_jobsrunning_db/websummary.pl
    The info on this page is updated every 15 minutes. There is a link
    to this page from the TWIST ANALYSIS page. (Scroll down to "e614".
    If there's no "e614" listed, we don't have any jobs running.) Keep
    an eye on your jobs to make sure they're running happily.

- MC generation output:
    Will be placed under a local node's directory:
      /data/twist/e614/tbsdata/gen999/
    When the generation of MC is completed (and checked), there will be
    a symlink in the ~e614/tbsroot/gen999/finished directory to the
    local node where the MC output resides. Upon a successful MC
    generation job, one Geant log file per given gen# gets automatically
    copied to ~e614/tbsroot/gen999/goodlogs.

- Mofia output of the MC:
    Will be placed under:
      /global/scratch/twist/systematics/mc/gen999/anal1/root/
    When the processing of MC is completed, "checktrees" is run
    automatically on a run-by-run basis by "tbsub.pl". This generates
    two additional directories:
      /global/scratch/twist/systematics/mc/gen999/anal1/goodlinks
      /global/scratch/twist/systematics/mc/gen999/anal1/badlinks
    The successfully analyzed runs are sorted from the failed ones into
    the appropriate directories, via creation of links to the runs under
      /global/scratch/twist/systematics/mc/gen999/anal1/root/
    Note: These directories and their contents should not be removed or
    renamed by users, since they are an integral part of the "logic" in
    "tbsub.pl" during the job resubmission process.
    Upon a successful MC processing job, one Mofia log file per given
    anal# gets automatically copied to ~/tbsroot/gen999/goodlogs/.

- Checks after job completion:
    Look under the newly-created "errorlogs" directory for log files
    from runs that died. Try to investigate the cause.

- ELOG posting (2):
    Make another note in the TWIST Westgrid ELOG, stating that the jobs
    are complete and noting errors, if any. Also write down the number
    of successfully completed jobs, how long the Geant+Mofia jobs took,
    and how much disk space was used for MC files and trees.

- Job resubmissions:
    If a "good portion" of the jobs failed, then a job resubmission is
    necessary. Depending on where the failure happened (generation or
    processing) and on the cause (network, disk space, ... problems),
    the remaining jobs need to be resubmitted. In most cases, this is a
    straightforward process using "tbsub.pl". If you are not sure how to
    proceed, don't PANIC. The script is set up to resubmit incomplete
    jobs relatively painlessly (thanks to Mina). If you need assistance,
    contact Mina.

- Transferring trees to TRIUMF ("TTT" or "T3"):
    If "most" of the runs finished properly (this is a judgment call to
    some extent), transfer the trees to TRIUMF. See the documentation
    linked from the TWIST Analysis Page.

- ELOG posting (3):
    Make another note in the TWIST Westgrid ELOG, stating that the trees
    for the gen# were successfully transferred to TRIUMF.
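To put a number on the "good portion" and "most" judgment calls above,
one option is to count the entries in the goodlinks and badlinks
directories. The sketch below is only an illustration: summarize_links
is a hypothetical helper, and it assumes each analyzed run shows up as
exactly one entry in one of the two directories. It only reads the
directories and never modifies them.

```python
import os


def summarize_links(anal_dir):
    """Count entries in goodlinks/ and badlinks/ under an analysis directory.

    Returns (n_good, n_bad, good_fraction). Missing directories count as 0.
    Read-only: nothing is removed or renamed.
    """
    counts = {}
    for name in ("goodlinks", "badlinks"):
        path = os.path.join(anal_dir, name)
        counts[name] = len(os.listdir(path)) if os.path.isdir(path) else 0
    total = counts["goodlinks"] + counts["badlinks"]
    fraction = counts["goodlinks"] / total if total else 0.0
    return counts["goodlinks"], counts["badlinks"], fraction


if __name__ == "__main__":
    # Path taken from the gen999/anal1 example in this walkthrough.
    good, bad, frac = summarize_links(
        "/global/scratch/twist/systematics/mc/gen999/anal1")
    print(f"good={good} bad={bad} good fraction={frac:.2%}")
```

The same counts can go straight into ELOG posting (2). What fraction
counts as "most" remains a judgment call.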
- Transferring geant + Mofia log files to TRIUMF:
    To transfer the log files, Environment files, and other useful
    information back to TRIUMF for inclusion on the Westgrid web page,
    use the script:
      ~e614/bin/scp_GenSetInfo_to_TRIUMF.pl

- Logging info for tracking purposes:
    Don't forget to check off the appropriate boxes under the gen999
    entry in the TWIST "MC Analysis" page:
      http://twist.triumf.ca/private/TWIST_2004_2007_Analysis/2004_pass2/MC_Analysis.html

- Inform the TWIST Westgrid coordinator of the status of your analysis
  so that tree-summing --> Energy calibration --> tree-summing can
  proceed.
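As a rough aid for the disk space estimate described at the start of
this walkthrough, the arithmetic can be sketched as follows. This is an
illustration under stated assumptions, not an existing TWIST tool: the
function names are hypothetical, the per-run sizes must come from your
own test runs on the TWIST Cluster, decimal units (1 GB = 10^9 bytes)
are assumed, and the 10 GB figure is taken to apply per local node. The
"leave at least 10% free" part of the agreement is not modelled here and
still has to be checked against the disk usage web page.

```python
GB = 10**9   # decimal units assumed
TB = 10**12

LOCAL_NODE_QUOTA = 10 * GB    # "/data" space reserved for e614 (assumed per node)
GLOBAL_SCRATCH_CAP = 3 * TB   # agreed upper limit on /global/scratch


def estimate_totals(n_runs, raw_per_run, mofia_per_run):
    """Return (total raw MC bytes, total Mofia output bytes) for n_runs."""
    return n_runs * raw_per_run, n_runs * mofia_per_run


def runs_per_local_node(raw_per_run, quota=LOCAL_NODE_QUOTA):
    """How many raw MC runs fit in one local node's reserved space."""
    return quota // raw_per_run


def fits_global_scratch(current_usage, mofia_total, cap=GLOBAL_SCRATCH_CAP):
    """True if the new Mofia output keeps usage strictly under the cap."""
    return current_usage + mofia_total < cap


if __name__ == "__main__":
    # Hypothetical numbers: 500 runs, 0.2 GB raw and 0.5 GB of trees per run.
    raw_total, mofia_total = estimate_totals(500, 2 * 10**8, 5 * 10**8)
    print("raw MC total (GB):", raw_total / GB)
    print("Mofia total (GB):", mofia_total / GB)
    print("runs per local node:", runs_per_local_node(2 * 10**8))
    print("fits under 3 TB:", fits_global_scratch(2 * TB, mofia_total))
```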