WESTGRID WALKTHROUGH: Data
============================

Rob MacDonald and Mina Nozar
Last Update: April 9, 2006

Retrieving and Analyzing a Data Set (using Set 33 Anal 1 as an example):

- Run through the Westgrid Production Checklist:
    http://twist.triumf.ca/private/sysman/westgrid/WestgridCheckList.txt

- Logging info for tracking purposes:
    Make entries for set33, including the new Analysis number, on the
    appropriate "Data Analysis" web pages under the TWIST Analysis page.
    Keep these pages up to date as the production progresses, checking
    off stages as they are done.

- Before retrieving data, check the amount of free space on
  /global/scratch; we don't want to use too much of the disk.
    Check the summaries on the TWIST Westgrid page:
      http://tw04.triumf.ca/private/sysman/wg_disk_db/websummary.pl
    The information is updated nightly.

    From the info on this page, you need to assess whether there is
    enough space for a given data set plus the new Mofia output. You may
    need to delete existing generated data sets or trees before
    proceeding with a new set. Consult with one of the TWIST WG
    coordinators (Mina, Dick, Renee).

    The processed data (Mofia root + log files) reside on the
    /global/scratch area. The verbal agreement with the WG folks is that
    we will always leave at least 10% free and that we will restrict
    ourselves to < 3 TB. Consult one of the TWIST WG coordinators if
    this number needs to be exceeded.

    Output files per processed data run, under
    /global/scratch/twist/systematics/data/set33/anal1/root/run#/
      tree#.root
      r#.root
      mofialog.dat
      #log.txt

- Retrieving the Data:
    - Log into guide.westgrid.ca as e614.
    - Use "~e614/bin/wg-rawdata-retrieve.perl set33" to retrieve the
      data onto Westgrid. You can also retrieve by run numbers.
    - Use "~e614/bin/create_symlinks.pl" to set up symlinks in
      /global/scratch/twist/rawdata_symlinks/set33

- ELOG Posting (1):
    When the data is retrieved, post a note on the TWIST Westgrid ELOG
    letting people know, including which run numbers were retrieved and
    what set33 is.

- Setting up the job submission:
    - Use the usual "certificate" method to sign onto the Westgrid e614
      account. (Talk to Mina or Renee if you don't know how to do this.)

    - Spool directory structure:
        - Create the SPOOLROOT directory ~e614/tbsroot/set33, and the
          following subdirectories:
            common/  (Environment and .kcm files go here)
            queued/  (symlinks to the data files will go here)

    - The information file:
        - Create a set33.info file in the SPOOLROOT directory. This file
          should contain a short description of this data set and other
          information that might be useful.

    - The set33anal#.kcm file:
        - The set33anal#.kcm file is used by Mofia for processing the
          data file. All fields need to be checked before proceeding
          with the analysis. The kcm file should be placed in the
          SPOOLROOT/common directory. The name must follow the
          "set#anal#.kcm" format. (The analysis number will be extracted
          from this name and used for job names, output directory name
          construction, etc.)

    - The set33anal1-EnvFile:
        The default template EnvFile used by ~e614/bin/tbsub.pl resides
        in ~/rundb-dev/e614_EnvTmpFile. This file contains a set of
        environment variables specific to an "MC generation/processing"
        or "data processing" job. Copy the e614_EnvTmpFile file into
        SPOOLROOT/common and name it set33anal#-EnvFile (where # is the
        analysis number). Edit this new file, checking all the settings
        to make sure they match what you want to do for the Mofia
        analysis. There are lots of comments in the EnvFile explaining
        what the options mean. You can also get information from
        "tbsub.pl -h". (Don't forget to change the set### and anal#!)
        Make sure MOFIA_ONLY is set to 1.
        Note that the KCM_PATH line will not be used at submission time;
        the KCM file must be specified on the tbsub.pl command line (see
        below), and that will override what's in this file.
        - Pay particular attention to the "walltime" and "file" options
          in the PBSPARAM line. These should be your best estimate of
          how long the Mofia job will take to run (within an hour or
          so), and your best estimate of the Mofia output disk space
          requirement. You may need to do a test run on the TWIST
          Cluster to find out.

    - The Mofia executable:
        By default this file should reside in the "~e614/rundb-dev/exe"
        area. A description of the executable should be put in the
        ReadMe file in that directory. The executable name can be
        specified by setting the appropriate arguments in "tbsub.pl".

    - The sub_set33 file:
        In the SPOOLROOT directory, create a file named "sub_set33".
        This file should contain the ~e614/bin/tbsub.pl command with
        appropriate submission options. Most options are already set in
        the EnvFile you prepared above, so the only command line options
        you'll need are the setNNN and the --kcm_path option.
        For example:
          /global/home/e614/bin/tbsub.pl --genName=set33 --analNum=1

- Job submission:
    It is recommended that you make the sub_set33 file under the
    SPOOLROOT directory executable and submit jobs with:
      ./sub_set33
    That way, there is a record of the command line options used for
    future reference.

    Check ~e614/logs/pbs/set33 for PBS log files. They are of size 0 if
    all goes well with the scripts, etc., and the files get deleted
    after completion of the jobs; however, in cases of immediate
    problems (errors/typos in the scripts) these log files will contain
    the relevant error messages.

    - If you only want to submit a few runs instead of the whole set,
      there are two options:
        - Use the "--maxjobs=N" flag to the tbsub.pl script.
          This will submit the first N data files in the requested set.
        - Make symlinks in $HOME/tbsroot/set33/queued (these can point
          to other symlinks in .../rawdata_symlinks/set33) and remember
          to specify this queued directory on the tbsub.pl command line;
          otherwise it'll look in rawdata_symlinks. This is only really
          useful if there are specific runs you want to analyze, I
          think.

- ELOG Posting (2):
    When you've submitted your job, post a note on the TWIST Westgrid
    ELOG letting people know what you've done, and reminding them what
    set33 is.

- Checking the status of jobs:
    You can watch the progress of your jobs on the "WG Job Status" page:
      http://tw04.triumf.ca/private/sysman/wg_jobsrunning_db/websummary.pl
    The information on this page is updated every 15 minutes. There is a
    link to this page from the TWIST ANALYSIS page. (Scroll down to
    "e614". If there's no "e614" listed, we don't have any jobs
    running.) Keep an eye on your jobs to make sure they're running
    happily.

- Mofia output of the analysis:
    Will be placed under:
      /global/scratch/twist/systematics/data/set33/anal1/root/

    When the processing of data is completed, "checktrees" is run
    automatically on a run by run basis by "tbsub.pl". This generates
    two additional directories:
      /global/scratch/twist/systematics/data/set33/anal1/goodlinks
      /global/scratch/twist/systematics/data/set33/anal1/badlinks
    The successfully analyzed runs are sorted from the failed ones into
    the appropriate directories, via creation of links to the runs under
      /global/scratch/twist/systematics/data/set33/anal1/root/

    Note: These directories or any of their contents should not be
    removed or renamed by users since they are an integral part of the
    "logic" in "tbsub.pl" during the job resubmission process.

    Upon a successful data processing job, one Mofia log file per given
    anal# gets automatically copied to ~/tbsroot/set33/goodlogs/.

- Checks after job completion:
    Look under the newly-created "errorlogs" directory for log files
    from runs that died.
    Try to investigate the cause.

- ELOG posting (3):
    Make another note in the TWIST Westgrid ELOG, stating that the jobs
    are complete and noting errors, if any. Also write down the number
    of successfully completed jobs, how long the Mofia jobs took, and
    how much disk space was used for data files and trees.

- Job resubmissions:
    If a "good portion" of the jobs failed, then a job resubmission is
    necessary. Depending on where the failure happened (generation or
    processing) and on the cause (network problems, disk space, ...),
    the remaining jobs need to be resubmitted. In most cases, this is a
    straightforward process using "tbsub.pl". If you are not sure how
    to proceed, don't PANIC. The script is set up to resubmit incomplete
    jobs relatively painlessly. If you need assistance, contact Mina.

- Transferring trees to TRIUMF ("TTT" or "T3"):
    If "most" of the runs finished properly (this is a judgment call to
    some extent), transfer the trees to TRIUMF. See the documentation
    linked from the TWIST Analysis Page.

- ELOG posting (4):
    Make another note in the TWIST Westgrid ELOG, stating that the trees
    for the set# were successfully transferred to TRIUMF.

- Archiving Mofia output trees:
    - To archive trees to dsm, use:
        ~e614/bin/wg-dsm-tar.perl set33_mofia_trees /global/scratch/twist/systematics/data/set33/analX/root
      (where X is the number you used to set up and define the jobs).

      You can check that the archiving was done properly as follows: as
      e614, ssh to guide.westgrid.ca and run "dsm", which brings up a
      graphical interface. Click on "Retrieve files and directories
      from long term storage", then follow the links on the left. You
      can browse what's in storage without having to actually transfer
      anything.
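
The archive command above has the set name in two places and the analysis
number in one, so it is easy to mistype. The following is only a sketch of
how the command line could be assembled consistently. It assumes that the
script takes an archive name followed by a directory, that the archive name
always follows the "set33_mofia_trees" pattern, and that ~e614 is
/global/home/e614 (as in the tbsub.pl example earlier). The helper name
dsm_tar_command is hypothetical and not part of the e614 tools.

```python
from typing import List

# Assumption: ~e614 expands to this path, as in the tbsub.pl example.
E614_BIN = "/global/home/e614/bin"


def dsm_tar_command(set_name: str, anal_num: int) -> List[str]:
    """Build the wg-dsm-tar.perl argument list for one analysis.

    Hypothetical helper: it only mirrors the command shown in the
    walkthrough and does not run anything.
    """
    # Assumption: the archive name follows the "set33_mofia_trees" example.
    archive_name = f"{set_name}_mofia_trees"
    root_dir = (
        f"/global/scratch/twist/systematics/data/{set_name}"
        f"/anal{anal_num}/root"
    )
    return [f"{E614_BIN}/wg-dsm-tar.perl", archive_name, root_dir]


if __name__ == "__main__":
    # Print the command for review instead of executing it.
    print(" ".join(dsm_tar_command("set33", 1)))
```

Printing the command rather than running it keeps a human in the loop;
paste the output into the shell once it looks right.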
- Transferring Mofia log files to TRIUMF:
    - To transfer the log files, Environment files, and other useful
      information back to TRIUMF for inclusion on the Westgrid web page,
      use the script:
        ~e614/bin/scp_GenSetInfo_to_TRIUMF.pl

- Logging info for tracking purposes:
    Don't forget to check off the appropriate boxes under the set33
    entry in the TWIST "Data Analysis" page:
      http://twist.triumf.ca/private/TWIST_2004_2007_Analysis/2004_pass2/Data_Analysis.html

- Inform the TWIST Westgrid coordinator of the status of your analysis
  so that tree-summing --> Energy calibration --> tree-summing can
  proceed.

- Deleting data from Westgrid scratch space:
    - Confirm with the TWIST Westgrid coordinator that the data can be
      deleted from the scratch space (i.e. it's not needed for further
      processing).
    - Remove the set33 data from Westgrid using remove_rawdata.pl and
      presumably clean up the trees etc. somehow as well.

- ELOG posting (5):
    Make another note in the TWIST Westgrid ELOG, stating that the data
    has been removed from Westgrid (if that's the case!) and that the
    processing of this data set is complete.
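
After deleting data (or before retrieving the next set), it can help to
check the numbers from the disk summary page against the scratch-space
agreement stated near the top of this walkthrough: at least 10% of
/global/scratch left free, and TWIST usage under 3 TB. The sketch below is
one possible way to write that check down. The function name and the
choice of 1 TB = 10^12 bytes are assumptions; the byte counts have to be
supplied by you (for example from the nightly web summary).

```python
# Assumption: decimal terabytes; use 1024**4 if the summary page is binary.
TB = 1000 ** 4

MIN_FREE_FRACTION = 0.10   # "always leave at least 10% free"
MAX_TWIST_BYTES = 3 * TB   # "restrict ourselves to < 3 TB"


def scratch_policy_ok(total_bytes, free_bytes, twist_bytes):
    """Return True if both parts of the verbal agreement are satisfied.

    total_bytes: size of /global/scratch
    free_bytes:  space currently free on /global/scratch
    twist_bytes: space currently used by TWIST
    """
    leaves_enough_free = free_bytes >= MIN_FREE_FRACTION * total_bytes
    under_twist_limit = twist_bytes < MAX_TWIST_BYTES
    return leaves_enough_free and under_twist_limit
```

For example, scratch_policy_ok(100 * TB, 20 * TB, 2 * TB) passes both
conditions, while a disk with only 5% free or a TWIST usage of exactly
3 TB does not. If either condition fails, consult one of the TWIST WG
coordinators before retrieving more data.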