DATABASE DOCUMENTATION

Mina Nozar and Daniel Graves
Last Update: July 7, 2005

1) Disk Space Database

Used by web interface: "http://tw04.triumf.ca/private/sysman/wg_disk_db/websummary.pl"

Overview
========

The disk space database works by using ssh to log into each ice node, executing "statfs" on the local /data partition, and determining the disk space used per genXXX on each node. The database also finds the global disk space by running "statfs" on the "/global/scratch" and "/global/home" partitions. It then finds e614 usage from files located in "/global/home/DU" and "/global/home/DU_scratch". If these files cannot be found or have zero size, e614 usage is determined by executing "du -s" instead.

Database Files in "glacier.westgrid.ca:~e614/disk_usage_db"
===========================================================

Seven days of database files are located on nunatak2.westgrid.ca. All database files are in the "~e614/disk_usage_db" directory, from the most recent, "diskspace.0.db", to the oldest, "diskspace.6.db".

Script Files
============

The scripts generating/using the information in the database files reside in "~e614/bin/disk_usage". The script that gets executed to determine free/used disk space is "~e614/bin/disk_usage/check_diskspace.pl". The scripts generating the website are on the twist side under "~/private_html/sysman/wg_disk_db".

There are a few scripts that read the most recent database file, "diskspace.0.db":

"~e614/bin/disk_usage/nodes_down.pl" outputs the names of all the local nodes that are down.
"~e614/bin/disk_usage/nodes_low.pl" outputs the names of all the local nodes that have <30% free space.
"~e614/bin/disk_usage/summary.pl" outputs a summary table with the same information as the website.

"~e614/bin/disk_usage/statfs" is a program written in C that uses a UNIX system call to find the disk space usage of a file system only.
The output is in number of blocks and is converted into bytes by the "~e614/bin/disk_usage/check_diskspace.pl" script.

Web Site Files
==============

The website files are located on tw04.triumf.ca under "~e614/public_html/private/sysman/wg_disk_db". The website is written in Perl and the page is called "websummary.pl".

Cron Job Files
==============

The script that gets executed by the cron job is located in "/global/home/e614/bin/disk_usage" and is called "disk_usage_db_cron.sh". The cron job is under the e614 crontab on "nunatak2". To view the crontab for e614, use the command "crontab -l". The cron job runs at 5:30 a.m. every morning, and STDERR is redirected to STDOUT, which is written to the log file. The cron job command is as follows:

30 5 * * * source /global/home/e614/bin/disk_usage/disk_usage_db_cron.sh > /global/home/e614/disk_usage_db/disk_usage_db_cron.log 2>&1

This script
a) runs "~e614/bin/disk_usage/check_diskspace.pl".
b) "scp"s the "~/disk_usage_db/diskspace.0.db" to "tw04:~e614/private_html/sysman/wg_disk_db", where it is in turn used by "websummary.pl" to generate the web page.

==============================================================================================

2) Job Status Database

Overview
========

The job status database originally worked by reading information from /global/scratch/qstat/qstat-f.cache, which the Westgrid administration used to update every 15 minutes through a cron job. Once the Westgrid administration discontinued updating this file, we started creating our own file: "~e614/jobs_running_db/qstat-f.cache".

Database Files in "glacier.westgrid.ca:~e614/jobs_running_db"
=============================================================

3.5 hours worth of database files are located on "glacier.westgrid.ca". All database files are in the "~e614/jobs_running_db" directory, from the most recent, "jobsrunning.0.db", to the oldest, "jobsrunning.6.db".

A database called "history.db" contains the e614 history, sampled from the job status database once a day at about noon.
This is accomplished within the "check_qstat.pl" script. There is currently no limit on how large the database can grow. As a result, the history was limited to being updated once a day.

Script Files
============

The scripts generating/using the information in the database files reside in "~e614/bin/jobs_running". The script creating the database file, "jobsrunning.0.db", is "check_qstat.pl". The scripts generating the website are on the twist side under "~/private_html/sysman/wg_jobsrunning_db".

There are a few scripts that read the most recent database file, "jobsrunning.0.db":

"qstat_summary.pl" outputs a text-based table of all the jobs currently in PBS, either running or queued.
"e614_qstat_summary.pl" outputs a text-based table of all the e614 jobs currently in PBS, either running or queued.
"history_summary.pl" prints a comma-delimited table of e614 history.

Web Site Files
==============

The website files are located on "tw04.triumf.ca" under "~e614/public_html/private/sysman/wg_jobsrunning_db". The website is written in Perl and the page is called "websummary.pl". The history web page is in a separate script called "history_websummary.pl".

Cron Job Files
==============

The script that gets executed by the cron job is located in "~e614/bin/jobs_running" and is called "jobs_running_db_cron.sh". The cron job is under the e614 crontab on "nunatak2". To view the crontab for e614, use the command "crontab -l". The cron job runs every 15 minutes, and STDERR is redirected to STDOUT, which is written to the log file. The cron job command is as follows:

7,22,37,52 * * * * source /global/home/e614/bin/jobs_running/jobs_running_db_cron.sh > /global/home/e614/jobs_running_db/jobs_running_db_cron.log 2>&1

This script
a) "scp"s the "~e614/jobs_running_db/jobsrunning.0.db" to "tw04:~e614/private_html/sysman/wg_jobsrunning_db/", where it is in turn used by "websummary.pl" to generate the web page.
b) "scp"s the "~e614/jobs_running_db/history.db" to "tw04.triumf.ca:/home/e614/public_html/private/sysman/wg_jobsrunning_db/", where it is in turn used by "history_websummary.pl" to generate the web page.

==================================================================================================================================
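
Note on the numbered database files: both databases keep seven files, numbered from 0 (most recent) to 6 (oldest). This document does not describe how "check_diskspace.pl" and "check_qstat.pl" actually shift the files, so the Python sketch below is only one plausible way such a rotation could be done, not the method used by those scripts. The function name "rotate_db_files" and its parameters are hypothetical and chosen purely for illustration.

```python
from pathlib import Path


def rotate_db_files(directory, prefix, keep=7):
    """Shift prefix.N.db to prefix.(N+1).db and drop the oldest file.

    Hypothetical helper (not part of the e614 scripts). After the call,
    slot 0 is free and its path is returned so the caller can write the
    newest snapshot there.
    """
    directory = Path(directory)

    # Remove the oldest file first so the rename below cannot collide with it.
    oldest = directory / f"{prefix}.{keep - 1}.db"
    if oldest.exists():
        oldest.unlink()

    # Walk from the second-oldest file down to the newest, so that no file
    # is overwritten before it has been moved.
    for n in range(keep - 2, -1, -1):
        src = directory / f"{prefix}.{n}.db"
        if src.exists():
            src.rename(directory / f"{prefix}.{n + 1}.db")

    return directory / f"{prefix}.0.db"
```

A caller would rotate first and then write the fresh snapshot to the returned path, for example with the prefix "diskspace" or "jobsrunning". Missing files are simply skipped, so the sketch also works before all seven slots have been filled.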