Purdue CMS Tier-2 Center

CRAB3 at Purdue

Introduction

This tutorial shows users how to run a CMS analysis on an input dataset using CRAB3.

Differences between CRAB2 and CRAB3

For those who already know CRAB2 and are facing the transition to CRAB3, here is a list of architectural improvements (or simply differences) in the new version.

Login to UI

First, log in to the Purdue Tier-2 user interface. We encourage users to use the bash shell for their CMS analysis work. This tutorial is based on a bash shell environment.

Setup the environment

In order to have the correct environment, the environment files must always be sourced in the following order:

  1. CMSSW installation (only once).
  2. CMS environment (for every new shell).
  3. CRAB environment (for every new shell).

CMS software installation

Install CMSSW in a directory of your choice. We suggest creating a subdirectory (e.g. called CRAB3-tutorial) and installing CMSSW there. Before installing CMSSW, check whether the scram architecture is the one needed (in our case slc6_amd64_gcc481), and if not, change it accordingly. The scram architecture is specified in the environment variable SCRAM_ARCH, so first check this variable:

echo $SCRAM_ARCH

If the environment variable SCRAM_ARCH is not set to slc6_amd64_gcc481, set it by running the following command:

export SCRAM_ARCH=slc6_amd64_gcc481

Only after setting the appropriate scram architecture, install CMSSW:

source /cvmfs/cms.cern.ch/cmsset_default.sh
cd /home/$USER
mkdir CRAB3-tutorial
cd CRAB3-tutorial
cmsrel CMSSW_7_0_5

CMS environment

Setup the CMS environment:

cd CRAB3-tutorial/CMSSW_7_0_5/src
cmsenv

The cmsenv command will automatically set the scram architecture to the one corresponding to the installed CMSSW release.

CRAB environment

CRAB3 can be set up by sourcing:

source /cvmfs/cms.cern.ch/crab3/crab.sh

This script always points to the latest version of CRAB3. After sourcing this script, it is possible to use CRAB from any directory.

Get a CMS VO proxy

To request a proxy valid for seven days, execute:

voms-proxy-init --voms cms --valid 168:00

CMSSW configuration file

The CMSSW configuration file pset_tutorial_analysis.py slims an already existing dataset. In this tutorial, we will show you how to slim an official CMS dataset.
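
For orientation, here is a minimal sketch of what such a slimming configuration could look like. This is not the exact content of pset_tutorial_analysis.py: the process name, the kept collections and the output file name are illustrative assumptions based on standard CMSSW PoolOutputModule usage.

import FWCore.ParameterSet.Config as cms

process = cms.Process('Slim')                      # illustrative process name

# Leave the input file list empty: CRAB fills it in from the input dataset.
process.source = cms.Source('PoolSource', fileNames = cms.untracked.vstring())
process.maxEvents = cms.untracked.PSet(input = cms.untracked.int32(-1))

# Slimming: drop everything, then keep only the collections of interest (example).
process.output = cms.OutputModule('PoolOutputModule',
    outputCommands = cms.untracked.vstring('drop *', 'keep recoTracks_*_*_*'),
    fileName = cms.untracked.string('output.root')
)
process.out = cms.EndPath(process.output)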

Input dataset

We will be using dataset /GenericTTbar/HC-CMSSW_7_0_4_START70_V7-v1/GEN-SIM-RECO for this tutorial.

CRAB configuration file

For convenience, we suggest placing the CRAB configuration file in the same directory as the CMSSW parameter-set file to be used by CRAB.

Input dataset available in 'global' DBS

The default name expected by CRAB for the CRAB configuration file is crabConfig.py, but one can of course give it any name (always keeping the .py file name extension and not adding dots in the base file name), as long as the name is specified when required (e.g. when issuing the CRAB submission command).
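
As an illustration, a crabConfig.py for the dataset above could look roughly like the sketch below. It assumes the WMCore-style Configuration object used by CRAB3 at the time of this tutorial; the request name, splitting settings and publication name are example values, and some parameter names may differ in newer CRAB3 releases.

from WMCore.Configuration import Configuration
config = Configuration()

config.section_('General')
config.General.requestName = 'tutorial_MC_analysis_test1'   # example request name
config.General.workArea = 'crab_projects'

config.section_('JobType')
config.JobType.pluginName = 'Analysis'
config.JobType.psetName = 'pset_tutorial_analysis.py'

config.section_('Data')
config.Data.inputDataset = '/GenericTTbar/HC-CMSSW_7_0_4_START70_V7-v1/GEN-SIM-RECO'
config.Data.splitting = 'FileBased'                          # example splitting mode
config.Data.unitsPerJob = 10                                 # example value
config.Data.publication = True
config.Data.publishDataName = 'CRAB3_tutorial_MC_analysis_test1'  # example publication name

config.section_('Site')
config.Site.storageSite = 'T2_US_Purdue'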

Private input files

When running an analysis over private input files, one should create a text file with the names (full paths) of the input files and specify the name of this text file in the 'userInputFile' parameter. You also need to set the site whitelist to 'T2_US_Purdue' so that jobs run only at Purdue. A CRAB configuration file for a private input dataset can be found here.
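
As a rough sketch, the Data and Site sections of such a configuration could look as follows; the text file name is an example, and the userInputFile and whitelist parameters are assumed to match the CRAB3 version described in this tutorial.

config.section_('Data')
config.Data.userInputFile = 'input_files.txt'    # example name of the text file listing the full paths of the private input files
config.Data.splitting = 'FileBased'
config.Data.unitsPerJob = 1                      # example value

config.section_('Site')
config.Site.storageSite = 'T2_US_Purdue'
config.Site.whitelist = ['T2_US_Purdue']         # run the jobs only at Purdue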

Data handling in CRAB

All successful jobs in a task produce output files, which are eventually copied (staged out) to the CMS site permanent storage element specified in the Site.storageSite parameter of the CRAB configuration file. The jobs also produce out/err log files from the CMSSW code, which by default are not copied to the permanent storage element but kept in the temporary storage element of the running site (meaning that they will soon be deleted). The user can force the stage-out of the CMSSW log files to the permanent storage element by setting General.saveLogs = True in the CRAB configuration file.

The user can also disable the stage-out of the output files by means of the General.transferOutput parameter, but in that case publication will not be possible. If the output files are successfully transferred to the permanent storage element, CRAB will automatically (and by default) publish the output dataset in DBS. This is done by the same service (ASO) that performs the stage-out. If the user wants to disable publication, he/she can set Data.publication = False in the CRAB configuration file.
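
In summary, the behaviour described above is steered by a few lines in the CRAB configuration file; a sketch of the relevant parameters, with the default-like values discussed in this section, is:

config.General.saveLogs = True              # also stage out the CMSSW out/err log files
config.General.transferOutput = True        # stage out the output files (required for publication)
config.Data.publication = True              # publish the output dataset in DBS (handled by ASO)
config.Site.storageSite = 'T2_US_Purdue'    # permanent storage element for the task outputs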

CRAB files naming convention

When using CRAB, the LFNs of the permanently stored output files have the following form:

  • For files stored in a user storage space:

/store/user/<dir>[/<subdirs>]/<primary-dataset>/<publication-name>/<time-stamp>/<counter>/<file-name>
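
For example, for a hypothetical user jdoe running this tutorial, an output file could end up with an LFN like the following (the time-stamp and counter formats are illustrative):

/store/user/jdoe/GenericTTbar/CRAB3_tutorial_MC_analysis_test1/141006_120000/0000/output_1.root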

CRAB commands

In this section, we provide a list with the currently available CRAB commands and their explanation. We will see how to use the commands as we go along in the tutorial.

Command      Description
submit       Submit a task.
status       Report the states of the jobs in a task (and more).
resubmit     Resubmit the failed jobs in a task.
report       Get a final report for a task, with the number of analyzed files, events and luminosity sections.
kill         Kill all jobs in a task.
getoutput    Retrieve the output ROOT files from a task.
getlog       Retrieve the log files from a task.
uploadlog    Upload the crab log file to the CRAB cache on the server.
checkwrite   Check write permission at a site.
purge        Clean up the user's directory on the schedds and in the CRAB cache.

CRAB3 environment

To run a CRAB command, one has to type:

crab <command>

One can also get a list of available commands invoking the crab help menu:

crab -h

The 'crab checkwrite' command

The crab checkwrite command can be used to check whether the user has write permission in a given LFN directory path (by default /store/user/<username>/) at a given site. The syntax is:

crab checkwrite --site=T2_US_Purdue

If you don't have write permission to the storage at the Purdue Tier-2, send mail to cms-support@lists.purdue.edu.

The 'crab checkHNname' command

The crab checkHNname command tries to retrieve the user's HyperNews username from SiteDB. To check your HyperNews username, run:

crab checkHNname

Running CMSSW analysis with CRAB

In this section, we will show how to run analysis on input dataset. We will be using the CRAB configuration file 'crabConfig.py'.

Task submission

To submit a task, execute the following CRAB command:

crab submit -c crabConfig.py

Task status

To check the status of a task, execute the following CRAB command:

crab status crab_projects/crab_tutorial_MC_analysis_test1

The crab status command will produce an output containing the task name, the status of the task as a whole, the details of how many jobs are in which state (submitted, running, transferring, finished, cooloff, etc.) and the location of the CRAB log (crab.log) file. It will also print the URLs of two web pages that one can use to monitor the jobs. One can also get a more detailed status report (showing the state of each job, the job number, the site where the job ran, etc.), by adding the --long option to the crab status command. For our task, we run:

crab status crab_projects/crab_tutorial_MC_analysis_test1 --long

Job states

The job state idle means that the job has been submitted but is not yet running. The job state cooloff, on the other hand, means that the server has not yet submitted the job for the first time, or that the job is waiting for automatic resubmission after a recoverable error. For a complete list and explanation of task and job states, please refer to Task and Node States in CRAB3-HTCondor.

Task resubmission

CRAB allows the user to resubmit a task, which will actually resubmit only the failed jobs in the task. The resubmission command is as follows:

crab resubmit -t crab_projects/crab_tutorial_MC_analysis_test1

Task report

One can obtain a short report about a task, containing the total number of events and files processed by completed jobs:

crab report crab_projects/crab_tutorial_MC_analysis_test1

Task log files retrieval

The user can retrieve the full cmsRun logs using the following CRAB command:

crab getlog [-t] <CRAB-project-directory> [-i <comma-separated-list-of-jobs-and/or-job-ranges>]

For example, suppose we want to retrieve the log files for jobs 1 and 3 in the task. We execute:

crab getlog crab_projects/crab_tutorial_MC_analysis_test1 -i 1,3

Task output retrieval

In case one wants to retrieve some output ROOT files of a task, one can do so with the following CRAB command:

crab getoutput [-t] <CRAB-project-directory> [-i <comma-separated-list-of-jobs-and/or-job-ranges>]

The files are copied into the corresponding task's results subdirectory. For our task, the output files can be retrieved by running:

crab getoutput crab_projects/crab_tutorial_MC_analysis_test1

Output dataset publication

If publication was not disabled in the CRAB configuration file, CRAB will automatically publish the task output dataset in DBS. The publication timing logic is as follows: the first available files (whatever their number) are published immediately; after that, ASO waits until 100 files (per user) have accumulated, until the task reaches the COMPLETED state (i.e. all jobs finished successfully, so all files are published), or for a maximum of 8 hours (in which case only the output of successfully finished jobs is published). One can check the publication status using the crab status command. For our task, we run:

crab status -t crab_projects/crab_tutorial_MC_analysis_test1

The publication state idle means that the publication request for the corresponding job output has not yet been processed by ASO. Once ASO starts to process the request, the publication state becomes running, and finally it becomes either finished if the publication succeeded or failed if it didn't.

More information

Users are advised to read the information available on the CRAB3 tutorial twiki. More information about CRAB3 can be obtained by sending mail to the CRAB3 HyperNews list.
