
Submitting CRAB jobs at Purdue

Introduction

This document shows an example of submitting a CMSSW job through CRAB. The job produces a Monte Carlo dataset that gets published in DBS. It is based on Exercise 12 of the set of Pre-Exercises for the CMS Data Analysis Schools (CMSDAS), conducted regularly at various locations - see the full list here.

Login to UI

First, log in to the Purdue Tier-2 user interface. We encourage users to use the bash shell for their CMS analysis work; this tutorial assumes a bash shell environment.

Setup the CMSSW environment


NOTE: Compatible CMSSW and SCRAM_ARCH values have to be selected carefully from the many possible combinations maintained by the collaboration, based on the user's analysis needs. The example below shows only one compatible pair, available at the time of this documentation update.

(NOTE: The following procedure is described for CentOS 8 environments, e.g. the Negishi and Bell clusters. For CentOS 7 environments like Hammer/CMS-FE, substitute SCRAM_ARCH=slc7_amd64_gcc700 and CMSSW_10_6_4.)


To obtain a correct environment, always source the environment files in the following order:

  1. CMSSW installation (only once):
    mkdir ~/CRAB_tests
    cd ~/CRAB_tests
    export SCRAM_ARCH=el8_amd64_gcc10
    source /cvmfs/cms.cern.ch/cmsset_default.sh
    cmsrel CMSSW_12_3_4
  2. Activate CMS environment:
    cd ~/CRAB_tests/CMSSW_12_3_4/src
    cmsenv
    git cms-init
  3. Initialize the CRAB environment:
    source /cvmfs/oasis.opensciencegrid.org/osg-software/osg-wn-client/3.6/current/el8-x86_64/setup.sh
    source /cvmfs/cms.cern.ch/crab3/crab.sh
  4. Get a VOMS proxy:
    voms-proxy-init -voms cms -rfc -valid 168:00
  5. Check that you can write to Purdue storage:
    crab checkwrite --site=T2_US_Purdue
If the checkwrite test succeeded, you are ready to move on to creating the configuration file.
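
The installation in step 1 is needed only once, but steps 2-4 have to be repeated in every new login session. As a convenience, you can collect them in a small script to be sourced at login. The following is a minimal sketch, assuming the directory layout from step 1 (the file name setup_crab.sh is arbitrary):

# setup_crab.sh -- run with `source setup_crab.sh` in every new session
cd ~/CRAB_tests/CMSSW_12_3_4/src
source /cvmfs/cms.cern.ch/cmsset_default.sh
eval "$(scramv1 runtime -sh)"   # what the cmsenv alias expands to (aliases are not available in scripts)
source /cvmfs/oasis.opensciencegrid.org/osg-software/osg-wn-client/3.6/current/el8-x86_64/setup.sh
source /cvmfs/cms.cern.ch/crab3/crab.sh
# Request a new proxy only if none is valid for at least another 24 hours:
voms-proxy-info --exists --valid 24:00 2>/dev/null || voms-proxy-init -voms cms -rfc -valid 168:00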


Create a CMSSW configuration file

Using the cmsDriver.py command, create a template configuration file:

cmsDriver.py MinBias_13TeV_pythia8_TuneCUETP8M1_cfi --conditions auto:run2_mc -n 10 --era Run2_2016 --eventcontent FEVTDEBUG --relval 100000,300 -s GEN,SIM --datatier GEN-SIM --beamspot Realistic50ns13TeVCollision --fileout file:step1.root --no_exec --python_filename CMSDAS_MC_generation.py


This will produce a configuration file called 'CMSDAS_MC_generation.py' in the current directory.

You can test it locally, before submitting it through CRAB, by running:

cmsRun CMSDAS_MC_generation.py

This should produce an output file named 'step1.root' in the current directory.
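
As a quick sanity check, you can inspect the file with the edmFileUtil utility that comes with CMSSW; it should report the 10 events requested via the -n 10 option of cmsDriver.py:

edmFileUtil file:step1.root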

If that was successful, you are ready to create the CRAB configuration file.


Create the CRAB configuration file

Copy and paste the contents below into a file named 'crabConfig_MC_generation.py':


from WMCore.Configuration import Configuration
config = Configuration()

config.section_("General")
config.General.requestName = 'CMSDAS_MC_generation_test0'
config.General.workArea = 'crab_projects'

config.section_("JobType")
config.JobType.pluginName = 'PrivateMC'
config.JobType.psetName = 'CMSDAS_MC_generation.py'
config.JobType.allowUndistributedCMSSW = True

config.section_("Data")
config.Data.outputPrimaryDataset = 'MinBias'
config.Data.splitting = 'EventBased'
config.Data.unitsPerJob = 10
NJOBS = 10 # This is not a configuration parameter, but an auxiliary variable that we use in the next line.
config.Data.totalUnits = config.Data.unitsPerJob * NJOBS
config.Data.publication = True
config.Data.outputDatasetTag = 'CMSDAS2019_CRAB3_MC_generation_test0'

config.section_("Site")
config.Site.storageSite = 'T2_US_Purdue'
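
With these values, the task is split into NJOBS = 10 jobs of unitsPerJob = 10 events each, so totalUnits = 10 × 10 = 100 events will be generated in total. For the 'PrivateMC' plugin the splitting units are events, hence the 'EventBased' splitting mode.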

Submit the CMSSW job with CRAB

Now you can submit your job through CRAB:

crab submit -c crabConfig_MC_generation.py


If everything went well, you will see a green 'Success' message, followed by instructions on how to check the status of your submission.
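
CRAB creates a task directory under the workArea, named 'crab_' followed by the requestName - in this example, crab_projects/crab_CMSDAS_MC_generation_test0. You pass this directory to all subsequent CRAB commands.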


Check Task status

To check the status of a task, execute the following CRAB command:

crab status crab_projects/crab_CMSDAS_MC_generation_test0

As part of the output of this command, you will see links to dashboard web pages for monitoring the status of your submission; you can open them in a browser.
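
If you want a per-job breakdown of the task (job states, retries, exit codes), crab status accepts the --long option:

crab status --long crab_projects/crab_CMSDAS_MC_generation_test0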

After a while, your jobs will finish, and you may find that some of them failed. If that's the case, you can resubmit them as described below.

Task resubmission

CRAB allows the user to resubmit a task, which resubmits only the failed jobs in that task. The resubmission command is as follows:

crab resubmit crab_projects/crab_CMSDAS_MC_generation_test0
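
If the failures have a known cause, some task parameters can be adjusted at resubmission time. For example, recent CRAB3 clients allow raising the per-job memory limit for jobs that were killed for exceeding it (check crab resubmit --help for the options available in your client version):

# Raise the per-job memory limit to 3000 MB for the resubmitted jobs
crab resubmit --maxmemory=3000 crab_projects/crab_CMSDAS_MC_generation_test0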


Get the Task report, logs and output files 

You can obtain a short report about a task, containing the total number of events and files processed by completed jobs:

crab report crab_projects/crab_CMSDAS_MC_generation_test0

To retrieve the full cmsRun logs, use the following CRAB command:

crab getlog crab_projects/crab_CMSDAS_MC_generation_test0

To retrieve the output ROOT files of the task, use the following CRAB command:

crab getoutput crab_projects/crab_CMSDAS_MC_generation_test0

The files are copied into the corresponding task's 'results' subdirectory.
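
Both crab getlog and crab getoutput retrieve files for all jobs by default, which can be a lot of data. Recent CRAB3 clients let you restrict the retrieval to specific jobs with the --jobids option (again, check --help for your client version):

# Retrieve the output of jobs 1 and 2 only
crab getoutput --jobids=1,2 crab_projects/crab_CMSDAS_MC_generation_test0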


Output dataset publication

If publication was not disabled in the CRAB configuration file, CRAB automatically publishes the task's output dataset in DBS. The publication timing logic is as follows: the first available files (whatever their number) are published immediately; after that, ASO waits until it has accumulated 100 files (per user), until the task reaches the COMPLETED state (i.e. all jobs finished successfully, so all files can be published), or for a maximum of 8 hours (in which case only the output of successfully finished jobs is published). You can check the publication status with the crab status command. For our task, we run:

crab status crab_projects/crab_CMSDAS_MC_generation_test0

A publication state of idle means that the publication request for the corresponding job output has not yet been processed by ASO. Once ASO starts processing the request, the state becomes running; finally it becomes either finished, if the publication succeeded, or failed, if it did not.
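
Once publication has finished, you can look the dataset up in DBS, e.g. with the dasgoclient utility available in the CMS environment. A sketch, assuming the configuration above: user datasets are published in the prod/phys03 DBS instance, and the full dataset name contains your username and a hash appended by CRAB (hence the wildcards).

dasgoclient --query="dataset dataset=/MinBias/*CMSDAS2019_CRAB3_MC_generation_test0*/USER instance=prod/phys03"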


Advanced topics

Making sure your jobs run only at Purdue and nowhere else

If you want your jobs to run only at Purdue and nowhere else (e.g. because your input dataset is only available here, or because you have higher priority in the local queues), add the following to the "Site" and "Debug" sections of your CRAB configuration file:


config.section_("Site")
config.Site.blacklist = ['T2_US_Caltech','T2_US_Florida','T2_US_MIT','T2_US_Nebraska','T2_US_Vanderbilt','T2_US_Wisconsin']
config.Site.whitelist = ['T2_US_Purdue']
config.Site.storageSite = 'T2_US_Purdue'
# this is needed in order to prevent jobs overflowing to blacklisted sites
config.section_("Debug")
config.Debug.extraJDL = ['+CMS_ALLOW_OVERFLOW=False']

Running a user script with CRAB

It is possible, although not trivial, to run an arbitrary user script with CRAB - please see this section of the CRAB documentation.


More information

Users are advised to read the information available on the CRAB3 tutorial twiki.

Additional CRAB and CMSSW tutorials are available as part of the training materials for the regular CMS Data Analysis Schools (CMSDAS). In particular, the "Pre-Exercises/instructions" section of each CMSDAS covers many aspects of setting up the CMSSW environment and of using CRAB to submit jobs (the latter specifically in the Third Set of Pre-Exercises).
