Purdue CMS
CRAB at Purdue Tutorial

Introduction


First, log in to cms.rcac.purdue.edu:

ssh -l username cms.rcac.purdue.edu 

or

ssh -l username steele.rcac.purdue.edu 

 

Set up the local environment and prepare the user analysis code


UI initialization

To submit jobs to the Grid, you must have access to an OSG User Interface (UI).
It gives you transparent access to OSG-affiliated resources:

source /opt/osg/setup.sh 
If you want to use the CRAB server, set up an LCG User Interface instead; it gives you transparent access to WLCG-affiliated resources:
source /grp/cms/tools/glite/setup.sh 

Proxy setup 

Before you can submit jobs with CRAB, you need a grid certificate.
If you don't have one, please refer here to get it. Then create a VOMS proxy with the CMS extension and export its location:
voms-proxy-init -voms cms
export X509_USER_PROXY=$(voms-proxy-info -path)
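Before going further, it can save a failed submission to confirm that the exported proxy file really exists. The following is a minimal sketch; the function name check_proxy_file is hypothetical, and it only checks for a readable, non-empty file, not the remaining lifetime or the VOMS extension (voms-proxy-info -all shows those).

```shell
# Hypothetical helper (not part of CRAB or the grid tools):
# succeed only if X509_USER_PROXY names a readable, non-empty file.
# It does NOT check the proxy lifetime or the VOMS extension.
check_proxy_file() {
  if [ -z "${X509_USER_PROXY:-}" ]; then
    echo "X509_USER_PROXY is not set; run voms-proxy-init -voms cms first" >&2
    return 1
  fi
  if [ ! -r "$X509_USER_PROXY" ] || [ ! -s "$X509_USER_PROXY" ]; then
    echo "proxy file $X509_USER_PROXY is missing or empty" >&2
    return 1
  fi
  return 0
}
```

For example: check_proxy_file && voms-proxy-info -all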

 

CMS software initialization

At Purdue, source the CMS software setup script:

source /apps/02/cmssoft/cms/cmsset_default.sh 

Prepare user analysis code 

Create a CMSSW project area in your user space and set up its runtime environment:
scramv1 project CMSSW CMSSW_2_1_9 
cd CMSSW_2_1_9/src
cmsenv 

 


CRAB setup

At Purdue, users may access CRAB at:
/grp/cms/crab/CRAB 

To find out the latest release, check the CRAB web page or the appropriate HyperNews forum.

 

Setup on cms.rcac.purdue.edu:

To set up CRAB so that it can be used from any directory, source the script crab.sh (or crab.csh for csh/tcsh users) located in /grp/cms/crab/, which always points to the latest version of CRAB. After sourcing the script, you can run CRAB from any directory (typically from your CMSSW working directory).

This tutorial requires the newest release of CRAB in order to use its latest features.

 
source /grp/cms/crab/crab.sh 
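The setup steps above (UI, CMSSW and CRAB) can be collected into one shell function to run at each login. This is a sketch under assumptions: the helper names source_if_present and setup_crab_env are hypothetical, the paths are the Purdue ones from this tutorial, and the CMSSW src directory is passed as the first argument. cmsenv is spelled out as eval "$(scramv1 runtime -sh)", which is our understanding of what that alias does, because an alias defined while a function runs is not expanded inside it. The proxy still has to be created with voms-proxy-init once the UI is set up; for the CRAB server, the glite setup script replaces the OSG one.

```shell
# Hypothetical helper: source a setup script if it exists, otherwise report it.
source_if_present() {
  if [ -r "$1" ]; then
    . "$1"
  else
    echo "setup script not found: $1" >&2
    return 1
  fi
}

# Hypothetical sketch of this tutorial's setup sequence on cms.rcac.purdue.edu.
# $1: CMSSW src directory, e.g. ~/CMSSW_2_1_9/src (an assumption).
setup_crab_env() {
  source_if_present /opt/osg/setup.sh || return 1                 # OSG UI
  source_if_present /apps/02/cmssoft/cms/cmsset_default.sh || return 1
  cd "$1" || return 1
  # Equivalent of cmsenv (assumed); the alias would not expand in here.
  eval "$(scramv1 runtime -sh)" || return 1
  source_if_present /grp/cms/crab/crab.sh                         # CRAB
}
```

For example: setup_crab_env ~/CMSSW_2_1_9/src. Like the manual cd step, the function leaves you in the CMSSW src area.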

 

 

Data selection

To select the data you want to access, use the DBS Data Discovery web page, where available datasets are listed, or the Purdue site data page. For this tutorial we'll use:
/DiPion_E300_Eta5/Summer08_IDEAL_V9_v1/GEN-SIM-RAW
To find it in DBS Data Discovery, search for:
  • find dataset where dataset like *DiPion_E300*
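A DBS dataset path has three parts, /PrimaryDataset/ProcessedDataset/DataTier, and should be written on a single line in crab.cfg. The minimal sketch below checks that shape before a path is pasted into the configuration; the function name is_dataset_path is hypothetical, and it does not query DBS, so it cannot tell whether the dataset actually exists.

```shell
# Hypothetical helper: check that a dataset path has the form
# /PrimaryDataset/ProcessedDataset/DataTier with no whitespace or line breaks.
# It only checks the shape; it does not contact DBS.
is_dataset_path() {
  case "$1" in
    *[[:space:]]*) return 1 ;;   # reject spaces, tabs and newlines
  esac
  printf '%s\n' "$1" | grep -Eq '^/[^/]+/[^/]+/[^/]+$'
}
```

For example: is_dataset_path /DiPion_E300_Eta5/Summer08_IDEAL_V9_v1/GEN-SIM-RAW && echo "looks like a dataset path"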

 

CRAB configuration

Modify the CRAB configuration file crab.cfg according to your needs: a fully documented template is available at $CRABDIR/python/crab.cfg. For guidance, see the list and description of configuration parameters. For this tutorial, the only relevant sections of the file are [CRAB], [CMSSW], [USER] and [EDG]. The configuration file should be located in the same directory as the CMSSW parameter-set used by CRAB. Save the CRAB configuration file crab.cfg with the following content:
[CRAB]
jobtype = cmssw
scheduler = condor_g
[CMSSW]
datasetpath = /DiPion_E300_Eta5/Summer08_IDEAL_V9_v1/GEN-SIM-RAW
pset = read_write_root.py
total_number_of_events = 100
number_of_jobs = 1
output_file = dummy.root
[USER]
return_data = 1

[EDG]
rb = CERN
se_white_list = dcache.rcac.purdue.edu
ce_white_list = osg.rcac.purdue.edu

Download crab.cfg, read_write_root.py or simulation.py here.
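When a crab.cfg grows, it helps to double-check what a given parameter is actually set to, for example that datasetpath holds the full dataset path. The following is a minimal sketch of a reader for INI-style files such as crab.cfg; the function name cfg_get is hypothetical, and it ignores details such as continuation lines, so treat its answer as a quick check rather than as exactly what CRAB's own parser will see.

```shell
# Hypothetical helper: print the value of KEY in [SECTION] of an INI-style
# file such as crab.cfg; exit non-zero if the key is not found.
# Usage: cfg_get crab.cfg CMSSW datasetpath
cfg_get() {
  awk -v sec="$2" -v key="$3" '
    /^[ \t]*\[.*][ \t]*$/ {            # section header, e.g. [CMSSW]
      name = $0
      sub(/^[ \t]*\[/, "", name)
      sub(/][ \t]*$/, "", name)
      in_sec = (name == sec)
      next
    }
    in_sec && index($0, "=") > 0 {     # "key = value" inside the section
      eq = index($0, "=")
      k = substr($0, 1, eq - 1)
      v = substr($0, eq + 1)           # keeps any later "=" in the value
      gsub(/^[ \t]+|[ \t]+$/, "", k)
      gsub(/^[ \t]+|[ \t]+$/, "", v)
      if (k == key) { print v; found = 1; exit }
    }
    END { if (!found) exit 1 }
  ' "$1"
}
```

For example, from the directory holding crab.cfg: cfg_get crab.cfg CMSSW number_of_jobs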

Run CRAB

Once your crab.cfg is ready and the whole underlying environment is set up, you can start running CRAB. CRAB provides command-line help, which is useful the first time you run it:
crab -h 
In particular, the help contains a HOW TO RUN CRAB FOR THE IMPATIENT USER section that lists the basic commands.

 

Job Creation

Job creation checks the availability of the selected dataset and prepares all the jobs for
submission according to the job splitting specified in crab.cfg.


The creation step makes a CRAB project directory (default: crab_0__) in the current working
directory, where the corresponding CRAB configuration is cached for later use; this avoids
interference with other, already created, projects.


CRAB also lets the user choose a project name, which can later be used to distinguish
multiple CRAB projects in the same directory.

 

crab -create 
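The job splitting in crab.cfg comes down to simple arithmetic: with total_number_of_events = 100 and number_of_jobs = 10, each job processes about 10 events. A minimal sketch of that calculation, rounding up; the function name events_per_job is hypothetical, and CRAB's actual split may differ, for example at input-file boundaries.

```shell
# Hypothetical helper: approximate events per job, rounded up, for a given
# total_number_of_events ($1) and number_of_jobs ($2).
# CRAB's own splitting may differ from this even split.
events_per_job() {
  local total=$1 jobs=$2
  echo $(( (total + jobs - 1) / jobs ))
}
```

For example, events_per_job 100 10 prints 10, and events_per_job 100 3 prints 34.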

 

Job Submission

 

The submission command accepts a comma-separated combination of jobs and job ranges
(e.g. 1,2,3-4); the default is all jobs.
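To make the job-list syntax concrete, the minimal sketch below expands a list such as 1,2,3-4 into the individual job numbers it stands for. It is illustrative only: the function name expand_job_list is hypothetical, CRAB parses the list itself, and no input validation is done.

```shell
# Hypothetical helper: expand a job list like "1,2,3-4" into one job
# number per line. Illustration only; CRAB parses job lists itself.
expand_job_list() {
  local item first last i
  local IFS=','
  for item in $1; do            # split on commas
    case "$item" in
      *-*)                      # a range such as 3-4
        first=${item%-*}
        last=${item#*-}
        i=$first
        while [ "$i" -le "$last" ]; do
          echo "$i"
          i=$((i + 1))
        done
        ;;
      *) echo "$item" ;;        # a single job number
    esac
  done
}
```

For example, expand_job_list 1,2,3-4 prints 1, 2, 3 and 4 on separate lines.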

 

To submit all jobs of the most recently created project (with the default name), run:

 

crab -submit  

 

To submit a specific project:

 

 crab -submit -c  <dir name> 

 


Job Status Check

 

Check the status of the jobs in the latest CRAB project with the following command:
crab -status  
To check a specific project:
crab -status -c  <dir name>  
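The -c option needs the project directory name. Below is a minimal sketch of a helper that picks the most recently modified CRAB project directory under a given directory; the function name latest_crab_project is hypothetical, it assumes the default crab_0_ prefix, and "most recently modified" is usually, but not always, the project you created last (activity inside a project can also update its timestamp).

```shell
# Hypothetical helper: print the name of the most recently modified
# crab_0_* directory under $1 (default: current directory).
latest_crab_project() {
  local dir=${1:-.} d latest=""
  for d in "$dir"/crab_0_*/; do
    [ -d "$d" ] || continue                  # glob matched nothing
    if [ -z "$latest" ] || [ "$d" -nt "$latest" ]; then
      latest=$d
    fi
  done
  [ -n "$latest" ] || return 1
  latest=${latest%/}                         # drop trailing slash
  printf '%s\n' "${latest##*/}"              # print the directory name only
}
```

For example: crab -status -c "$(latest_crab_project)"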

 


Job Output Retrieval

 

For jobs in status Done, you can retrieve their output back to the UI. The
following command retrieves the output of all Done jobs of the most recently created CRAB project:
crab -getoutput all
To get the output of a specific project:
crab -getoutput all -c  <dir name> 

The command can be repeated as long as there are jobs in status Done.

 

Job Aborted Retrieval

For jobs in status Aborted, there is no output to retrieve back to the UI. Instead, the
following command retrieves the error information for all jobs:
crab -postMortem all -c <dir name>

 

 

Final plot

Each job produces a histogram output file in the res directory of the CRAB project. From inside that directory, the files can be combined with ROOT's hadd utility:

 

hadd dummy.root dummy_*.root  
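Before merging, it can be worth checking that every job's output actually arrived, since hadd merges whatever files match the pattern. A minimal sketch, assuming the per-job files follow the dummy_*.root pattern used above; the function name check_job_outputs is hypothetical.

```shell
# Hypothetical helper: count the per-job files dummy_*.root in a CRAB
# res directory ($1) and fail unless exactly $2 of them are present.
check_job_outputs() {
  local res_dir=$1 expected=$2 n=0 f
  for f in "$res_dir"/dummy_*.root; do
    [ -e "$f" ] || continue      # the glob matched nothing
    n=$((n + 1))
  done
  if [ "$n" -ne "$expected" ]; then
    echo "found $n of $expected output files in $res_dir" >&2
    return 1
  fi
}
```

For example, for a 10-job project: check_job_outputs <dir name>/res 10 && (cd <dir name>/res && hadd -f dummy.root dummy_*.root). The -f flag lets hadd overwrite an existing dummy.root.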

 

 

Using CRAB to write out ROOT files

 

 

Prepare your dCache area at Purdue for storage element access

 

For CRAB to be able to write into your dCache user directory:

 

/store/user/<username>

This directory is writable only by its owner and readable by all users.

First, make sure you have the proper CMS VOMS role:
/cms/Role=cmsuser or 
/cms/us/Role=cmsususer

Then our system administrator can create a <username> directory at /store/user for you (please contact the administrator by email).

You can test the <username> directory by copying a small file to it with srmcp:
srmcp -2 file:////tmp/test.txt srm://dcache.rcac.purdue.edu:8443/srm/managerv2?SFN=/store/user/<username>/test.txt

 

You can use the srmmkdir command to create a subdirectory <userdir> under /store/user/<username>.
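SRM URLs are long and easy to mistype. Below is a minimal sketch of a helper that builds the SURL for a path under /store on the Purdue dCache, following the host, port and endpoint used in the srmcp example above; the function name purdue_surl is hypothetical. Quote the result when passing it to a command, since ? is a shell wildcard character.

```shell
# Hypothetical helper: build the SRM v2 URL (SURL) for a /store path on the
# Purdue dCache, using the endpoint from this tutorial's srmcp example.
purdue_surl() {
  printf 'srm://dcache.rcac.purdue.edu:8443/srm/managerv2?SFN=%s\n' "$1"
}
```

For example: srmcp -2 file:////tmp/test.txt "$(purdue_surl /store/user/<username>/test.txt)". Assuming srmmkdir accepts the same SURL form, the subdirectory could be created with srmmkdir "$(purdue_surl /store/user/<username>/<userdir>)".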

Then specify this destination directory in the [USER] section of crab.cfg:

 

storage_path = /srm/managerv2?SFN=/store/
lfn = /user/<username>/<userdir> 

 

Replace <username> with your username and <userdir> with the name of your subdirectory.

 

Prepare a new crab.cfg

 

Now the CMSSW parameter-set produces an output file (dummy.root in the example below), which you list in the output_file parameter of the new crab.cfg; CRAB can then copy it to the Purdue Storage Element (dCache) instead of returning it to the UI. Modify crab.cfg as in the following example:

 

[CRAB] 
jobtype = cmssw
scheduler = condor_g
[CMSSW]
datasetpath = /DiPion_E300_Eta5/Summer08_IDEAL_V9_v1/GEN-SIM-RAW
pset = read_write_root.py
total_number_of_events = 100
number_of_jobs = 10
output_file = dummy.root
[USER]
return_data = 0
copy_data = 1
storage_element = dcache.rcac.purdue.edu
storage_path = /srm/managerv2?SFN=/store/
lfn = /user/<username>/<userdir>
[EDG]
se_white_list = dcache.rcac.purdue.edu
ce_white_list = osg.rcac.purdue.edu
rb = CERN

 

Replace <username> with your username and <userdir> with the name of your subdirectory.

 Download the above crab_write_dcache.cfg here.

  

Prepare a new crab.cfg to access local DBS data

 

CRAB can also read data registered in a DBS instance other than the default one, through the dbs_url parameter in the [CMSSW] section. For local DBS data at Purdue, please refer here. Modify crab.cfg as in the following example:

 

[CRAB] 
jobtype = cmssw
scheduler = condor_g
[CMSSW]
datasetpath = /CJets50_120-step1/CJets50_120-CMSSW_2_0_6/GEN-SIM-RAW
dbs_url = http://cmsdbs.rcac.purdue.edu:8090/DBS/servlet/DBSServlet
pset = read_write_root.py
total_number_of_events = 100
number_of_jobs = 10
output_file = dummy.root
[USER]
return_data = 0
copy_data = 1
storage_element = dcache.rcac.purdue.edu
storage_path = /srm/managerv2?SFN=/store/
lfn = /user/<username>/<userdir>
[EDG]
se_white_list = dcache.rcac.purdue.edu
ce_white_list = osg.rcac.purdue.edu
rb = CERN

 

Replace <username> with your username and <userdir> with the name of your subdirectory.

Download the above crab_local_dbs.cfg here. 

 

Using the CRAB server

Before using the CRAB server, you need to set up a gLite environment. In addition to setting up the CMSSW environment as usual, run:

source /grp/cms/tools/glite/setup.sh

Then set up the VOMS proxy and the CRAB environment as described above.

To use CRAB server mode, add the following to the [CRAB] section of crab.cfg:

 

[CRAB] 
scheduler = glite
server_name = purdue

To receive an email when your jobs are done, so that you don't have to keep checking their status, add these two lines to the [USER] section:
[USER] 
thresholdLevel = 100
eMail = <your email address>

thresholdLevel is the "percent done" at which the email is sent; replace 100 with a lower percentage to get an email earlier.

Then repeat the creation, submission, status check and output retrieval (getoutput) steps described above.

Download crab_server.cfg here. 

 

Copy output back with srmcp or read files with xrootd

 

To get data from dCache, copy it back to your local machine with srmcp:

srmcp -2 srm://dcache.rcac.purdue.edu:8443/srm/managerv2?SFN=/store/user/<username>/<userdir>/test.txt file:////tmp/test.txt

Local users can read a ROOT file over xrootd from a ROOT session by running the macro roottest.C (.x roottest.C):

roottest.C

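{
// Make CMSSW headers visible and enable automatic loading of CMSSW libraries (FWLite)
gInterpreter->AddIncludePath("/apps/02/cmssoft/cms/slc4_ia32_gcc345/cms/cmssw/CMSSW_2_1_9/src/");
gSystem->Load("libFWCoreFWLite");
AutoLibraryLoader::enable();
// Open the file read-only over xrootd
TFile *f = new TXNetFile("root://dcache-00.rcac.purdue.edu/pnfs/rcac.purdue.edu/data/store/mc/Summer08/DiPion_E300_Eta5/GEN-SIM-RAW/IDEAL_V9_v1/0027/BCF0E000-B87F-DD11-8E24-001EC9AAA021.root", "READ");
// CMSSW event data are stored in the TTree named "Events"
TTree* tree = (TTree*)f->Get("Events");
if (tree) {
  cout << " Events: " << tree->GetEntries() << endl;
} else {
  cout << "Could not read the Events tree" << endl;
}
f->Close();
}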
 

 

 


This site, and the work it describes, is primarily funded by a grant from the National Science Foundation (NSF).