Contents:

Introduction

First login

Log in to cms.rcac.purdue.edu:

    ssh -l username cms.rcac.purdue.edu

or

    ssh -l username steele.rcac.purdue.edu

Setup local Environment and prepare user analysis code

UI initialization

To submit jobs to the Grid, you must have access to an OSG User Interface. It lets you access OSG-affiliated resources in a fully transparent way.

    source /opt/osg/setup.sh

If you want to use the CRAB server, set up an LCG User Interface instead. It lets you access WLCG-affiliated resources in a fully transparent way.
    source /grp/cms/tools/glite/setup.sh

Proxy setup

Before you can submit jobs with CRAB, you need a grid certificate. If you don't have one, please refer here to get it. Then create a VOMS proxy and export its location:

    voms-proxy-init -voms cms
    export X509_USER_PROXY=$(voms-proxy-info -path)

CMS software initialization

At Purdue, the CMS software is set up by sourcing:

    source /apps/02/cmssoft/cms/cmsset_default.sh

Prepare user analysis code

Install a CMSSW project in a directory in your user space:

    scramv1 project CMSSW CMSSW_2_1_9
    cd CMSSW_2_1_9/src
    cmsenv

CRAB setup
At Purdue, users may access CRAB at:

    /grp/cms/crab/CRAB

To find the latest release, check the CRAB web page or the relevant HyperNews forum.

Setup on cms.rcac.purdue.edu:

To set up and use CRAB from any directory, source the script crab.(c)sh located in /grp/cms/crab/, which always points to the latest version of CRAB. After sourcing the script, you can run CRAB from any directory (typically your CMSSW working directory). This tutorial uses the newest release of CRAB to benefit from the latest features.

    source /grp/cms/crab/crab.sh

Data selection

To select the data you want to access, use the DBS web page, where the available datasets are listed: DBS Data Discovery or Purdue site data. For this tutorial we'll use:

    /DiPion_E300_Eta5/Summer08_IDEAL_V9_v1/GEN-SIM-RAW

Keyword search for:

    find dataset where dataset like *DiPion_E300*
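As a local illustration of the wildcard in the keyword search above, the following sketch matches dataset paths against a `*` pattern with Python's `fnmatch` module and splits a path into its three components. It does not query DBS. The candidate list only contains the two dataset paths that appear in this tutorial, and the function names are hypothetical.

```python
from fnmatch import fnmatchcase


def split_dataset_path(path):
    """Split '/primary/processed/tier' into its three components."""
    parts = path.strip("/").split("/")
    if len(parts) != 3:
        raise ValueError("expected /primary/processed/tier, got %r" % path)
    return tuple(parts)


def filter_datasets(paths, pattern):
    """Return the paths matching a shell-style wildcard pattern."""
    return [p for p in paths if fnmatchcase(p, pattern)]


# Sample list for illustration only: the two dataset paths used in this tutorial.
CANDIDATES = [
    "/DiPion_E300_Eta5/Summer08_IDEAL_V9_v1/GEN-SIM-RAW",
    "/CJets50_120-step1/CJets50_120-CMSSW_2_0_6/GEN-SIM-RAW",
]

matches = filter_datasets(CANDIDATES, "*DiPion_E300*")
components = split_dataset_path(matches[0])
print(matches)
print(components)
```

The matching here is done by Python on a local list; the actual DBS search syntax and behaviour are defined by the DBS service.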
CRAB configuration

Modify the CRAB configuration file crab.cfg according to your needs: a fully documented template is available at $CRABDIR/python/crab.cfg. For guidance, see the list and description of configuration parameters. For this tutorial, the only relevant sections of the file are [CRAB], [CMSSW], [USER] and [EDG]. The configuration file should be located in the same directory as the CMSSW parameter-set to be used by CRAB. Save the CRAB configuration file crab.cfg with the following content:

    [CRAB]
    jobtype = cmssw
    scheduler = condor_g

    [CMSSW]
    datasetpath = /DiPion_E300_Eta5/Summer08_IDEAL_V9_v1/GEN-SIM-RAW
    pset = read_write_root.py
    total_number_of_events = 100
    number_of_jobs = 1
    output_file = dummy.root

    [USER]
    return_data = 1

    [EDG]
    rb = CERN
    se_white_list = dcache.rcac.purdue.edu
    ce_white_list = osg.rcac.purdue.edu

Download crab.cfg, read_write_root.py or simulation.py here.
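If you prefer to generate the file rather than edit it by hand, the configuration above can be sketched with Python's standard `configparser` module. This is a minimal sketch, assuming CRAB accepts the plain `key = value` layout that `configparser` writes; the helper names `build_crab_config` and `render` are hypothetical and not part of CRAB.

```python
import configparser
import io


def build_crab_config(dataset, pset, events, jobs, output_file):
    """Build the tutorial's crab.cfg sections in memory."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.optionxform = str  # keep option names exactly as written
    cfg["CRAB"] = {"jobtype": "cmssw", "scheduler": "condor_g"}
    cfg["CMSSW"] = {
        "datasetpath": dataset,
        "pset": pset,
        "total_number_of_events": str(events),
        "number_of_jobs": str(jobs),
        "output_file": output_file,
    }
    cfg["USER"] = {"return_data": "1"}
    cfg["EDG"] = {
        "rb": "CERN",
        "se_white_list": "dcache.rcac.purdue.edu",
        "ce_white_list": "osg.rcac.purdue.edu",
    }
    return cfg


def render(cfg):
    """Serialize the configuration to text."""
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


config = build_crab_config(
    dataset="/DiPion_E300_Eta5/Summer08_IDEAL_V9_v1/GEN-SIM-RAW",
    pset="read_write_root.py",
    events=100,
    jobs=1,
    output_file="dummy.root",
)
text = render(config)
print(text)
```

To write the file next to the parameter-set, save `text` as crab.cfg in that directory.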
Run Crab

Once your crab.cfg is ready and the whole underlying environment is set up, you can start running CRAB. CRAB provides command-line help, which is useful the first time you use it:

    crab -h

In particular, it contains a HOW TO RUN CRAB FOR THE IMPATIENT USER section that lists the basic commands.

Job Creation

Job creation checks the availability of the selected dataset and prepares all the jobs for submission according to the job splitting specified in crab.cfg. The creation process creates a CRAB project directory (default: crab_0__) in the current working directory, where the related CRAB configuration file is cached for further use, avoiding interference with other (already created) projects.
CRAB also lets you choose a project name, which can be used later to distinguish multiple CRAB projects in the same directory.

    crab -create

Job Submission

The submission command accepts a combination of jobs and job ranges separated by commas (e.g. 1,2,3-4); the default is all jobs. To submit all jobs of the last created project with the default name, run:

    crab -submit

To submit a specific project:

    crab -submit -c <dir name>
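To make the job-range notation above concrete, here is a minimal sketch of how a string such as 1,2,3-4 expands into individual job numbers. It only illustrates the notation; `expand_job_range` is a hypothetical helper, and CRAB's own parser may behave differently (for example on duplicates or ordering).

```python
def expand_job_range(spec):
    """Expand a comma-separated list of jobs and job ranges into job numbers."""
    jobs = []
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue  # ignore empty items such as a trailing comma
        if "-" in token:
            first, last = (int(x) for x in token.split("-", 1))
            if first > last:
                raise ValueError("descending range: %r" % token)
            jobs.extend(range(first, last + 1))
        else:
            jobs.append(int(token))
    # Assumption of this sketch: duplicates are dropped and jobs are sorted.
    return sorted(set(jobs))


print(expand_job_range("1,2,3-4"))  # [1, 2, 3, 4]
```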
Job Status Check

Check the status of the jobs in the latest CRAB project with the following command:

    crab -status

To check a specific project:

    crab -status -c <dir name>
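If you drive CRAB from a script, the create, submit and status commands above can be assembled as argument lists. The sketch below only builds and prints the command lines; it does not run CRAB. The helper `crab_command` is hypothetical, and it passes `-c` only for the actions where this tutorial shows that option.

```python
import shlex

ACTIONS = ("create", "submit", "status")


def crab_command(action, project_dir=None):
    """Return the argument list for one of the CRAB commands shown above."""
    if action not in ACTIONS:
        raise ValueError("unsupported action: %r" % action)
    if action == "create" and project_dir is not None:
        # The tutorial shows -c only with -submit and -status.
        raise ValueError("this sketch does not pass -c to -create")
    argv = ["crab", "-" + action]
    if project_dir is not None:
        argv += ["-c", project_dir]
    return argv


def as_shell(argv):
    """Quote an argument list so it can be pasted into a shell."""
    return " ".join(shlex.quote(a) for a in argv)


# "my_project" is a placeholder directory name.
for argv in (crab_command("create"),
             crab_command("submit", "my_project"),
             crab_command("status", "my_project")):
    print(as_shell(argv))
```

The returned lists can be handed to `subprocess.run` on a machine where the CRAB environment has been sourced.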
Job Output Retrieval

For jobs in status done, you can retrieve the output back to the UI. The following command retrieves the output of all jobs with status done in the last created CRAB project:

    crab -getoutput all

To get the output of a specific project:

    crab -getoutput all -c <dir name>

The command can be repeated as long as there are jobs in status done.

Job Aborted Retrieval
For jobs in status aborted, the output cannot be retrieved back to the UI. The following command retrieves the error information of all jobs:

    crab -postMortem all -c <dir name>

Final plot

Each job produces a histogram output file. When a project runs several jobs, the files can be combined using ROOT in the res directory:

    hadd dummy.root dummy_*.root

CRAB with writing out ROOT files

Prepare dCache area at Purdue for storage element interaction

CRAB writes into your dCache user directory:

    /store/user/<username>

This directory is writable only by its owner and readable by all users. For CRAB to be able to write there, make sure you have the proper cms role:

    /cms/Role=cmsuser or /cms/us/Role=cmsususer

Our system admin can then create a <username> directory at /store/user for you. You can test the <username> directory by copying a small file to it with srmcp:

    srmcp -2 file:////tmp/test.txt srm://dcache.rcac.purdue.edu:8443/srm/managerv2?SFN=/store/user/<username>/test.txt

You can use the srmmkdir command to create a subdirectory <userdir> under /store/user/<username>. Then specify this destination directory in crab.cfg:

    storage_path = /srm/managerv2?SFN=/store/
    lfn = /user/<username>/<userdir>

replacing <username> with your username.

Prepare new crab.cfg

Now the CMSSW parameter-set produces an output file (output.root), which you can list in the output_file parameter of the new crab.cfg and ask CRAB to copy to the Purdue Storage Element (dCache). Modify crab.cfg as in the following example:

    [CRAB]
    jobtype = cmssw
    scheduler = condor_g

    [CMSSW]
    datasetpath = /DiPion_E300_Eta5/Summer08_IDEAL_V9_v1/GEN-SIM-RAW
    pset = read_write_root.py
    total_number_of_events = 100
    number_of_jobs = 10
    output_file = dummy.root

    [USER]
    return_data = 0
    copy_data = 1
    storage_element = dcache.rcac.purdue.edu
    storage_path = /srm/managerv2?SFN=/store/
    lfn = /user/<username>/<userdir>

    [EDG]
    se_white_list = dcache.rcac.purdue.edu
    ce_white_list = osg.rcac.purdue.edu
    rb = CERN
Replace <username> with your username. Download the above crab_write_dcache.cfg here.

Prepare new crab.cfg to access local DBS data

The CMSSW parameter-set can also read data registered in a different DBS instance. For local DBS data at Purdue, please refer here. Modify crab.cfg as in the following example:

    [CRAB]
    jobtype = cmssw
    scheduler = condor_g

    [CMSSW]
    datasetpath = /CJets50_120-step1/CJets50_120-CMSSW_2_0_6/GEN-SIM-RAW
    dbs_url = http://cmsdbs.rcac.purdue.edu:8090/DBS/servlet/DBSServlet
    pset = read_write_root.py
    total_number_of_events = 100
    number_of_jobs = 10
    output_file = dummy.root

    [USER]
    return_data = 0
    copy_data = 1
    storage_element = dcache.rcac.purdue.edu
    storage_path = /srm/managerv2?SFN=/store/
    lfn = /user/<username>/<userdir>

    [EDG]
    se_white_list = dcache.rcac.purdue.edu
    ce_white_list = osg.rcac.purdue.edu
    rb = CERN
Replace <username> with your username. Download the above crab_local_dbs.cfg here.

Using the CRAB server

Before using the CRAB server, we need to set up a "glite" environment. In addition to setting up the CMSSW environment as usual, we need:

    source /grp/cms/tools/glite/setup.sh
Then we can set up the VOMS proxy and the CRAB environment. To use CRAB server mode, add the following to the [CRAB] section of crab.cfg:

    [CRAB]
    scheduler = glite
    server_name = purdue
We also want an email when our jobs are done so we don't have to keep checking the status. Put these two lines in the [USER] section:

    [USER]
    thresholdLevel = 100
    eMail = <your email address>
You can replace 100 with a lower "percent done" value to get the email earlier. Then repeat the creation, submission, status check and getoutput steps described above.
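The server-mode changes described above can also be applied to an existing configuration programmatically. The following is a minimal sketch using Python's `configparser`; `enable_server_mode` is a hypothetical helper, the `BASE` text is a shortened stand-in for your crab.cfg, and `user@example.org` is a placeholder address.

```python
import configparser
import io

# Shortened stand-in for an existing crab.cfg (illustration only).
BASE = """[CRAB]
jobtype = cmssw
scheduler = condor_g

[USER]
return_data = 1
"""


def enable_server_mode(cfg_text, server_name, email, threshold=100):
    """Return cfg_text with the CRAB server and email settings applied."""
    if not 0 <= threshold <= 100:
        raise ValueError("threshold is a percentage between 0 and 100")
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.optionxform = str  # preserve mixed-case names such as thresholdLevel
    cfg.read_string(cfg_text)
    for section in ("CRAB", "USER"):
        if not cfg.has_section(section):
            cfg.add_section(section)
    cfg["CRAB"]["scheduler"] = "glite"
    cfg["CRAB"]["server_name"] = server_name
    cfg["USER"]["thresholdLevel"] = str(threshold)
    cfg["USER"]["eMail"] = email
    out = io.StringIO()
    cfg.write(out)
    return out.getvalue()


server_cfg = enable_server_mode(BASE, "purdue", "user@example.org", threshold=80)
print(server_cfg)
```

Setting `optionxform = str` matters here because `configparser` lowercases option names by default, which would turn `thresholdLevel` and `eMail` into different keys.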
Download crab_server.cfg here.

Srmcp output back or look at files by xrootd command
To get data from dCache, copy it back to your local machine with srmcp:

    srmcp -2 srm://dcache.rcac.purdue.edu:8443/srm/managerv2?SFN=/store/user/<username>/<userdir>/test.txt file:////tmp/test.txt

Local users can inspect a ROOT file over xrootd from a ROOT session by running a macro:

    .x roottest.C

roottest.C:

    {
      // Make the CMSSW headers visible to the interpreter.
      gInterpreter->AddIncludePath("/apps/02/cmssoft/cms/slc4_ia32_gcc345/cms/cmssw/CMSSW_2_1_9/src/");
      // Load FWLite and enable automatic loading of the CMSSW dictionaries.
      gSystem->Load("libFWCoreFWLite");
      AutoLibraryLoader::enable();
      // Open the file on dCache through xrootd.
      TFile *f = new TXNetFile("root://dcache-00.rcac.purdue.edu/pnfs/rcac.purdue.edu/data/store/mc/Summer08/DiPion_E300_Eta5/GEN-SIM-RAW/IDEAL_V9_v1/0027/BCF0E000-B87F-DD11-8E24-001EC9AAA021.root", "READ");
      // Print the number of entries in the Events tree.
      TTree *tree = (TTree*)f->Get("Events");
      cout << " Events:" << tree->GetEntries() << endl;
      f->Close();
    }
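The srmcp source URL used above follows a fixed pattern: host, port, SRM endpoint, then the file path after `SFN=`. The sketch below assembles that URL and the matching srmcp argument list from their parts. It does not contact the storage element; `srm_url` and `srmcp_command` are hypothetical helpers, and the default port and endpoint are simply the values that appear in this tutorial.

```python
def srm_url(host, sfn_path, port=8443, endpoint="/srm/managerv2"):
    """Build an SRM URL of the form used in this tutorial."""
    if not sfn_path.startswith("/"):
        raise ValueError("SFN path must be absolute: %r" % sfn_path)
    return "srm://%s:%d%s?SFN=%s" % (host, port, endpoint, sfn_path)


def srmcp_command(source, destination):
    """Return the srmcp argument list with the -2 option used above."""
    return ["srmcp", "-2", source, destination]


# <username> and <userdir> are left as placeholders, as in the text above.
remote = srm_url("dcache.rcac.purdue.edu",
                 "/store/user/<username>/<userdir>/test.txt")
command = srmcp_command(remote, "file:////tmp/test.txt")
print(" ".join(command))
```

Swapping the two arguments of `srmcp_command` gives the upload direction used earlier when testing the user directory.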