Commands to Operate on HDFS Data
The Hadoop Distributed File System (HDFS) can be browsed with ordinary Unix filesystem commands (cd, ls) on 'gluon.rcac.purdue.edu' and 'hep.rcac.purdue.edu'. HDFS is mounted under /mnt/hadoop.
cd /mnt/hadoop/store
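Any ordinary read-only shell command works on the mounted path; for example, to list the contents of a user area (the username below is a placeholder):
ls -lh /mnt/hadoop/store/user/<username>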
Copying data from HDFS
Datasets in HDFS can be copied to personal disk areas using the cp command on 'gluon.rcac.purdue.edu'. Wildcards are accepted for copying multiple files.
cp /mnt/hadoop/store/path/to/*.root <destination folder>
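To copy an entire directory tree rather than individual files, cp -r works the same way; a sketch with placeholder paths:
cp -r /mnt/hadoop/store/path/to/dataset <destination folder>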
Accessing Data via xrootd
Data in HDFS can be read from 'xrootd.rcac.purdue.edu' with a valid grid certificate.
xrdcp root://xrootd.rcac.purdue.edu//store/path/to/file.root /path/to/my.root
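If you do not yet have a grid proxy in your session, one can usually be created first; for CMS users this is typically:
voms-proxy-init -voms cms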
Accessing Data inside ROOT
Use the following methods to load files in ROOT.
Open a single file in ROOT:
root [0] TFile* f = TFile::Open("root://xrootd.rcac.purdue.edu//store/group/ewk/DY/May10ReReco/ntuple_skim_9_6_ZgF.root");
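Once the file is open, its contents can be read as usual; a minimal sketch, assuming the file contains a tree (the tree name "ntuple" here is a placeholder; use f->ls() to see the actual contents):
root [1] f->ls();
root [2] TTree* t = (TTree*)f->Get("ntuple");
root [3] t->Print();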
Open multiple files in ROOT: Create a text file named "xrootdfiles.txt" in the directory where you will start ROOT, and populate it with the files you would like to open.
root://xrootd.rcac.purdue.edu//store/path/to/file1.root
root://xrootd.rcac.purdue.edu//store/path/to/file2.root
...
root://xrootd.rcac.purdue.edu//store/path/to/fileX.root
Add the files to a TFileCollection object in ROOT:
root [0] TFileCollection* c1 = new TFileCollection("data","data");
root [1] c1->AddFromFile("xrootdfiles.txt");
root [2] c1->Print("L");
TFileCollection data - data contains: 3 files with a size of 0 bytes, 0.0 % staged - default tree name: '(null)'
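The collection can then be attached to a TChain for analysis; a minimal sketch, assuming the files share a tree named "ntuple" (a placeholder):
root [3] TChain* chain = new TChain("ntuple");
root [4] chain->AddFileInfoList(c1->GetList());
root [5] chain->GetEntries();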
The following command is an easy way to fill the text file with a full directory of ROOT files from gluon; just replace the directory name with the one you want to load. The sed step strips the leading /mnt/hadoop prefix (11 characters) so that awk can prepend the xrootd URL.
find /mnt/hadoop/store/path/to/directory -name "*.root" | \
sed 's/^.\{11\}//g' | \
awk '{OFS=""} {print "root://xrootd.rcac.purdue.edu/", $1}' > \
~/xrootdfiles.txt
Accessing Data via SRM
Data at Purdue can be accessed from 'srm-dcache.rcac.purdue.edu' with a valid grid certificate.
Create a directory in HDFS:
srmmkdir -2 'srm://srm-dcache.rcac.purdue.edu:8443/srm/v2/server?SFN=/mnt/hadoop/store/user/<username>/<directoryname>'
Delete an empty directory in HDFS:
srmrmdir -2 'srm://srm-dcache.rcac.purdue.edu:8443/srm/v2/server?SFN=/mnt/hadoop/store/user/<username>/<directoryname>'
Copy data from local disk to HDFS:
srmcp -2 file:////tmp/test.root 'srm://srm-dcache.rcac.purdue.edu:8443/srm/v2/server?SFN=/mnt/hadoop/store/user/<username>/test.root'
Delete data from HDFS (only applies to user data):
srmrm -2 'srm://srm-dcache.rcac.purdue.edu:8443/srm/v2/server?SFN=/mnt/hadoop/store/user/<username>/test.root'
List an HDFS data directory:
srmls -2 'srm://srm-dcache.rcac.purdue.edu:8443/srm/v2/server?SFN=/mnt/hadoop/store/user/<username>'
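Since the endpoint prefix is identical in all of these commands, defining it once in a shell variable keeps them readable; a sketch using the same endpoint as above:
SRM='srm://srm-dcache.rcac.purdue.edu:8443/srm/v2/server?SFN=/mnt/hadoop'
srmls -2 "$SRM/store/user/<username>"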
Copy data from CMS SRM at CERN to HDFS:
srmcp -srm_protocol_version=2 -debug=true \
  'srm://srm-cms.cern.ch:8443/srm/managerv2?SFN=/castor/cern.ch/user/h/hdyoo/DYstudy/redigi_22x/DYM6to40_GEN_SIM_RECO/PYTHIA6_DYmumu_M6_40_filter_10TeV_cff_redigi_RECO_1.root' \
  'srm://srm-dcache.rcac.purdue.edu:8443/srm/v2/server?SFN=/mnt/hadoop/store/user/hdyoo/DYmumuM6to40_22x_redigi/PYTHIA6_DYmumu_M6_40_filter_10TeV_cff_redigi_RECO_1.root'