Purdue CMS Tier-2 Center

Commands to Operate HDFS Data

Introduction

The Hadoop Distributed Filesystem (HDFS) can be browsed using standard Unix filesystem commands (cd, ls) from the User Interface (UI). HDFS is mounted at /mnt/hadoop on the local filesystem.
cd /mnt/hadoop/store
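For example, a dataset directory can be listed with ls (the path below is only illustrative):
ls /mnt/hadoop/store/user/<username>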

Copying data from HDFS

Datasets from HDFS can be copied to personal disk areas using the cp command from the UI. Wildcards are valid for copying multiple files.
cp /mnt/hadoop/store/path/to/*.root <destination folder>
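For example, to copy all ROOT files from a (hypothetical) dataset directory into your home area:
cp /mnt/hadoop/store/user/<username>/myDataset/*.root ~/myDataset/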

Accessing data via xrootd

Data in HDFS can be read from 'xrootd.rcac.purdue.edu' with a valid grid certificate.
xrdcp root://xrootd.rcac.purdue.edu//store/path/to/file.root /path/to/my.root
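If you do not already have a valid proxy, the standard CMS VOMS command creates one (adjust the VO name if your setup differs):
voms-proxy-init -voms cms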

Accessing data inside ROOT

Use the following methods to load files in ROOT.

Open a single file in ROOT

root [0] TFile* f = TFile::Open("root://xrootd.rcac.purdue.edu//store/group/ewk/DY/May10ReReco/ntuple_skim_9_6_ZgF.root");
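Once the file is open, its contents can be inspected in the usual way. The tree name "Events" below is only an illustration; use whatever tree your files actually contain.
root [1] f->ls();
root [2] TTree* t = (TTree*)f->Get("Events");
root [3] t->GetEntries();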

Open multiple files in ROOT

Create a text file named "xrootdfiles.txt" in the directory where you will start ROOT, and populate it with the files you would like to open:
root://xrootd.rcac.purdue.edu//store/path/to/file1.root
root://xrootd.rcac.purdue.edu//store/path/to/file2.root
.
.
.
root://xrootd.rcac.purdue.edu//store/path/to/fileX.root
Add the files to a TFileCollection object in ROOT:
root [0] TFileCollection* c1 = new TFileCollection("data","data");
root [1] c1->AddFromFile("xrootdfiles.txt");
root [2] c1->Print("L");
TFileCollection data - data contains: 3 files with a size of 0 bytes, 
0.0 % staged - default tree name: '(null)'
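The collection can then be attached to a TChain for analysis. The tree name "Events" below is only an example and must match the tree stored in your files:
root [3] TChain* chain = new TChain("Events");
root [4] chain->AddFileInfoList(c1->GetList());
root [5] chain->GetEntries();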
The following command is an easy way to fill the text file with every ROOT file in a CMS storage directory; just replace the directory name with the one you want to load.
find /mnt/hadoop/store/path/to/directory -name "*.root" | \
sed 's|^/mnt/hadoop||' | \
awk '{print "root://xrootd.rcac.purdue.edu/" $1}' > ~/xrootdfiles.txt

Accessing data using gfal2 tools

Data at the Purdue Tier-2 can be accessed from 'cms-gridftp.rcac.purdue.edu' with a valid grid certificate.
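The gfal2 commands below likewise assume a valid proxy; you can check the lifetime of your current proxy with:
voms-proxy-info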

Creating a directory in HDFS

In the following commands, <username> stands for your CERN lxplus ID.

gfal-mkdir gsiftp://cms-gridftp.rcac.purdue.edu/store/user/<username>/<directoryname>

Delete a directory and all its contents in HDFS (be careful!)

gfal-rm -r gsiftp://cms-gridftp.rcac.purdue.edu/store/user/<username>/<directoryname>

Copy data from local to HDFS

gfal-copy file:////tmp/test.root gsiftp://cms-gridftp.rcac.purdue.edu/store/user/<username>/test.root
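gfal-copy also works in the opposite direction, for example to fetch a file from HDFS back to local disk (the file names below are illustrative):
gfal-copy gsiftp://cms-gridftp.rcac.purdue.edu/store/user/<username>/test.root file:////tmp/test_copy.root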

Delete data from HDFS (only applies to user data)

gfal-rm gsiftp://cms-gridftp.rcac.purdue.edu/store/user/<username>/test.root

List HDFS data directory

gfal-ls gsiftp://cms-gridftp.rcac.purdue.edu/store/user/<username>

Renaming a folder in HDFS

gfal-rename gsiftp://cms-gridftp.rcac.purdue.edu/store/user/<username>/folder1 gsiftp://cms-gridftp.rcac.purdue.edu/store/user/<username>/folder2
