Overview
The CMS Tier-2 center at Purdue consists of dedicated and opportunistic resources on six computing clusters: Hammer, Bell, Gilbreth, Geddes, Negishi, and Gautschi.
Hammer, Bell, Negishi, and Gautschi are large community clusters maintained by ITaP, on which CMS has dedicated and opportunistic access to job slots. The Gilbreth cluster is exclusively dedicated to GPU-enabled workflows. The Geddes cluster is currently used for the development of an Analysis Facility at the Purdue CMS Tier-2.
Instructions for accessing the Tier-2 clusters and recommendations about their usage are given here.
The technical details of the hardware installed at the Tier-2 clusters are listed below.
Resources
Network connections
- Wide area networking
- WAN link to Indiana GigaPOP: 400Gb/s dedicated with 100Gb/s backup path
- Data Center Core and LAN
- Storage cluster ('2550') to Data Center Core ('MATH'): 400 Gb/s
- Community clusters typically have 100 Gb/s InfiniBand networking.
Computing/Storage nodes
| Name | # nodes | Job slots | Model | Compute | GPU | Storage |
|------|---------|-----------|-------|---------|-----|---------|
| cms-j (retiring in 2024) | 4 (*) | - | Advanced HPC | - | - | 216TB (*) |
| cms-k | 10 (*) | - | KingStar | - | - | 216TB |
| cms-l (retiring in 2025) | 3 (*) | - | Aspen | - | - | 288TB |
| cms-m (retiring in 2025) | 2 (*) | - | Aspen | - | - | 360TB |
| cms-n | 1 | - | KingStar | - | - | 192TB |
| cms-jb00 | 1 (*) | - | SuperMicro JBOD | - | - | 840TB |
| cms-jb01 | 1 | - | WD JBOD | - | - | 1428TB |
| cms-jb02 | 1 | - | WD JBOD | - | - | 1632TB (102x16TB) |
| cms-jb03 | 1 | - | WD JBOD + Dell R6515 | - | - | 1632TB (102x16TB) |
| cms-jb04 | 1 | - | WD JBOD + Dell R640 | - | - | 1428TB (102x14TB) |
| cms-jb05 | 1 | - | WD JBOD + Dell R640 | - | - | 1428TB (102x14TB) |
| cms-jb06 | 1 | - | WD JBOD + SM H12SSW-NT | - | - | 1428TB (102x14TB) |
| cms-jb07 | 1 | - | WD JBOD + ASUS RS500A-E11 | - | - | 2040TB (102x20TB) |
| cms-jb08 | 1 | - | WD JBOD + ASUS RS500A-E11 | - | - | 2040TB (102x20TB) |
| cms-jb09 | 1 | - | WD JBOD + ASUS RS500A-E11 | - | - | 2040TB (102x20TB) |
| dtn00,01 | 2 | - | ASUS | - | - | - |
| dtn02-07 | 6 | - | SM H12SSW-NT | Data Transfer Nodes (dtn06 is currently used as XCache server) | - | - |
| ps-eos | 1 | - | ASUS K14PA-U24 | PerfSONAR server (close to storage) | - | - |
| hammer-d | 18 | 864 | DELL | 48-core (HT-on) Xeon Gold 6126 @ 2.60GHz, 192GB RAM | - | - |
| hammer-e | 15 | 720 | Supermicro | 48-core (HT-on) Xeon Gold 6126 @ 2.60GHz, 96GB RAM | - | - |
| hammer-f,g | 22 | 5632 | DELL | 256-core (HT-on) AMD EPYC 7702 @ 2GHz, 512GB RAM | 1x nVidia T4 | - |
| bell | 9.5 (**)(#) | 1216 | DELL | 128-core (HT-off) AMD EPYC 7662 @ 2GHz, 256GB RAM | - | - |
| gilbreth | 2 (**) | - | DELL | 40-core Xeon Gold 5218R @ 2.10GHz, 192GB RAM | 2x nVidia V100 | - |
| geddes CPU | 3 (**) | - | DELL | 128-core (HT-off) AMD EPYC 7662 @ 2.0GHz, 512GB RAM | - | 8TB SSD |
| geddes GPU | 3 (**) | - | DELL PowerEdge R7525 | 128-core (HT-off) AMD EPYC 7662 @ 2.0GHz, 512GB RAM | 2x nVidia A100 | 8TB SSD |
| Additional Geddes Storage | - | - | - | - | - | 4TB SSD |
| negishi-c | 16 (**) | 4096 | DELL | 256-core (HT-on) AMD EPYC 7763 @ 2.2GHz, 512GB RAM | - | - |
| negishi-a | 8.5 (**) | 1088 (8.5x128) | DELL | 128-core (HT-on) AMD EPYC 7763 @ 2.2GHz, 512GB RAM | - | - |
| gautschi (non-USCMS) | 1 (**) | 192 (1x192) | DELL | 192-core (HT-off) AMD EPYC 9654 @ 2.4GHz, 512GB RAM | - | - |
| gautschi | 2 (**) | 192 (1x192) | DELL | 192-core (HT-off) AMD EPYC 9654 @ 2.4GHz, 512GB RAM | - | - |
(**) Nodes purchased through the Community Clusters Program. Hardware not owned by CMS.
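As a quick cross-check of the arithmetic in the Storage column, each JBOD's raw capacity follows from its disk count and disk size, and the EOS usable figure quoted later on this page is the raw capacity divided by the replication factor of 2. A small sketch, with all figures copied from this page:

```python
# Cross-check of storage figures quoted on this page.
# Disk counts and sizes come from the table above; the 2x replication
# factor comes from the EOS storage description on this page.

jbods = {  # name: (number of disks, disk size in TB)
    "cms-jb02": (102, 16),
    "cms-jb03": (102, 16),
    "cms-jb04": (102, 14),
    "cms-jb05": (102, 14),
    "cms-jb06": (102, 14),
    "cms-jb07": (102, 20),
    "cms-jb08": (102, 20),
    "cms-jb09": (102, 20),
}

for name, (ndisks, size_tb) in jbods.items():
    print(f"{name}: {ndisks * size_tb} TB raw")  # e.g. cms-jb02: 1632 TB raw

# EOS: 14.6 PiB raw with 2x replication -> 7.3 PiB usable
print(f"EOS usable: {14.6 / 2} PiB")
```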
- User interface
  - cms-fe00/01 (cms.rcac.purdue.edu): 2x DELL R420, 32-core, 96GB RAM
    - Services: login, grid UI, Condor + SLURM batch, AFS client
    - Serve both the CMS and Hammer clusters.
- Gatekeepers/Schedulers
- Purdue-Hammer: (hammer-osg.rcac.purdue.edu) 4-cores, 8 GB RAM VM, services: HTCondor-CE
- Purdue-Bell: (bell-osg.rcac.purdue.edu) 4-cores, 8 GB RAM VM, services: HTCondor-CE
- Purdue-Negishi (osg.negishi.rcac.purdue.edu) 4-cores, 8 GB RAM VM, services: HTCondor-CE
- Analysis Facility
- 3 nodes (paf-a0[0-2].cms.rcac.purdue.edu): 2x AMD EPYC 7662 64-core CPUs, 512GB RAM, 2x T4 GPUs
- 2 nodes (paf-b0[0-1].cms.rcac.purdue.edu): 2x AMD EPYC 7702 64-core CPUs, 512GB RAM, 1x T4 GPU
- ProxMox Virtual Environment Cluster
- 9 SuperMicro 1U nodes. AMD EPYC 7543P 32-Core Processor, 64GB RAM, 2x25Gb NICs
- 3 nodes serve as EOS Management cluster:
- each node runs one QDB-node VM;
- one Highly-available VM, serving the MGM functions, can run on any of the 3 nodes, as availability permits.
- 6 nodes serve as general Data-Transfer nodes, and also implement XCache functions:
- each node runs one XCache disk-server VM, with 2 large NVMe disks attached to it;
- one Highly-available VM, serving the XCache Redirector functions, can run on any of the 6 nodes, as availability permits.
- Miscellaneous
- XCache: Implemented in virtual machines running in the ProxMox deployment:
- 6x disk-server VMs, each with 2 large NVMe disks, 16 cores, 16GB of RAM, 2x25Gb NICs
- 1x redirector VM: 8 cores, 16GB of RAM
- Shoveler (reports xrootd stats to the central CMS service): VM with 1 core, 2GB of RAM
- CVMFS local repository:
- Stratum-0: VM with 2 cores, 4GB of RAM;
- Stratum-1: VM with 2 cores, 4GB of RAM;
- Squid cache: VM with 2 cores, 4GB of RAM;
- Client/Publisher: VM with 2 cores, 4GB of RAM;
- Squid: six squid instances in total; each of the two squid servers runs three instances.
- squid1.rcac.purdue.edu: 8-cores, Intel(R) Xeon(R) CPU E31240 @ 2.1 GHz, 8 GB RAM, service: squid
- squid2.rcac.purdue.edu: 8-cores, Intel(R) Xeon(R) CPU E31240 @ 2.1 GHz, 8 GB RAM, service: squid
- Perfsonar:
- Perfsonar latency node: (perfsonar-cms1.itns.purdue.edu) 8-cores, AMD Opteron(tm) Processor 2380 @ 2.4 GHz, 16 GB RAM, service: latency
- Perfsonar bandwidth node: (perfsonar-cms2.itns.purdue.edu) 8-cores, AMD Opteron(tm) Processor 2380 @ 2.4GHz, 16 GB RAM, service: bandwidth
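The front-end machines (cms-fe00/01) provide both HTCondor and SLURM batch access, and grid jobs enter through the HTCondor-CE gatekeepers listed above. As an illustration, a minimal HTCondor submit description might look like the following; all file names here are hypothetical, not site-specific:

```
# job.sub -- hypothetical HTCondor submit description; file names are illustrative
universe       = vanilla
executable     = run_analysis.sh
arguments      = input.root
output         = job.out
error          = job.err
log            = job.log
request_cpus   = 1
request_memory = 2GB
queue
```

Such a file would be submitted from a front-end with `condor_submit job.sub`; SLURM users would write an equivalent `sbatch` script instead.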
Storage:
- EOS storage: 7.3 PiB usable (14.6 PiB with replication)
- Shared between central CMS data-management (6.3 PiB) and local users and groups (1 PiB)
- Mounted read-only on the Front-End machines and in the Analysis Facility.
- Accessible from everywhere via XRootD
- j,k,l,m,n nodes from the table above, plus all JBODs (j,k-nodes to be retired at the end of 2024)
- Depot: 100 TB
- Shared between local CMS users and groups, and AF users.
- Mounted (read-write) on all nodes (AF and Tier-2 cluster).
- Work-space in Analysis Facility (Geddes): 52 TB of fast (SSD) local storage.
- Shared between Analysis Facility users
- Mounted only in the AF (not accessible in the Tier-2 cluster)
- CVMFS
- Read-only storage intended for global distribution of software packages (Conda, Pixi, CMSSW)
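Since EOS is accessible from everywhere via XRootD, remote reads use URLs of the form `root://<redirector>//store/<path>`. A hedged sketch of a copy command; the hostname and file path below are assumptions for illustration, not confirmed site values:

```shell
# Illustrative only: the redirector hostname and path are assumed, not confirmed.
REDIRECTOR="root://xrootd.rcac.purdue.edu"
SAMPLE="/store/user/example/file.root"

# Print the copy command instead of running it (a real transfer needs a grid proxy).
echo "xrdcp ${REDIRECTOR}/${SAMPLE} /tmp/file.root"
```

Note the double slash between hostname and path, which XRootD requires for absolute paths.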