The AI.Panther HPC Cluster at Florida Tech is an Aspen Systems cluster comprising 16 compute nodes (a total of 768 processor cores and 6,144 GB RAM), 12 GPU nodes, 3 storage nodes, 1 login node, 1 head node, and a DDN EXAScaler storage system. Eight of the GPU nodes contain 4 NVIDIA A100 GPUs each, and four of the GPU nodes contain 8 NVIDIA H200 GPUs each. The AI.Panther cluster was initially funded by a National Science Foundation Major Research Instrumentation (MRI) grant and expanded through a NIST ASCEND grant.
An HPC cluster is a set of computers working in parallel to deliver performance comparable to a supercomputer at a fraction of the price. It consists of nodes connected by a high-speed network that together carry out computationally intensive tasks.
To learn about High Performance Computing or Supercomputers, see Wikipedia.
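For example, a typical cluster job runs many cooperating processes spread across several nodes, exchanging data over the high-speed interconnect. The sketch below illustrates the idea with mpi4py; it assumes an MPI stack and the mpi4py package are available on AI.Panther, which is common on HPC systems but not confirmed here.

```python
# Minimal sketch of parallel work across cluster nodes, assuming an MPI
# installation and the mpi4py package are available (not confirmed here).
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()            # this process's index within the job
size = comm.Get_size()            # total number of processes in the job
host = MPI.Get_processor_name()   # node this process is running on

# Each process handles its own slice of a larger problem ...
chunk = sum(i * i for i in range(rank * 1_000_000, (rank + 1) * 1_000_000))
# ... and the partial results are combined over the network.
total = comm.reduce(chunk, op=MPI.SUM, root=0)

print(f"rank {rank} of {size} ran on {host}")
if rank == 0:
    print(f"combined result: {total}")
```

Launched with, for example, `mpirun -n 48 python script.py`, the same script runs unchanged on one node or many.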
Hardware
Head / Login Nodes
A100 PCIe GPU Nodes
- 4 x AMD EPYC 7402P Rome, 2.8 GHz 24-Core CPUs
- 32 x 32GB = 1024GB RAM
- 4 x 960GB Enterprise SSD
- 16 x NVIDIA A100 (Ampere), 40GB memory, PCIe
- 24 x A100 NVLink Bridge
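On any of the GPU nodes, the installed GPUs and their memory can be spot-checked with nvidia-smi, which ships with the NVIDIA driver. This sketch simply wraps that command in Python; the exact product string in the output will vary by node type.

```python
# Sketch: list the GPUs visible on the current node via nvidia-smi
# (assumes nvidia-smi is on the PATH, as it is wherever the driver is installed).
import subprocess

result = subprocess.run(
    ["nvidia-smi", "--query-gpu=index,name,memory.total", "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
)
for line in result.stdout.strip().splitlines():
    print(line)  # e.g. "0, NVIDIA A100-PCIE-40GB, 40960 MiB"
```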
A100 SXM4 GPU Nodes
- 8 x AMD EPYC 7402 Rome, 2.8 GHz 24-Core CPUs
- 64 x 32GB = 2048GB RAM
- 4 x 960GB Enterprise SSD
- 16 x NVIDIA A100 (Ampere), 40GB memory; 4 baseboards with 4 A100s each, connected with NVLink
H200 SXM GPU Nodes
- 4 x Dell PowerEdge XE9680 GPU compute nodes
- 2 x Intel Xeon Platinum 8562Y+, 32-Core CPUs
- 32 x 96GB = 3072GB RAM
- 1 x Dell BOSS-N1 NVMe (960GB usable, OS device)
- 4 x Dell NVMe ISE PS1010 RI U.2 3.84TB (node-local NVMe)
- 32 x NVIDIA H200, 141GB memory; 4 nodes with 8 H200s each, connected with NVLink
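On the SXM-style A100 and H200 nodes, the GPU-to-GPU interconnect described above (NVLink baseboards and NVSwitch) can be inspected with nvidia-smi's topology view. A minimal sketch, assuming nvidia-smi is available on the node you are logged into; the exact matrix depends on the node's hardware.

```python
# Sketch: show the GPU-to-GPU interconnect matrix reported by the NVIDIA driver.
import subprocess

topo = subprocess.run(["nvidia-smi", "topo", "-m"],
                      capture_output=True, text=True, check=True)
print(topo.stdout)  # NV# entries indicate NVLink paths; PHB/SYS indicate PCIe/system paths
```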
High Memory Compute Nodes
- 32 x Intel Xeon Cascade Lake Gold 6240R, 2.4GHz, 24-Core CPUs
- 192 x 32GB = 6144GB RAM
- 16 x 960GB Enterprise SSD
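The memory actually visible to a job on a given node can be confirmed from standard Linux interfaces, independent of the aggregate figures listed above. A small sketch:

```python
# Sketch: report the total physical memory on the current node
# (standard Linux sysconf values; the result differs per node type).
import os

page_size = os.sysconf("SC_PAGE_SIZE")    # bytes per memory page
num_pages = os.sysconf("SC_PHYS_PAGES")   # physical pages installed
total_gib = page_size * num_pages / 2**30
print(f"Total RAM on this node: {total_gib:.0f} GiB")
```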
ZFS User/Home Fileserver (78TB after overhead) - /home1
- 2 x Intel Xeon Cascade Lake Silver 4215R, 3.20GHz, 8-Core CPUs
- 12 x 16GB = 192GB RAM
- 2 x 240GB Enterprise SSD
- 8 x 14TB SAS HDD configured as RAIDZ2
- 2 x P4800X 375GB Optane SSDs
ZFS Archive Fileserver (420TB after overhead) - /archive
- 2 x Intel Xeon Cascade Lake Silver 4215R, 3.20GHz, 8-Core CPUs
- 12 x 32GB = 384GB RAM
- 2 x 240GB Enterprise SSD
- 36 x 16TB SATA HDD configured as four RAIDZ2 arrays
- 2 x P4800X 375GB Optane SSDs
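Capacity and free space on the /home1 and /archive filesystems can be checked from any node where they are mounted, using only the Python standard library:

```python
# Sketch: report used/total/free space on the ZFS filesystems,
# assuming /home1 and /archive are mounted on the current node.
import shutil

for mount in ("/home1", "/archive"):
    usage = shutil.disk_usage(mount)
    print(f"{mount}: {usage.used / 1e12:.1f} TB used of {usage.total / 1e12:.1f} TB "
          f"({usage.free / 1e12:.1f} TB free)")
```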
DDN AI /HPC Storage System (Project/Shared Scratch) - /shared
- DDN AI400X2 All-Flash Appliance Bundle
- 24 x NVMe slots, with 2 x SE2420 enclosures (48 x NVMe slots)
- 8 x NDR200 / 200GbE QSFP112 ports
- ~1 PiB usable capacity
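Because the DDN EXAScaler appliance presents /shared as a Lustre filesystem, the Lustre client tools can report per-target usage and file striping. This is a sketch assuming the `lfs` utility is installed on the node and that /shared is the Lustre mount point; the file path queried is purely hypothetical.

```python
# Sketch: query the Lustre-backed /shared filesystem with the client tools
# (assumes the "lfs" utility is present, as on nodes that mount Lustre).
import subprocess

# Per-target capacity and usage for the /shared filesystem.
print(subprocess.run(["lfs", "df", "-h", "/shared"],
                     capture_output=True, text=True, check=True).stdout)

# Striping layout of a file (hypothetical path, for illustration only).
print(subprocess.run(["lfs", "getstripe", "/shared/example_dataset.bin"],
                     capture_output=True, text=True).stdout)
```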
Network Switches
- One NVIDIA NDR InfiniBand switch (MQM9700-NS2R) for the H200 GPU nodes and DDN storage
- One Mellanox/NVIDIA HDR InfiniBand switch for the CPU and A100 GPU nodes
- One managed 1GbE switch
- One 25GbE (18-port) / 100GbE (4-port) Ethernet switch
- Each node has a 25Gb Ethernet connection to the 25GbE switch, along with dual-port 1GbE with IPMI and an HDR100 InfiniBand connection; 100GbE links from the 25GbE switch are available for future expansion
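Which InfiniBand fabric a given node sits on (HDR, HDR100, or NDR) can be read from the standard Linux sysfs tree. A minimal sketch; device names such as mlx5_0 are examples and will differ per node.

```python
# Sketch: report InfiniBand link state and rate on the current node via sysfs
# (standard Linux interface exposed by the InfiniBand drivers).
from pathlib import Path

for port in Path("/sys/class/infiniband").glob("*/ports/*"):
    device = port.parts[-3]                        # e.g. "mlx5_0"
    state = (port / "state").read_text().strip()   # e.g. "4: ACTIVE"
    rate = (port / "rate").read_text().strip()     # e.g. "200 Gb/sec (4X HDR)"
    print(f"{device} port {port.name}: {state}, {rate}")
```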