What is the AI.Panther cluster?


The AI.Panther HPC Cluster at Florida Tech is an Aspen Systems cluster composed of 16 compute nodes (768 processor cores and 6,144 GB of RAM in total), 12 GPU nodes, 3 storage nodes, 1 login node, 1 head node, and a DDN ExaScaler storage system. Eight of the GPU nodes contain 4 NVIDIA A100 GPUs each, and four of the GPU nodes contain 8 NVIDIA H200 GPUs each. The AI.Panther cluster was initially funded by a National Science Foundation Major Research Instrumentation grant and expanded through a NIST ASCEND grant.

An HPC cluster is a set of computers working in parallel, delivering performance comparable to a supercomputer's at a fraction of the price. The cluster is made up of nodes connected by a high-speed network that together carry out computationally intensive tasks.

To learn more about high-performance computing or supercomputers, see Wikipedia.
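
As a concrete illustration of "computers working in parallel", the sketch below is a minimal MPI hello-world in Python. It is a hypothetical example, not a documented AI.Panther workflow: it assumes the mpi4py package and an MPI launcher such as mpirun are available in your environment, and the actual modules and launch commands on the cluster may differ.

    # hello_mpi.py - minimal parallel "hello world" (assumes mpi4py is installed)
    from mpi4py import MPI
    import socket

    comm = MPI.COMM_WORLD      # communicator spanning every process in the job
    rank = comm.Get_rank()     # this process's ID within the job
    size = comm.Get_size()     # total number of parallel processes

    # Each process reports the node it landed on; in a multi-node job the
    # hostnames show the work being spread across the cluster.
    print(f"rank {rank} of {size} running on {socket.gethostname()}")

Launched with, for example, mpirun -np 4 python hello_mpi.py (or the scheduler's equivalent), each process prints its rank and the host it is running on.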

Hardware

Head / Login Nodes

  • 4 x Intel Xeon Cascade Lake Silver 4210R, 2.4 GHz 10-Core CPUs

  • 12 x 8 GB = 96 GB RAM

  • 4 x 960GB Enterprise SSD 

A100 PCIe GPU Nodes

  • 4 x AMD EPYC 7402P Rome, 2.8 GHz 24-Core CPUs

  • 32 x 32 GB = 1024 GB RAM

  • 4 x 960GB Enterprise SSD

  • 16 x NVIDIA A100 (Ampere) GPUs, 40 GB memory each, PCIe

  • 24 x A100 NVLink Bridge

A100 SXM4 GPU Nodes

  • 8 x AMD EPYC 7402 Rome, 2.8 GHz 24-Core CPUs

  • 64 x 32 GB = 2048 GB RAM

  • 4 x 960GB Enterprise SSD

  • 16 x NVIDIA A100 (Ampere) SXM4 GPUs, 40 GB memory each (4 baseboards, each with 4 NVLink-connected A100s)

H200 SXM GPU Nodes 

  • 4 x Dell PowerEdge XE9680 GPU compute nodes

  • 2 x Intel Xeon Platinum 8562Y+, 32-Core CPUs

  • 32 x 96 GB = 3072 GB RAM

  • 1 x Dell BOSS-N1 NVMe (960 GB usable, OS device)

  • 4 x Dell NVMe ISE PS1010 RI U.2 3.84 TB (node-local NVMe)

  • 32 x NVIDIA H200 GPUs, 141 GB memory each (4 nodes, each with 8 NVLink-connected H200s)
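
From a job running on one of the GPU nodes, the per-node GPU count and memory listed above can be verified directly. The snippet below is a minimal sketch that assumes a Python environment with CUDA-enabled PyTorch is available; this article does not document which software environments are installed.

    # check_gpus.py - list visible GPUs and their memory (assumes PyTorch with CUDA)
    import torch

    if not torch.cuda.is_available():
        print("No CUDA GPUs visible - run this inside a job on a GPU node")
    else:
        for i in range(torch.cuda.device_count()):
            props = torch.cuda.get_device_properties(i)
            # total_memory is in bytes; per the hardware list, A100 nodes should
            # report roughly 40 GB per GPU and H200 nodes roughly 141 GB per GPU
            print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.0f} GiB")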

High Memory Compute Nodes

  • 32 x Intel Xeon Cascade Lake Gold 6240R, 2.4GHz, 24-Core CPUs

  • 192 x 32 GB = 6144 GB RAM

  • 16 x 960GB Enterprise SSD

ZFS User/Home Fileserver (78TB after overhead) - /home1

  • 2 x Intel Xeon Cascade Lake Silver 4215R, 3.20GHz, 8-Core CPUs

  • 12 x 16 GB = 192 GB RAM

  • 2 x 240GB Enterprise SSD 

  • 8 x 14TB SAS HDD configured as RAIDZ2

  • 2 x P4800X 375GB Optane SSDs 

ZFS Archive Fileserver (420TB after overhead) - /archive

  • 2 x Intel Xeon Cascade Lake Silver 4215R, 3.20GHz, 8-Core CPUs 

  • 12 x 32 GB = 384 GB RAM

  • 2 x 240GB Enterprise SSD 

  • 36 x 16TB SATA HDD configured as four RAIDZ2 arrays

  • 2 x P4800X 375GB Optane SSDs 
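
The quoted usable capacities follow from the RAIDZ2 layouts: each RAIDZ2 vdev reserves two drives' worth of space for parity, and ZFS adds further overhead on top of that. The sketch below is a rough back-of-the-envelope estimate; the 9-drives-per-vdev split for the archive pool is an assumption inferred from "36 x 16TB ... four RAIDZ2 arrays".

    # raidz2_capacity.py - rough pre-overhead capacity estimate for the ZFS pools above
    def raidz2_raw_tb(drives_per_vdev, vdevs, drive_tb):
        # RAIDZ2 dedicates two drives per vdev to parity
        return (drives_per_vdev - 2) * vdevs * drive_tb

    home = raidz2_raw_tb(drives_per_vdev=8, vdevs=1, drive_tb=14)     # 84 TB raw -> ~78 TB usable
    archive = raidz2_raw_tb(drives_per_vdev=9, vdevs=4, drive_tb=16)  # 448 TB raw -> ~420 TB usable

    print(f"/home1 capacity before ZFS overhead:   {home} TB")
    print(f"/archive capacity before ZFS overhead: {archive} TB")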

DDN AI/HPC Storage System (Project/Shared Scratch) - /shared

  • DDN AI400X2 All Flash Appliance Bundle

  • 24 x NVMe slots, with 2 x SE2420 enclosures (48 x NVMe slots)

  • 8 x NDR200 / 200GbE QSFP112 ports

  • ~1 PiB usable capacity
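
From any node that mounts them, users can check how full these filesystems are. The sketch below uses only the Python standard library; the mount points are the ones documented above (/home1, /archive, /shared), and which nodes actually mount each of them is not specified in this article.

    # fs_usage.py - report capacity and free space for the cluster filesystems
    import shutil

    for mount in ("/home1", "/archive", "/shared"):
        try:
            usage = shutil.disk_usage(mount)   # total/used/free in bytes
        except FileNotFoundError:
            print(f"{mount}: not mounted on this node")
            continue
        tib = 1024 ** 4
        print(f"{mount}: {usage.free / tib:.1f} TiB free of {usage.total / tib:.1f} TiB")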

Network Switches

  • One NVIDIA NDR InfiniBand switch (MQM9700-NS2R) for the H200 GPU nodes and DDN storage

  • One Mellanox/NVIDIA HDR InfiniBand switch for the CPU and A100 GPU nodes

  • One managed 1GbE switch

  • One 25Gb (18 port)/100Gb (4 port) Ethernet switch 

  • Nodes have a 25 Gb Ethernet connection to the 25 GbE switch, plus dual-port 1 GbE with IPMI and an HDR100 InfiniBand connection. 100 GbE links from the 25 GbE switch are available for future expansion
