What is the AI.Panther cluster?


The AI.Panther HPC Cluster at Florida Tech is an Aspen Systems cluster composed of 16 compute nodes (768 processor cores and 6,144 GB of RAM in total), 8 GPU nodes each containing 4 NVIDIA A100 GPUs, 3 storage nodes, 1 login node, and 1 head node. The AI.Panther cluster was funded by a National Science Foundation Major Research Instrumentation (MRI) grant.
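
For readers who want the per-node breakdown, the totals above divide out as shown in the short Python sketch below; it simply reproduces the arithmetic from the figures quoted in this article.

    # Per-node breakdown derived from the cluster totals quoted above.
    compute_nodes = 16
    total_cores = 768
    total_ram_gb = 6144
    gpu_nodes = 8
    gpus_per_node = 4

    print(total_cores // compute_nodes)    # 48 cores per compute node
    print(total_ram_gb // compute_nodes)   # 384 GB of RAM per compute node
    print(gpu_nodes * gpus_per_node)       # 32 A100 GPUs across the GPU nodes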

An HPC cluster is a set of computers working in parallel to deliver performance similar to a supercomputer at a fraction of the price. The cluster is made up of nodes connected by a high-speed network that together carry out computationally intensive tasks.
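
To make the "many nodes working in parallel" idea concrete, here is a minimal sketch using mpi4py, a common Python binding for MPI on HPC systems. Whether mpi4py is installed on AI.Panther is an assumption here, not something this article states; the point is only to illustrate the programming model.

    # hello_mpi.py - each copy of this program runs as a separate process,
    # potentially on a different node, and reports where it is running.
    # Assumes an MPI stack and the mpi4py package are available (an assumption,
    # not confirmed for AI.Panther).
    from mpi4py import MPI
    import socket

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()   # this process's ID within the parallel job
    size = comm.Get_size()   # total number of processes in the job

    print(f"Process {rank} of {size} on node {socket.gethostname()}")

Launched with an MPI launcher such as mpiexec -n 4 python hello_mpi.py, the copies can be spread across several nodes, which is exactly the working model the cluster is built around.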

To learn about High Performance Computing or Supercomputers, see Wikipedia.

Hardware

Head / Login Nodes

  • 4 x Intel Xeon Cascade Lake Silver 4210R, 2.4 GHz 10-Core CPUs
  • 12 x 8 GB = 96 GB RAM
  • 4 x 960GB Enterprise SSD 


A100 PCIe GPU Nodes

  • 4 x AMD EPYC 7402P Rome, 2.8 GHz 24-Core CPUs
  • 32 x 32 GB = 1,024 GB RAM
  • 4 x 960GB Enterprise SSD
  • 16 x NVIDIA Tesla Ampere A100 GPUs, 40 GB memory each, PCIe
  • 24 x A100 NVLink Bridge


A100 SXM4 GPU Nodes

  • 8 x AMD EPYC 7402 Rome, 2.8 GHz 24-Core CPUs
  • 64 x 32 GB = 2,048 GB RAM
  • 4 x 960GB Enterprise SSD
  • 16 x NVIDIA Tesla Ampere A100 GPUs, 40 GB memory each, on 4 baseboards of 4 NVLink-connected A100s
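
A quick way to confirm which A100s a job on either type of GPU node can see is the PyTorch sketch below. A CUDA-enabled PyTorch installation is assumed (for example in your own conda or virtualenv environment); this article does not prescribe a specific software stack.

    # check_gpus.py - list the GPUs visible to the current process.
    # Assumes a CUDA-enabled PyTorch build is available in your environment.
    import torch

    if not torch.cuda.is_available():
        print("No CUDA devices are visible to this process.")
    else:
        for i in range(torch.cuda.device_count()):
            props = torch.cuda.get_device_properties(i)
            # On these nodes you would expect "A100" names with roughly 40 GB each.
            print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.0f} GB")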


High Memory Compute Nodes

  • 32 x Intel Xeon Cascade Lake Gold 6240R, 2.4GHz, 24-Core CPUs
  • 192 x 32 GB = 6,144 GB RAM
  • 16 x 960GB Enterprise SSD
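
To check how many cores and how much memory are actually visible on whichever node you are using, a plain-Python inspection like the one below is enough; it assumes only a standard Linux environment and no cluster-specific tools. Note that inside a batch job the scheduler may restrict you to fewer cores than the node physically has.

    # node_resources.py - report the CPU cores and total RAM visible on this node.
    import os

    print(f"CPU cores visible: {os.cpu_count()}")

    # /proc/meminfo is standard on Linux; MemTotal is reported in kB.
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemTotal:"):
                mem_kb = int(line.split()[1])
                print(f"Total RAM: {mem_kb / 1024**2:.0f} GB")
                break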


ZFS User/Home Fileserver (78TB after overhead)

  • 2 x Intel Xeon Cascade Lake Silver 4215R, 3.20GHz, 8-Core CPUs
  • 12 x 16 GB = 192 GB RAM
  • 2 x 240GB Enterprise SSD 
  • 8 x 14TB SAS HDD configured as RAIDZ2
  • 2 x P4800X 375GB Optane SSDs


ZFS Archive Fileserver (420TB after overhead)

  • 2 x Intel Xeon Cascade Lake Silver 4215R, 3.20GHz, 8-Core CPUs 
  • 12 x 32 GB = 384 GB RAM
  • 2 x 240GB Enterprise SSD 
  • 36 x 16TB SATA HDD configured as four RAIDZ2 arrays
  • 2 x P4800X 375GB Optane SSDs 
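
The usable capacities quoted for both ZFS fileservers follow from the RAIDZ2 layout: each RAIDZ2 vdev gives up two disks' worth of space to parity, and a bit more is lost to ZFS overhead and TB-versus-TiB accounting. The sketch below reproduces that arithmetic; the exact vdev split on the archive server (assumed here to be four 9-disk vdevs) is inferred from the disk count rather than stated in this article.

    # raidz2_capacity.py - rough usable-capacity check for the two fileservers.
    def raidz2_data_tb(disks_per_vdev, vdevs, disk_tb):
        """Space left after RAIDZ2 parity (2 disks per vdev), before ZFS overhead."""
        return (disks_per_vdev - 2) * vdevs * disk_tb

    # User/home pool: one RAIDZ2 vdev of 8 x 14 TB drives.
    print(raidz2_data_tb(disks_per_vdev=8, vdevs=1, disk_tb=14))   # 84 TB of data space; ~78 TB quoted

    # Archive pool: 36 x 16 TB drives, assumed four RAIDZ2 vdevs of 9 drives each.
    print(raidz2_data_tb(disks_per_vdev=9, vdevs=4, disk_tb=16))   # 448 TB of data space; ~420 TB quoted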


Network Switches

  • One externally managed HDR IB switch 
  • One managed 1GbE switch
  • One 25Gb (18 port)/100Gb (4 port) Ethernet switch 
  • Nodes connect to the 25GbE switch over 25Gb Ethernet (plus dual-port 1GbE with IPMI and an HDR100 IB connection). 100GbE links from the 25GbE switch are available for future expansion.
