Body
Symptom
Running apptainer exec on a GPU node produces:
Illegal instruction (core dumped)
This typically occurs when running containers that invoke compiled binaries (e.g., PyTorch, TensorFlow).
Cause
The default module path loads packages compiled for Intel Cascade Lake (cascadelake) CPUs. The GPU nodes use AMD EPYC (Zen 2) processors. Binaries compiled with Cascade Lake-specific instructions (AVX-512, etc.) will crash with "Illegal instruction" on AMD hardware.
Fix
Before loading apptainer (or any other module for use on GPU nodes), switch to the Zen 2 module stack:
module use /usr/local/spack/share/spack/modules/linux-ubuntu22.04-zen2
module load apptainer
You can verify you're loading the correct module with:
module avail apptainer
The loaded module should come from the zen2 path, not cascadelake.
Notes
- This applies to any compiled software loaded via modules on GPU nodes, not just Apptainer.