Common Slurm Workload Errors and Solutions

Error: slurmstepd: error: Exceeded job memory limit at some point.

Your job tried to use more memory than the amount requested in your submission script, so Slurm automatically killed it. A simple fix is to increase the memory allocated to the job, either with --mem on the command line or with an #SBATCH --mem directive in your submission script.
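For example, a minimal submission script requesting 8 GB of memory might look like the following (the job name, memory value, and program are placeholders; adjust them to your workload):

    #!/bin/bash
    #SBATCH --job-name=myjob
    #SBATCH --mem=8G

    ./my_program

The same request can be made on the command line: sbatch --mem=8G myscript.sh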

Error: error: Batch job submission failed: Requested time limit is invalid (missing or exceeds some limit)

You attempted to submit a job to a partition that does not allow the --time value you requested. The solution is to move your job to a partition with a longer time limit (med, long, etc.). By default, jobs are sent to the short queue, which permits at most 45 minutes of run time. Specify a partition in your submission script, or reduce your --time request.
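For example, to request two hours on the med partition (partition names and their limits vary by cluster; sinfo lists the partitions available to you and their time limits), you might add the following directives to your submission script:

    #SBATCH --partition=med
    #SBATCH --time=02:00:00

Or, equivalently, on the command line: sbatch --partition=med --time=02:00:00 myscript.sh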

Error: SSH: Access denied: user [username] (uid=[uid]) has no active jobs.

This error appears when you attempt to SSH into a node on which you are not currently running a job. Under normal circumstances, you should not run jobs directly on the nodes, as this can confuse the scheduler and prevent other users from submitting jobs. If you're unable to use the scheduler to submit your job and you absolutely need to SSH in (for example, for X11 forwarding), see this section of Slurm's FAQ.
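If you do have an active job and simply need to know which node it is running on before connecting, squeue can tell you. A minimal sketch (the job ID and node name shown are illustrative):

    squeue -u $USER -o "%.10i %.20j %.8T %R"
    #      JOBID  NAME      STATE  NODELIST(REASON)
    #      12345  myjob   RUNNING  node042

    ssh node042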

Error: SFTP: Received message too long [random number]

This error comes up when you attempt to use SFTP and a profile file in your home directory (.bash_profile, etc.) prints a message to the terminal. To correct the issue, remove the output (or restrict it to interactive shells) and try again.
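One common fix, assuming the message is an echo in your .bash_profile, is to print it only for interactive shells so that SFTP sessions stay silent:

    # in ~/.bash_profile -- only print for interactive shells
    if [[ $- == *i* ]]; then
        echo "Welcome to the cluster"
    fi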

Error: sbatch: error: Batch script contains DOS[MAC] line breaks (\r\n)
sbatch: error: instead of expected UNIX line breaks (\n).

Sometimes, if you download a Slurm submission script to a Windows or Mac computer and re-upload it to the HPC, you may get this error when attempting to submit the script using sbatch. The solution is to run dos2unix on the file to convert the line endings back to UNIX format.
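For example (the filename is a placeholder):

    dos2unix myscript.sh
    sbatch myscript.sh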
