On AMD Turin nodes with 768 CPUs, we're seeing the following error:

    srun --mpi=cray_shasta --nodefile=/tmp/mymachlist.24775.run --ntasks-per-node=768 /usr/diags/mpi/cray/amd/olconft.cray.sles
    slurmstepd: error: environment variable SLURM_CPU_BIND is too long
    slurmstepd: error: Unable to set SLURM_CPU_BIND
    slurmstepd: error: environment variable SLURM_CPU_BIND_LIST is too long
    slurmstepd: error: Unable to set SLURM_CPU_BIND_LIST
    slurmstepd: error: environment variable SLURM_CPU_BIND is too long
    slurmstepd: error: Unable to set SLURM_CPU_BIND
    slurmstepd: error: environment variable SLURM_CPU_BIND is too long
    slurmstepd: error: Unable to set SLURM_CPU_BIND
    slurmstepd: error: environment variable SLURM_CPU_BIND_LIST is too long
    ...
Hi David,

The errors about `SLURM_CPU_BIND` and `SLURM_CPU_BIND_LIST` being too long will not prevent job execution, although those variables will remain unset in the job environment. Can you confirm that your jobs still run as expected?

The failure to set these environment variables is due to a kernel limitation (128 KB per variable). We are looking at updating the logging around this, since it won't cause jobs to fail.

Related: ticket 644

Michael
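
For anyone wanting to sanity-check the 128 KB figure, here is a rough Python sketch of how the size could add up with 768 tasks on a 768-CPU node. It assumes 4 KiB pages (so the kernel's per-string limit, `MAX_ARG_STRLEN`, is 32 pages = 131072 bytes). The helper names are made up for illustration, and `hypothetical_mask_list` is only a guess at a mask-list layout (one full-width hex mask per task, comma-separated); the thread does not show the actual value Slurm tried to set.

```python
# Per-string limit the Linux kernel applies to each argv/envp entry at exec
# time: MAX_ARG_STRLEN = 32 pages. ASSUMPTION: 4 KiB pages, giving 128 KiB.
MAX_ARG_STRLEN = 32 * 4096


def env_entry_size(name: str, value: str) -> int:
    """Bytes used by one 'NAME=value' environment string, including the NUL."""
    return len(name.encode()) + 1 + len(value.encode()) + 1


def fits_in_exec_env(name: str, value: str, limit: int = MAX_ARG_STRLEN) -> bool:
    """True if the 'NAME=value' string stays within the per-string limit."""
    return env_entry_size(name, value) <= limit


def hypothetical_mask_list(ntasks: int, ncpus: int) -> str:
    """ASSUMPTION (not taken from Slurm): one zero-padded hex CPU mask per
    task, each wide enough to cover every CPU, joined by commas."""
    width = (ncpus + 3) // 4  # hex digits needed to cover ncpus bits
    return ",".join(f"0x{1 << task:0{width}x}" for task in range(ntasks))


if __name__ == "__main__":
    value = hypothetical_mask_list(ntasks=768, ncpus=768)
    size = env_entry_size("SLURM_CPU_BIND_LIST", value)
    print(f"estimated size: {size} bytes, limit: {MAX_ARG_STRLEN} bytes")
    print("fits:", fits_in_exec_env("SLURM_CPU_BIND_LIST", value))
```

Under those assumptions each mask is 194 characters, so 768 of them already exceed the limit, which would be consistent with the error only showing up at this task and CPU count.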