If a user submits a job (sbatch) from a directory in which he/she does NOT have write permission, the job fails quickly and no output is generated. This is obviously a user error. However, nothing tells the user what the problem is. When the administrator is called upon to investigate, there is no simple command that shows the problem is the job directory's permissions. What I have to do to help the user troubleshoot is:
1. Run scontrol show job xxxx or sacct -j xxxx to find out which node the job ran on.
2. Log in to that node and search slurmd.log.
Is there a better way?
Thanks,
George
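For reference, the two manual steps above could be scripted roughly as follows. This is only a sketch under several assumptions: the log path /var/log/slurmd.log is a guess (the real location is whatever SlurmdLogFile is set to in slurm.conf), the job is assumed to have run on a single node, the admin is assumed to have ssh access to that node, and the function names are made up for illustration.

```python
import subprocess

# Assumption: site-specific path; check SlurmdLogFile in slurm.conf.
SLURMD_LOG = "/var/log/slurmd.log"


def node_lookup_command(job_id):
    """Build the sacct call that prints only the node list of a job."""
    # -n drops the header, -X limits output to the allocation line,
    # -o selects the NodeList column.
    return ["sacct", "-j", str(job_id), "-n", "-X", "-o", "NodeList"]


def log_search_command(node, job_id, log_path=SLURMD_LOG):
    """Build the ssh + grep call that searches slurmd's log on one node."""
    return ["ssh", node, "grep", "-e", str(job_id), log_path]


def find_job_log_lines(job_id):
    """Run both steps and return the matching slurmd.log lines."""
    lookup = subprocess.run(
        node_lookup_command(job_id),
        capture_output=True, text=True, check=True,
    )
    # Assumption: a single-node job, so the node list is one plain hostname.
    node = lookup.stdout.strip()
    # grep exits with status 1 when nothing matches, so do not use check=True.
    search = subprocess.run(
        log_search_command(node, job_id),
        capture_output=True, text=True,
    )
    return search.stdout
```

Calling find_job_log_lines(1234) would then print whatever slurmd logged for that job, which still has to be read by hand to spot the permission error.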
Unfortunately, there is currently no better way. I am looking into what it would take to add an error message at submission time.
Thanks. That would save admins a lot of time!
So I have specced this out a bit. It is an enhancement, so I'm changing the severity level accordingly. It isn't feasible to check for write permissions at submission time, so it is likely that my patch will change the reason for job failure in scontrol show jobs to something about being unable to open a file for IO.
That would be much better than what I have to do now.
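In the meantime, a site could approximate the check on the user side, for example in a wrapper that runs before sbatch. The sketch below is not part of Slurm and the function name is hypothetical. It assumes the job's output goes to the submission directory and that this directory is on a filesystem the compute nodes see with the same permissions, which will not hold everywhere.

```python
import os


def can_write_job_output(directory):
    """Return True if the current user can create files in `directory`."""
    # Creating a file needs both write and search (execute) permission
    # on the directory itself.
    return os.path.isdir(directory) and os.access(directory, os.W_OK | os.X_OK)
```

A wrapper would call can_write_job_output(os.getcwd()) and refuse to submit, with a clear message, when it returns False.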