| Summary: | getting interactive shell with srun does not put you on allocated node with tcsh | | |
|---|---|---|---|
| Product: | Slurm | Reporter: | Raghu Reddy <Raghu.Reddy> |
| Component: | User Commands | Assignee: | Jason Booth <jbooth> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | ||
| Version: | 18.08.1 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | NOAA | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | NESCC | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | selene | CLE Version: | |
| Version Fixed: | not a bug | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
|
Description
Raghu Reddy 2018-10-23 13:48:54 MDT

Hi Raghu Reddy,
In your comment you mention salloc but I only see two sruns.
Note that you can specify tcsh on a salloc but that only runs tcsh on the node from which it was executed. You can modify this behavior with SallocDefaultCommand.
SallocDefaultCommand
Normally, salloc(1) will run the user's default shell when a command to execute is not specified on the salloc command line. If SallocDefaultCommand is specified, salloc will instead run the configured command. The command is passed to '/bin/sh -c', so shell metacharacters are allowed, and commands with multiple arguments should be quoted. For instance:
SallocDefaultCommand = "$SHELL"
would run the shell named in the user's $SHELL environment variable, and

SallocDefaultCommand = "srun -n1 -N1 --mem-per-cpu=0 --pty --preserve-env --mpi=none $SHELL"

would run that shell via srun on a node of the allocation instead. For example, the following entry will run srun with tcsh when salloc is invoked without a command:

SallocDefaultCommand="srun -n1 -N1 --pty --preserve-env tcsh"
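A quick way to confirm where an interactive shell actually runs is to print the hostname from inside the session. The sketch below is illustrative only (it requires a live Slurm cluster, and flags such as -A and --qos are site-specific placeholders):

```shell
# Sketch: verify which host the interactive shell lands on.
# -A/--qos values are placeholders for site-specific settings.
srun --pty --ntasks=1 tcsh      # request an interactive tcsh for one task
hostname                        # inside the session: should print a compute node
echo $SLURM_JOB_NODELIST        # the node(s) Slurm allocated for this job
exit                            # leave the shell and release the allocation
```

If hostname still prints the login node, the shell never left the submit host, which is the symptom reported in this ticket.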
-Jason
Hi Jason,

Sorry about confusing the issue by mentioning both srun and salloc. Briefly, the problem is:

srun --pty -A nesccmgmt --ntasks=1 --qos=debug bash   (works as expected)
srun --pty -A nesccmgmt --ntasks=1 --qos=debug tcsh   (works differently)

The only difference is the last argument, which is the shell. When bash is used, the user lands on a compute node that has been allocated. When tcsh is used, the user is still on the submit host and not on an allocated node. Sorry about the mix-up.

Hi Raghu Reddy,

Thank you for the additional details. While in the second allocation with srun, would you echo out the hostname and verify that you are indeed still on sfe01? Perhaps there is some environment setting that is causing this strange behavior. I ran a few tests here but was unable to duplicate it. Please also check your environment to make sure you are not already in an allocation. I would also like to see the output of 'env' before and after the srun, as well as your slurm.conf.

-Jason

Hi Jason,

You are correct; this was something in my environment. The problem was this: with Slurm, the submit host environment is propagated by default, whereas that was not the case with our previous queuing system. In my .cshrc, I don't change the prompt if it is already set, and I was mistakenly thinking I was on the submit host when I was not. Both bash and tcsh behave the same, so please feel free to close this ticket. Thanks!

Resolving as infogiven
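The prompt confusion described above can arise from a guard of the following shape in ~/.cshrc. This is a hypothetical reconstruction, not the reporter's actual file, and MY_PROMPT_SET is an assumed variable name. Because Slurm propagates the submit host's environment by default, an inherited guard variable suppresses the prompt update on the compute node:

```shell
# Hypothetical ~/.cshrc fragment (tcsh) illustrating the pitfall.
# MY_PROMPT_SET is an assumed guard name, not taken from the ticket.
if ($?MY_PROMPT_SET) then
    # Guard was inherited through srun's propagated environment,
    # so the prompt still names the submit host.
else
    set prompt = "`hostname -s`> "
    setenv MY_PROMPT_SET 1
endif

# A more robust alternative: let tcsh expand the hostname at display
# time with %m, so the prompt always reflects the current host.
set prompt = "%m> "
```

With the %m form, the prompt changes automatically when the shell starts on a compute node, making it obvious where the session is running.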