Ticket 5906

Summary: getting interactive shell with srun does not put you on allocated node with tcsh
Product: Slurm Reporter: Raghu Reddy <Raghu.Reddy>
Component: User Commands Assignee: Jason Booth <jbooth>
Status: RESOLVED INFOGIVEN QA Contact:
Severity: 4 - Minor Issue    
Priority: ---    
Version: 18.08.1   
Hardware: Linux   
OS: Linux   
Site: NOAA Slinky Site: ---
NOAA Site: NESCC
Linux Distro: --- Machine Name: selene
Version Fixed: not a bug
Target Release: --- DevPrio: ---

Description Raghu Reddy 2018-10-23 13:48:54 MDT
In order to get an interactive shell on a compute node, the recommended command is:

sfe01% srun --pty -A nesccmgmt --ntasks=1 --qos=debug bash
[Raghu.Reddy@s0014 ~]$
[Raghu.Reddy@s0014 ~]$ exit
sfe01%

That works fine with bash and puts us on a compute node, as explained in the training materials (unlike salloc, which was mentioned as an alternative but does not put the user on the allocated compute node).

But that does not appear to be the case if the last argument is tcsh: 
 
sfe01% srun --pty -A nesccmgmt --ntasks=1 --qos=debug tcsh
sfe01% 

sfe01% echo $SLURM_NODELIST
s0014
sfe01%
sfe01% exit
Comment 1 Jason Booth 2018-10-23 14:34:04 MDT
Hi Raghu Reddy,


In your comment you mention salloc, but I only see two srun commands.

Note that you can pass tcsh as the command to salloc, but that only runs tcsh on the node from which salloc was executed. You can change what salloc runs when no command is given with the SallocDefaultCommand option in slurm.conf.

       SallocDefaultCommand
              Normally,  salloc(1) will run the user's default shell when a command to execute is not specified on the salloc command line.  If SallocDefaultCommand is specified, salloc will instead run the configured command. The command is passed to '/bin/sh -c', so shell metacharacters are allowed, and commands with multiple arguments should be quoted. For instance:

SallocDefaultCommand = "$SHELL"

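would run the shell in the user's $SHELL environment variable, and

SallocDefaultCommand = "srun -n1 -N1 --mem-per-cpu=0 --pty --preserve-env --mpi=none $SHELL"

would spawn the user's default shell on the allocated resources without consuming any of the CPU or memory resources, configure it as a pseudo-terminal, and preserve all of the job's environment variables.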
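The '/bin/sh -c' hand-off described in the excerpt can be illustrated without a Slurm installation. In this minimal sketch `echo` stands in for srun (an assumption made only so it runs anywhere): the configured string reaches sh as a single argument, and $SHELL is expanded by that sh at run time from the environment it inherits.

```shell
#!/bin/sh
# salloc passes SallocDefaultCommand to '/bin/sh -c' as one string.
# 'echo' stands in for srun so this sketch runs without Slurm;
# the single quotes keep the outer shell from expanding $SHELL.
cmd='echo srun -n1 -N1 --pty --preserve-env $SHELL'

# The inner sh expands $SHELL from its own environment when it runs.
SHELL=/bin/tcsh /bin/sh -c "$cmd"
```

For a user whose SHELL is /bin/tcsh this prints `srun -n1 -N1 --pty --preserve-env /bin/tcsh`, so a single "$SHELL" entry covers bash and tcsh users alike.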



For example, with the following entry, running salloc starts tcsh on the allocated node through srun.

SallocDefaultCommand="srun -n1 -N1 --pty --preserve-env tcsh"


-Jason
Comment 2 Raghu Reddy 2018-10-23 14:40:29 MDT
Hi Jason,

Sorry about confusing the issue by mentioning both srun and salloc.

Briefly, the problem is:

srun --pty -A nesccmgmt --ntasks=1 --qos=debug bash       (works as expected)
srun --pty -A nesccmgmt --ntasks=1 --qos=debug tcsh       (works differently)

The only difference is the last argument, which is the shell.

When bash is used the user lands on a compute node that has been allocated.
When tcsh is used the user is still on the submit host and not on an allocated node.

Sorry about the mix up.
Comment 3 Jason Booth 2018-10-23 14:46:41 MDT
Hi Raghu Reddy,

Thank you for the additional details. While you are in the second session (the one started with srun ... tcsh), would you echo the hostname and verify that you are indeed still on sfe01? Perhaps some environment setting is causing this strange behavior. I ran a few tests here but was unable to reproduce it. Please also check your environment to make sure you are not already in an allocation. I would also like to see the output of 'env' before and after the srun, as well as your slurm.conf.
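A minimal sketch of that check, written for /bin/sh so it can be run from a tcsh session (for example as `sh where_am_i.sh`). The function and file names are hypothetical. It assumes srun exports SLURM_JOB_ID and SLURM_SUBMIT_HOST to the task, and that `uname -n` returns the same form of host name as SLURM_SUBMIT_HOST (short and fully qualified names may differ at some sites).

```shell
#!/bin/sh
# where_am_i (hypothetical helper): report whether this shell is inside a
# Slurm job and, if so, whether it runs on the submit host or elsewhere.
# Assumes srun exports SLURM_JOB_ID and SLURM_SUBMIT_HOST to its tasks.
where_am_i() {
    host=$(uname -n)                      # node name of the machine running this shell
    if [ -z "${SLURM_JOB_ID:-}" ]; then
        echo "not in a Slurm job; host is $host"
    elif [ "$host" = "${SLURM_SUBMIT_HOST:-}" ]; then
        # e.g. a shell started by salloc without SallocDefaultCommand
        echo "in job $SLURM_JOB_ID on the submit host $host"
    else
        echo "in job $SLURM_JOB_ID on $host (submitted from ${SLURM_SUBMIT_HOST:-unknown})"
    fi
}

where_am_i
```

Inside the `srun --pty ... tcsh` step this would be expected to report a compute node such as s0014; the middle branch matches the salloc case, where the shell holds a job ID but still runs on the submit host.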

-Jason
Comment 4 Raghu Reddy 2018-10-26 09:16:23 MDT
Hi Jason,

You are correct; this was caused by something in my environment. Please feel free to close this ticket.

The problem was this: with Slurm, the submit host's environment is propagated to the job by default, whereas that was not the case with our previous queuing system.

In my .cshrc, I don't change the prompt if it is already set, so the prompt still read sfe01% on the compute node and I mistakenly thought I was on the submit host when I was not.
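One way to guard against this, sketched under the assumption that the prompt is set in ~/.cshrc, is to rebuild it from the current host in every interactive tcsh instead of keeping a value that may have come from the submit host. In tcsh's prompt syntax, %m expands to the host name up to the first dot and %% prints a literal %.

```tcsh
# ~/.cshrc fragment (sketch; not the actual file from this ticket).
# $?prompt is 1 only in interactive shells, so non-interactive shells are unaffected.
if ($?prompt) then
    # %m = current host name (up to the first '.'), %% = literal '%'
    set prompt = "%m%% "
endif
```

With this, the interactive shell from `srun --pty ... tcsh` would show the compute node's name (e.g. `s0014% `) rather than `sfe01% `.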

Both bash and tcsh behave the same, so please feel free to close this ticket.

Thanks!
Comment 5 Jason Booth 2018-10-26 09:17:40 MDT
Resolving as infogiven