| Summary: | srun is slower starting hybrid mpi/openmp jobs than mpirun | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Rod Schultz <Rod.Schultz> |
| Component: | Other | Assignee: | David Bigagli <david> |
| Status: | RESOLVED DUPLICATE | QA Contact: | |
| Severity: | 3 - Medium Impact | ||
| Priority: | --- | CC: | da, nancy.kritkausky, yiannis.georgiou |
| Version: | 2.6.x | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | Meteo France | Slinky Site: | --- |
| Alineos Sites: | --- | Atos/Eviden Sites: | --- |
| Confidential Site: | --- | Coreweave sites: | --- |
| Cray Sites: | --- | DS9 clusters: | --- |
| Google sites: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | NoveTech Sites: | --- |
| Nvidia HWinf-CS Sites: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Tzag Elita Sites: | --- |
| Linux Distro: | --- | Machine Name: | |
| CLE Version: | Version Fixed: | ||
| Target Release: | --- | DevPrio: | --- |
| Emory-Cloud Sites: | --- | ||
| Attachments: |
Source tar file
Text of original report |
Created attachment 531 [details]
Text of original report
Hi Rod,
I will mark this as duplicate of 459 reported by Yiannis,
we have tracked the slowdown to be in MPI_Init() when the mpi library
calls the pmi module. There is not much we can do given the current
implementation.
On 11/21/2013 12:58 PM, bugs@schedmd.com wrote:
> David Bigagli <mailto:david@schedmd.com> changed bug 531
> <http://bugs.schedmd.com/show_bug.cgi?id=531>
> What        Removed             Added
> Assignee    jette@schedmd.com   david@schedmd.com
>
>
Thanks. *** This ticket has been marked as a duplicate of ticket 459 ***
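Since the slowdown was tracked to MPI_Init(), one way to isolate launcher-plus-init overhead from the application itself is to time a do-nothing MPI program under each launcher. A minimal sketch, where the helper name, the no-op binary, and the launcher arguments are all illustrative and not taken from the attached scripts:

```shell
# Hypothetical helper: report the wall-clock time a command takes end to end.
# Pointing it at a no-op MPI binary approximates launcher + MPI_Init() cost.
time_launch() {
    start=$(date +%s.%N)
    "$@"
    status=$?
    end=$(date +%s.%N)
    # POSIX sh has no floating point; use awk for the subtraction.
    awk -v s="$start" -v e="$end" 'BEGIN { printf "elapsed: %.3f s\n", e - s }'
    return $status
}

# Illustrative usage (the binary name is an assumption):
#   time_launch mpirun -np 288 ./mpi_noop.exe
#   time_launch srun -n 288 ./mpi_noop.exe
```

Note that `date +%s.%N` relies on GNU date, which is a safe assumption on the Linux systems involved here.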
Created attachment 530 [details]
Source tar file

David,

We have another support request. Meteo France has observed that it takes 8 times as long to start a hybrid mpi/openmp job with srun as with mpirun. I have had trouble reproducing the problem because they are using Intel MPI and we have not configured that on our test cluster. In addition, they observed it using 24 nodes with 12 cores per node. I suppose I could try to emulate the cluster with --enable-front-end, but since the complaint is about performance, I'm not sure emulation wouldn't introduce side effects of its own. So let me explain the environment. Maybe you will have an idea on how to evaluate the root problem using their program without reproducing their environment. If you want me to try the emulation route I can, but finding a better way of observing the problem would probably be more productive.

Attached is the tar file sent with the support request. I've also attached a text file containing the bug report we received. The tar file contains a src folder, an Excel spreadsheet analyzing results, and a lot of sample outputs. I don't think they are particularly useful other than proving there is a problem.

Here are their instructions for reproducing the problem. I had to add the location of mpiifort to the path (PATH=/opt/intel/impi/4.1.1/bin:$PATH), but maybe that is because we haven't configured intel-mpi. I also had to change their source program, reducing the size of the array by setting LG = 10000 on line 8; otherwise I got a segmentation fault.

    tar xvfoz mpirun_vs_srun.tgz
    cd mpirun_vs_srun
    ROOT=$PWD
    cd $ROOT/src
    icc -c wtime.C
    ifort -c starter.f90
    ifort -o starter.exe starter.o wtime.o -lstdc++
    mpiifort -c -openmp main.f90
    mpiifort -openmp -o main.intel.exe main.o wtime.o -lstdc++

At this point, main.intel.exe has been built in src. The rest of the instructions, I believe, are for automating runs with either mpirun or srun.
The scripts contain some site-specific details, particularly about which distributed file system is present, that I don't think are relevant to the problem.

    cd $ROOT
    nohup ./go &
    ./extract.sh mpirun
    ./extract.sh srun
    ls -l log.*run.txt

At this point, I can do

    mpirun -np 2 main.intel.exe

On my system, when I do srun main.intel.exe I get this error, which I suspect is because we haven't configured intel-mpi.
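For a quick A/B comparison outside their go/extract.sh scripts, something like the following could average a few launches per launcher. The launcher flags, the run count of three, and the assumption that main.intel.exe sits in the current directory are all illustrative choices, not part of the attached scripts:

```shell
# Hypothetical timing loop: average three launches for each launcher.
for launcher in "mpirun -np 24" "srun -n 24"; do
    total=0
    for run in 1 2 3; do
        start=$(date +%s.%N)
        # "|| true" keeps the loop going if a launcher or binary is missing.
        $launcher ./main.intel.exe >/dev/null 2>&1 || true
        end=$(date +%s.%N)
        total=$(awk -v t="$total" -v s="$start" -v e="$end" \
                    'BEGIN { print t + (e - s) }')
    done
    avg=$(awk -v t="$total" 'BEGIN { printf "%.3f", t / 3 }')
    echo "$launcher: ${avg}s average over 3 runs"
done
```

With a slowdown this large (8x), even a coarse wall-clock average like this should make the gap obvious without instrumenting the application.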