Environment: pmix-2.0.2 and slurm-17.11.0 on Linux RHEL 7.3.

My name is Jim Stoffel, HPE MPT devel team (jimstoffel@hpe.com).

I found that when a key is added with PMIx_Put(PMIX_GLOBAL, ...), followed by PMIx_Commit and then a PMIx_Fence with PMIX_COLLECT_DATA, the GLOBAL key is not available to the remote node. I found this with a slightly modified version of the test example client.c that is released in /opt/pmix-2.0.2/examples. I started by asking questions in the pmix devel Google group, and Ralph Castain asked me to create this bug report to give you a heads up.

Here is how I execute it:

srun --mpi=pmix_v2 --nodelist cranberry0,cranberry1 --pty /bin/bash
srun -n 2 ./a.out

Following is my modified version of /opt/pmix-2.0.2/examples/client.c:

[root@cranberry0 examples]# cat t.c
/*
 * Copyright (c) 2004-2010 The Trustees of Indiana University and Indiana
 *                         University Research and Technology
 *                         Corporation.  All rights reserved.
 * Copyright (c) 2004-2011 The University of Tennessee and The University
 *                         of Tennessee Research Foundation.  All rights
 *                         reserved.
 * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
 *                         University of Stuttgart.  All rights reserved.
 * Copyright (c) 2004-2005 The Regents of the University of California.
 *                         All rights reserved.
 * Copyright (c) 2006-2013 Los Alamos National Security, LLC.
 *                         All rights reserved.
 * Copyright (c) 2009-2012 Cisco Systems, Inc.  All rights reserved.
 * Copyright (c) 2011      Oak Ridge National Labs.  All rights reserved.
 * Copyright (c) 2013-2016 Intel, Inc.  All rights reserved.
 * Copyright (c) 2015      Mellanox Technologies, Inc.  All rights reserved.
 * $COPYRIGHT$
 *
 * Additional copyrights may follow
 *
 * $HEADER$
 *
 */

#define _GNU_SOURCE
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <time.h>

#include <pmix.h>

static volatile bool waiting_for_debugger = true;
static pmix_proc_t myproc;

/* this is the event notification function we pass down below
 * when registering for general events - i.e.,, the default
 * handler. We don't technically need to register one, but it
 * is usually good practice to catch any events that occur */
static void notification_fn(size_t evhdlr_registration_id,
                            pmix_status_t status,
                            const pmix_proc_t *source,
                            pmix_info_t info[], size_t ninfo,
                            pmix_info_t results[], size_t nresults,
                            pmix_event_notification_cbfunc_fn_t cbfunc,
                            void *cbdata)
{
    if (NULL != cbfunc) {
        cbfunc(PMIX_EVENT_ACTION_COMPLETE, NULL, 0, NULL, NULL, cbdata);
    }
}

/* this is an event notification function that we explicitly request
 * be called when the PMIX_ERR_DEBUGGER_RELEASE notification is issued.
 * We could catch it in the general event notification function and test
 * the status to see if it was "debugger release", but it often is simpler
 * to declare a use-specific notification callback point. In this case,
 * we are asking to know when we are told the debugger released us */
static void release_fn(size_t evhdlr_registration_id,
                       pmix_status_t status,
                       const pmix_proc_t *source,
                       pmix_info_t info[], size_t ninfo,
                       pmix_info_t results[], size_t nresults,
                       pmix_event_notification_cbfunc_fn_t cbfunc,
                       void *cbdata)
{
    if (NULL != cbfunc) {
        cbfunc(PMIX_EVENT_ACTION_COMPLETE, NULL, 0, NULL, NULL, cbdata);
    }
    waiting_for_debugger = false;
}

/* event handler registration is done asynchronously because it
 * may involve the PMIx server registering with the host RM for
 * external events. So we provide a callback function that returns
 * the status of the request (success or an error), plus a numerical index
 * to the registered event. The index is used later on to deregister
 * an event handler - if we don't explicitly deregister it, then the
 * PMIx server will do so when it see us exit */
static void evhandler_reg_callbk(pmix_status_t status,
                                 size_t evhandler_ref,
                                 void *cbdata)
{
    volatile int *active = (volatile int*)cbdata;

    if (PMIX_SUCCESS != status) {
        fprintf(stderr, "Client %s:%d EVENT HANDLER REGISTRATION FAILED WITH STATUS %d, ref=%lu\n",
                myproc.nspace, myproc.rank, status, (unsigned long)evhandler_ref);
    }
    *active = status;
}

int main(int argc, char **argv)
{
    int rc;
    pmix_value_t value;
    pmix_value_t *val = &value;
    char *tmp;
    pmix_proc_t proc;
    uint32_t nprocs, n;
    pmix_info_t *info;
    bool flag;
    volatile int active;
    pmix_status_t dbg = PMIX_ERR_DEBUGGER_RELEASE;

    /* init us - note that the call to "init" includes the return of
     * any job-related info provided by the RM. This includes any
     * debugger flag instructing us to stop-in-init. If such a directive
     * is included, then the process will be stopped in this call until
     * the "debugger release" notification arrives */
    if (PMIX_SUCCESS != (rc = PMIx_Init(&myproc, NULL, 0))) {
        fprintf(stderr, "Client ns %s rank %d: PMIx_Init failed: %d\n", myproc.nspace, myproc.rank, rc);
        exit(0);
    }
    fprintf(stderr, "Client ns %s rank %d: Running\n", myproc.nspace, myproc.rank);

    /* register our default event handler - again, this isn't strictly
     * required, but is generally good practice */
    active = -1;
    PMIx_Register_event_handler(NULL, 0, NULL, 0,
                                notification_fn, evhandler_reg_callbk, (void*)&active);
    while (-1 == active) {
        sleep(1);
    }
    if (0 != active) {
        fprintf(stderr, "[%s:%d] Default handler registration failed\n", myproc.nspace, myproc.rank);
        exit(active);
    }

    /* job-related info is found in our nspace, assigned to the
     * wildcard rank as it doesn't relate to a specific rank. Setup
     * a name to retrieve such values */
    PMIX_PROC_CONSTRUCT(&proc);
    (void)strncpy(proc.nspace, myproc.nspace, PMIX_MAX_NSLEN);
    proc.rank = PMIX_RANK_WILDCARD;

    /* check to see if we have been instructed to wait for a debugger
     * to attach to us. We won't get both a stop-in-init AND a
     * wait-for-notify directive, so we should never stop twice. This
     * directive is provided so that something like an MPI implementation
     * can do some initial setup in MPI_Init prior to pausing for the
     * debugger */
    if (PMIX_SUCCESS == (rc = PMIx_Get(&proc, PMIX_DEBUG_WAIT_FOR_NOTIFY, NULL, 0, &val))) {
        /* register for debugger release */
        active = -1;
        PMIx_Register_event_handler(&dbg, 1, NULL, 0,
                                    release_fn, evhandler_reg_callbk, (void*)&active);
        /* wait for registration to complete */
        while (-1 == active) {
            sleep(1);
        }
        if (0 != active) {
            fprintf(stderr, "[%s:%d] Debug handler registration failed\n", myproc.nspace, myproc.rank);
            exit(active);
        }
        /* wait for debugger release */
        while (waiting_for_debugger) {
            sleep(1);
        }
    }

    /* get our universe size */
    if (PMIX_SUCCESS != (rc = PMIx_Get(&proc, PMIX_UNIV_SIZE, NULL, 0, &val))) {
        fprintf(stderr, "Client ns %s rank %d: PMIx_Get universe size failed: %d\n", myproc.nspace, myproc.rank, rc);
        goto done;
    }
    nprocs = val->data.uint32;
    PMIX_VALUE_RELEASE(val);
    fprintf(stderr, "Client %s:%d universe size %d\n", myproc.nspace, myproc.rank, nprocs);

    if (0 > asprintf(&tmp, "%s-%d-global", myproc.nspace, myproc.rank)) {
        exit(1);
    }
    value.type = PMIX_UINT64;
    value.data.uint64 = 1234;
    if (PMIX_SUCCESS != (rc = PMIx_Put(PMIX_GLOBAL, tmp, &value))) {
        fprintf(stderr, "Client ns %s rank %d: PMIx_Put internal failed: %d\n", myproc.nspace, myproc.rank, rc);
        goto done;
    }
    free(tmp);

    fprintf(stderr, "Client ns %s rank %d: Calling PMIx_Commit\n", myproc.nspace, myproc.rank);
    /* push the data to our PMIx server */
    if (PMIX_SUCCESS != (rc = PMIx_Commit())) {
        fprintf(stderr, "Client ns %s rank %d: PMIx_Commit failed: %d\n", myproc.nspace, myproc.rank, rc);
        goto done;
    }
    fprintf(stderr, "Client ns %s rank %d: PMIx_Commit completed \n", myproc.nspace, myproc.rank);
    sleep(1);

    fprintf(stderr, "\n \n Client ns %s rank %d: calling fence. \n", myproc.nspace, myproc.rank);
    /* call fence to synchronize with our peers - instruct
     * the fence operation to collect and return all "put"
     * data from our peers */
    PMIX_INFO_CREATE(info, 1);
    flag = true;
    PMIX_INFO_LOAD(info, PMIX_COLLECT_DATA, &flag, PMIX_BOOL);
    if (PMIX_SUCCESS != (rc = PMIx_Fence(&proc, 1, info, 1))) {
        fprintf(stderr, "Client ns %s rank %d: PMIx_Fence failed: %d\n", myproc.nspace, myproc.rank, rc);
        goto done;
    }
    PMIX_INFO_FREE(info, 1);
    sleep(1);
    fprintf(stderr, "\n \n Client ns %s rank %d: COLLECT completed \n", myproc.nspace, myproc.rank);

    if ( myproc.rank == 0 ) {
        /* check the returned data */
        for (n=0; n < nprocs; n++) {
            if (0 > asprintf(&tmp, "%s-%d-global", myproc.nspace, n)) {
                exit(1);
            }
            fprintf(stderr, "\n \n Client ns %s rank %d: Calling get for key :%s\n", myproc.nspace, myproc.rank,tmp);
            if (PMIX_SUCCESS != (rc = PMIx_Get(&myproc, tmp, NULL, 0, &val))) {
                fprintf(stderr, "Client ns %s rank %d: PMIx_Get %s failed: %d\n", myproc.nspace, myproc.rank, tmp, rc);
                //goto done;
            } else {
                fprintf(stderr, "Client ns %s rank %d: PMIx_Get %s passed _Get: %d\n", myproc.nspace, myproc.rank, tmp, rc);
            }
        }
    }

 done:
    sleep(60);
    /* finalize us */
    fprintf(stderr, "Client ns %s rank %d: Finalizing\n", myproc.nspace, myproc.rank);
    if (PMIX_SUCCESS != (rc = PMIx_Finalize(NULL, 0))) {
        fprintf(stderr, "Client ns %s rank %d:PMIx_Finalize failed: %d\n", myproc.nspace, myproc.rank, rc);
    } else {
        fprintf(stderr, "Client ns %s rank %d:PMIx_Finalize successfully completed\n", myproc.nspace, myproc.rank);
    }
    fflush(stderr);
    return(0);
}

PMIX trace.
[root@cranberry0 examples]# cat /tmp/o
[cranberry0.americas.sgi.com:42995] posting notification recv on tag 0
[cranberry1.americas.sgi.com:04409] posting notification recv on tag 0
[cranberry0.americas.sgi.com:42995] pmix: init called
[cranberry1.americas.sgi.com:04409] pmix: init called
[cranberry0.americas.sgi.com:42995] ptl_base_connect: attempting to connect to server
[cranberry1.americas.sgi.com:04409] ptl_base_connect: attempting to connect to server
[cranberry0.americas.sgi.com:42995] usock_peer_try_connect: attempting to connect to server on socket 7
[cranberry1.americas.sgi.com:04409] usock_peer_try_connect: attempting to connect to server on socket 7
[cranberry0.americas.sgi.com:42995] send blocking of 64 bytes to socket 7
[cranberry0.americas.sgi.com:42995] blocking send complete to socket 7
[cranberry0.americas.sgi.com:42995] waiting for blocking recv of 4 bytes
[cranberry1.americas.sgi.com:04409] send blocking of 64 bytes to socket 7
[cranberry0.americas.sgi.com:42995] blocking receive complete from remote
[cranberry0.americas.sgi.com:42995] waiting for blocking recv of 4 bytes
[cranberry0.americas.sgi.com:42995] blocking receive complete from remote
[cranberry0.americas.sgi.com:42995] sock_peer_try_connect: Connection across to server succeeded
[cranberry1.americas.sgi.com:04409] blocking send complete to socket 7
[cranberry1.americas.sgi.com:04409] waiting for blocking recv of 4 bytes
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x620370, 0x7fffffffd410, 1, 14 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry1.americas.sgi.com:04409] blocking receive complete from remote
[cranberry1.americas.sgi.com:04409] waiting for blocking recv of 4 bytes
[cranberry1.americas.sgi.com:04409] blocking receive complete from remote
[cranberry1.americas.sgi.com:04409] sock_peer_try_connect: Connection across to server succeeded
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 ptl:base:recv:handler called with peer pmix-server:42986
[cranberry0.americas.sgi.com:42995] ptl:base:recv:handler allocate new recv msg
[cranberry0.americas.sgi.com:42995] ptl:base:recv:handler read hdr on socket 7
[cranberry0.americas.sgi.com:42995] posting recv on tag 101
[cranberry0.americas.sgi.com:42995] QUEIENG MSG TO SERVER OF SIZE 8
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 ptl:base:send_handler SENDING TO PEER pmix-server:42986 tag 101 with NON-NULL msg
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x620c50, 0x7fffffffd410, 1, 14 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 ptl:base:recv:handler called with peer pmix-server:4400
[cranberry1.americas.sgi.com:04409] ptl:base:recv:handler allocate new recv msg
[cranberry1.americas.sgi.com:04409] ptl:base:recv:handler read hdr on socket 7
[cranberry1.americas.sgi.com:04409] posting recv on tag 101
[cranberry1.americas.sgi.com:04409] QUEIENG MSG TO SERVER OF SIZE 8
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 ptl:base:send_handler SENDING TO PEER pmix-server:4400 tag 101 with NON-NULL msg
[cranberry1.americas.sgi.com:04409] ptl:base:send_handler SENDING MSG
[cranberry1.americas.sgi.com:04409] ptl:base:send_handler MSG SENT
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 ptl:base:recv:handler called with peer pmix-server:4400
[cranberry0.americas.sgi.com:42995] ptl:base:send_handler SENDING MSG
[cranberry0.americas.sgi.com:42995] ptl:base:send_handler MSG SENT
[cranberry1.americas.sgi.com:04409] ptl:base:recv:handler read hdr on socket 7
[cranberry1.americas.sgi.com:04409] RECVD MSG FOR TAG 101 SIZE 26
[cranberry1.americas.sgi.com:04409] ptl:base:recv:handler allocate data region of size 26
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 RECVD COMPLETE MESSAGE FROM SERVER OF 26 BYTES FOR TAG 101 ON PEER SOCKET 7
[cranberry1.americas.sgi.com:04409] [base/ptl_base_sendrecv.c:495] post msg
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 message received 26 bytes for tag 101 on socket 7
[cranberry1.americas.sgi.com:04409] checking msg on tag 101 for tag 101
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 EXECUTE CALLBACK for tag 101
[cranberry1.americas.sgi.com:04409] pmix_bfrop_unpack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_unpack: found 1 values for 1 provided storage
[cranberry1.americas.sgi.com:04409] pmix_bfrop_unpack_buffer( 0x7ffff61d1cf0, 0x7ffff61d1c90, 1, 3 )
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 ptl:base:recv:handler called with peer pmix-server:42986
[cranberry0.americas.sgi.com:42995] ptl:base:recv:handler read hdr on socket 7
[cranberry1.americas.sgi.com:04409] pmix_bfrop_unpack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_unpack_byte * 18
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 CALLBACK COMPLETE
[cranberry1.americas.sgi.com:04409] pmix: get_nb value for proc slurm.pmix.475.13:4294967294 key pmix.dbg.init
[cranberry1.americas.sgi.com:04409] pmix: getnbfn value for proc slurm.pmix.475.13:-2 key pmix.dbg.init
[cranberry1.americas.sgi.com:04409] HASH:FETCH rank -2 key pmix.dbg.init
[cranberry1.americas.sgi.com:04409] HASH:FETCH proc data for rank -2 not found
[cranberry1.americas.sgi.com:04409] pmix:client get completed -46
Client ns slurm.pmix.475.13 rank 1: Running
[cranberry1.americas.sgi.com:04409] pmix_register_event_hdlr shifting to progress thread
[cranberry1.americas.sgi.com:04409] pmix: register event_hdlr with 0 infos
[cranberry1.americas.sgi.com:04409] pmix: _add_hdlr
[cranberry1.americas.sgi.com:04409] pmix: _add_hdlr sending to server
[cranberry0.americas.sgi.com:42995] RECVD MSG FOR TAG 101 SIZE 26
[cranberry0.americas.sgi.com:42995] ptl:base:recv:handler allocate data region of size 26
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 RECVD COMPLETE MESSAGE FROM SERVER OF 26 BYTES FOR TAG 101 ON PEER SOCKET 7
[cranberry0.americas.sgi.com:42995] [base/ptl_base_sendrecv.c:495] post msg
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x7ffff0000a00, 0x7ffff61d1d10, 1, 14 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x7ffff0000a00, 0x636e10, 1, 4 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int16 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x7ffff0000a00, 0x636e10, 1, 15 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int64 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x7ffff0000a00, 0x7ffff0002d60, 1, 4 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int16 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x7ffff0000a00, 0x7ffff0002d60, 1, 15 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int64 * 1
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 message received 26 bytes for tag 101 on socket 7
[cranberry0.americas.sgi.com:42995] checking msg on tag 101 for tag 101
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 EXECUTE CALLBACK for tag 101
[cranberry0.americas.sgi.com:42995] pmix_bfrop_unpack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_unpack: found 1 values for 1 provided storage
[cranberry1.americas.sgi.com:04409] posting recv on tag 102
[cranberry1.americas.sgi.com:04409] QUEIENG MSG TO SERVER OF SIZE 36
[cranberry0.americas.sgi.com:42995] pmix_bfrop_unpack_buffer( 0x7ffff61d0ce0, 0x7ffff61d0c80, 1, 3 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_unpack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_unpack_byte * 18
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 CALLBACK COMPLETE
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 ptl:base:send_handler SENDING TO PEER pmix-server:4400 tag 102 with NON-NULL msg
[cranberry1.americas.sgi.com:04409] ptl:base:send_handler SENDING MSG
[cranberry1.americas.sgi.com:04409] ptl:base:send_handler MSG SENT
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 ptl:base:recv:handler called with peer pmix-server:4400
[cranberry1.americas.sgi.com:04409] ptl:base:recv:handler allocate new recv msg
[cranberry1.americas.sgi.com:04409] ptl:base:recv:handler read hdr on socket 7
[cranberry1.americas.sgi.com:04409] RECVD MSG FOR TAG 102 SIZE 8
[cranberry1.americas.sgi.com:04409] ptl:base:recv:handler allocate data region of size 8
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 RECVD COMPLETE MESSAGE FROM SERVER OF 8 BYTES FOR TAG 102 ON PEER SOCKET 7
[cranberry1.americas.sgi.com:04409] [base/ptl_base_sendrecv.c:495] post msg
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 message received 8 bytes for tag 102 on socket 7
[cranberry0.americas.sgi.com:42995] pmix: get_nb value for proc slurm.pmix.475.13:4294967294 key pmix.dbg.init
[cranberry1.americas.sgi.com:04409] checking msg on tag 102 for tag 102
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 EXECUTE CALLBACK for tag 102
[cranberry1.americas.sgi.com:04409] pmix: regevents callback recvd
[cranberry1.americas.sgi.com:04409] pmix_bfrop_unpack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_unpack: found 1 values for 1 provided storage
[cranberry1.americas.sgi.com:04409] pmix_bfrop_unpack_buffer( 0x7ffff61d1cf0, 0x7ffff61d1c60, 1, 20 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_unpack_status * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_unpack_int32 * 1
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 CALLBACK COMPLETE
[cranberry0.americas.sgi.com:42995] pmix: getnbfn value for proc slurm.pmix.475.13:-2 key pmix.dbg.init
[cranberry0.americas.sgi.com:42995] HASH:FETCH rank -2 key pmix.dbg.init
[cranberry0.americas.sgi.com:42995] HASH:FETCH proc data for rank -2 not found
[cranberry0.americas.sgi.com:42995] pmix:client get completed -46
Client ns slurm.pmix.475.13 rank 0: Running
[cranberry0.americas.sgi.com:42995] pmix_register_event_hdlr shifting to progress thread
[cranberry0.americas.sgi.com:42995] pmix: register event_hdlr with 0 infos
[cranberry0.americas.sgi.com:42995] pmix: _add_hdlr
[cranberry0.americas.sgi.com:42995] pmix: _add_hdlr sending to server
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x7ffff0000a00, 0x7ffff61d0d00, 1, 14 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x7ffff0000a00, 0x636500, 1, 4 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int16 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x7ffff0000a00, 0x636500, 1, 15 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int64 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x7ffff0000a00, 0x7ffff0002d60, 1, 4 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int16 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x7ffff0000a00, 0x7ffff0002d60, 1, 15 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int64 * 1
[cranberry0.americas.sgi.com:42995] posting recv on tag 102
[cranberry0.americas.sgi.com:42995] QUEIENG MSG TO SERVER OF SIZE 36
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 ptl:base:send_handler SENDING TO PEER pmix-server:42986 tag 102 with NON-NULL msg
[cranberry0.americas.sgi.com:42995] ptl:base:send_handler SENDING MSG
[cranberry0.americas.sgi.com:42995] ptl:base:send_handler MSG SENT
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 ptl:base:recv:handler called with peer pmix-server:42986
[cranberry0.americas.sgi.com:42995] ptl:base:recv:handler allocate new recv msg
[cranberry0.americas.sgi.com:42995] ptl:base:recv:handler read hdr on socket 7
[cranberry0.americas.sgi.com:42995] RECVD MSG FOR TAG 102 SIZE 8
[cranberry0.americas.sgi.com:42995] ptl:base:recv:handler allocate data region of size 8
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 RECVD COMPLETE MESSAGE FROM SERVER OF 8 BYTES FOR TAG 102 ON PEER SOCKET 7
[cranberry0.americas.sgi.com:42995] [base/ptl_base_sendrecv.c:495] post msg
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 message received 8 bytes for tag 102 on socket 7
[cranberry0.americas.sgi.com:42995] checking msg on tag 102 for tag 102
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 EXECUTE CALLBACK for tag 102
[cranberry0.americas.sgi.com:42995] pmix: regevents callback recvd
[cranberry0.americas.sgi.com:42995] pmix_bfrop_unpack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_unpack: found 1 values for 1 provided storage
[cranberry0.americas.sgi.com:42995] pmix_bfrop_unpack_buffer( 0x7ffff61d0ce0, 0x7ffff61d0c50, 1, 20 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_unpack_status * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_unpack_int32 * 1
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 CALLBACK COMPLETE
[cranberry0.americas.sgi.com:42995] pmix: get_nb value for proc slurm.pmix.475.13:4294967294 key pmix.dbg.notify
[cranberry0.americas.sgi.com:42995] pmix: getnbfn value for proc slurm.pmix.475.13:-2 key pmix.dbg.notify
[cranberry0.americas.sgi.com:42995] HASH:FETCH rank -2 key pmix.dbg.notify
[cranberry0.americas.sgi.com:42995] HASH:FETCH proc data for rank -2 not found
[cranberry0.americas.sgi.com:42995] pmix:client get completed -46
[cranberry0.americas.sgi.com:42995] pmix: get_nb value for proc slurm.pmix.475.13:4294967294 key pmix.univ.size
[cranberry0.americas.sgi.com:42995] pmix: getnbfn value for proc slurm.pmix.475.13:-2 key pmix.univ.size
[cranberry0.americas.sgi.com:42995] HASH:FETCH rank -2 key pmix.univ.size
[cranberry0.americas.sgi.com:42995] HASH:FETCH proc data for rank -2 not found
[cranberry0.americas.sgi.com:42995] pmix_bfrop_unpack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_unpack: found 1 values for 1 provided storage
[cranberry0.americas.sgi.com:42995] pmix_bfrop_unpack_buffer( 0x7ffff61d0cb0, 0x7ffff61d0c90, 1, 21 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_unpack_int16 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_unpack_buffer( 0x7ffff61d0cb0, 0x7ffff61d0c98, 1, 14 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_unpack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix:client get completed 0
Client slurm.pmix.475.13:0 universe size 2
[cranberry0.americas.sgi.com:42995] pmix: executing put for key slurm.pmix.475.13-0-global type 15
[cranberry0.americas.sgi.com:42995] HASH:STORE rank 0 key slurm.pmix.475.13-0-global
[cranberry0.americas.sgi.com:42995] pmix: put slurm.pmix.475.13-0-global data for key global in local cache
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x7ffff0002cf0, 0x7ffff0000a20, 1, 28 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_byte * 27
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int16 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x7ffff0002cf0, 0x7ffff0000a08, 1, 15 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int64 * 1
[cranberry0.americas.sgi.com:42995] pmix: put slurm.pmix.475.13-0-global data for key global in remote cache
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x7ffff0003550, 0x7ffff0000a20, 1, 28 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_byte * 27
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int16 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x7ffff0003550, 0x7ffff0000a08, 1, 15 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int64 * 1
Client ns slurm.pmix.475.13 rank 0: Calling PMIx_Commit
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x7ffff0003db0, 0x7ffff61d0de0, 1, 14 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x7ffff0003db0, 0x7ffff61d0dd0, 1, 32 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_byte * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x7ffff0003db0, 0x7ffff7dd8840, 1, 26 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int16 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x7ffff0003db0, 0x7ffff0002d28, 1, 15 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int64 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_byte * 45
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x7ffff0003db0, 0x7ffff61d0dd0, 1, 32 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_byte * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x7ffff0003db0, 0x7ffff7dd8848, 1, 26 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int16 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x7ffff0003db0, 0x7ffff0003588, 1, 15 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int64 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_byte * 45
[cranberry0.americas.sgi.com:42995] posting recv on tag 103
[cranberry0.americas.sgi.com:42995] QUEIENG MSG TO SERVER OF SIZE 136
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 ptl:base:send_handler SENDING TO PEER pmix-server:42986 tag 103 with NON-NULL msg
[cranberry0.americas.sgi.com:42995] ptl:base:send_handler SENDING MSG
[cranberry0.americas.sgi.com:42995] ptl:base:send_handler MSG SENT
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 ptl:base:recv:handler called with peer pmix-server:42986
[cranberry0.americas.sgi.com:42995] ptl:base:recv:handler allocate new recv msg
[cranberry0.americas.sgi.com:42995] ptl:base:recv:handler read hdr on socket 7
[cranberry0.americas.sgi.com:42995] RECVD MSG FOR TAG 103 SIZE 8
[cranberry0.americas.sgi.com:42995] ptl:base:recv:handler allocate data region of size 8
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 RECVD COMPLETE MESSAGE FROM SERVER OF 8 BYTES FOR TAG 103 ON PEER SOCKET 7
[cranberry0.americas.sgi.com:42995] [base/ptl_base_sendrecv.c:495] post msg
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 message received 8 bytes for tag 103 on socket 7
[cranberry0.americas.sgi.com:42995] checking msg on tag 103 for tag 103
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 EXECUTE CALLBACK for tag 103
[cranberry0.americas.sgi.com:42995] pmix:client wait_cbfunc received
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 CALLBACK COMPLETE
Client ns slurm.pmix.475.13 rank 0: PMIx_Commit completed
[cranberry1.americas.sgi.com:04409] pmix: get_nb value for proc slurm.pmix.475.13:4294967294 key pmix.dbg.notify
[cranberry1.americas.sgi.com:04409] pmix: getnbfn value for proc slurm.pmix.475.13:-2 key pmix.dbg.notify
[cranberry1.americas.sgi.com:04409] HASH:FETCH rank -2 key pmix.dbg.notify
[cranberry1.americas.sgi.com:04409] HASH:FETCH proc data for rank -2 not found
[cranberry1.americas.sgi.com:04409] pmix:client get completed -46
[cranberry1.americas.sgi.com:04409] pmix: get_nb value for proc slurm.pmix.475.13:4294967294 key pmix.univ.size
[cranberry1.americas.sgi.com:04409] pmix: getnbfn value for proc slurm.pmix.475.13:-2 key pmix.univ.size
[cranberry1.americas.sgi.com:04409] HASH:FETCH rank -2 key pmix.univ.size
[cranberry1.americas.sgi.com:04409] HASH:FETCH proc data for rank -2 not found
[cranberry1.americas.sgi.com:04409] pmix_bfrop_unpack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_unpack: found 1 values for 1 provided storage
[cranberry1.americas.sgi.com:04409] pmix_bfrop_unpack_buffer( 0x7ffff61d1cc0, 0x7ffff61d1ca0, 1, 21 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_unpack_int16 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_unpack_buffer( 0x7ffff61d1cc0, 0x7ffff61d1ca8, 1, 14 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_unpack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix:client get completed 0
Client slurm.pmix.475.13:1 universe size 2
[cranberry1.americas.sgi.com:04409] pmix: executing put for key slurm.pmix.475.13-1-global type 15
[cranberry1.americas.sgi.com:04409] HASH:STORE rank 1 key slurm.pmix.475.13-1-global
[cranberry1.americas.sgi.com:04409] pmix: put slurm.pmix.475.13-1-global data for key global in local cache
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x7ffff0002cf0, 0x7ffff0000a20, 1, 28 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_byte * 27
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int16 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x7ffff0002cf0, 0x7ffff0000a08, 1, 15 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int64 * 1
[cranberry1.americas.sgi.com:04409] pmix: put slurm.pmix.475.13-1-global data for key global in remote cache
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x7ffff0003550, 0x7ffff0000a20, 1, 28 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_byte * 27
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int16 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x7ffff0003550, 0x7ffff0000a08, 1, 15 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int64 * 1
Client ns slurm.pmix.475.13 rank 1: Calling PMIx_Commit
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x7ffff0003db0, 0x7ffff61d1df0, 1, 14 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x7ffff0003db0, 0x7ffff61d1de0, 1, 32 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_byte * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x7ffff0003db0, 0x7ffff7dd8840, 1, 26 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int16 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x7ffff0003db0, 0x7ffff0002d28, 1, 15 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int64 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_byte * 45
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x7ffff0003db0, 0x7ffff61d1de0, 1, 32 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_byte * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x7ffff0003db0, 0x7ffff7dd8848, 1, 26 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int16 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x7ffff0003db0, 0x7ffff0003588, 1, 15 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int64 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_byte * 45
[cranberry1.americas.sgi.com:04409] posting recv on tag 103
[cranberry1.americas.sgi.com:04409] QUEIENG MSG TO SERVER OF SIZE 136
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 ptl:base:send_handler SENDING TO PEER pmix-server:4400 tag 103 with NON-NULL msg
[cranberry1.americas.sgi.com:04409] ptl:base:send_handler SENDING MSG
[cranberry1.americas.sgi.com:04409] ptl:base:send_handler MSG SENT
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 ptl:base:recv:handler called with peer pmix-server:4400
[cranberry1.americas.sgi.com:04409] ptl:base:recv:handler allocate new recv msg
[cranberry1.americas.sgi.com:04409] ptl:base:recv:handler read hdr on socket 7
[cranberry1.americas.sgi.com:04409] RECVD MSG FOR TAG 103 SIZE 8
[cranberry1.americas.sgi.com:04409] ptl:base:recv:handler allocate data region of size 8
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 RECVD COMPLETE MESSAGE FROM SERVER OF 8 BYTES FOR TAG 103 ON PEER SOCKET 7
[cranberry1.americas.sgi.com:04409] [base/ptl_base_sendrecv.c:495] post msg
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 message received 8 bytes for tag 103 on socket 7
[cranberry1.americas.sgi.com:04409] checking msg on tag 103 for tag 103
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 EXECUTE CALLBACK for tag 103
[cranberry1.americas.sgi.com:04409] pmix:client wait_cbfunc received
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 CALLBACK COMPLETE
Client ns slurm.pmix.475.13 rank 1: PMIx_Commit completed
Client ns slurm.pmix.475.13 rank 0: calling fence.
[cranberry0.americas.sgi.com:42995] pmix: executing fence
[cranberry0.americas.sgi.com:42995] pmix: fence_nb called
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x636ca0, 0x7fffffffd950, 1, 14 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x636ca0, 0x7fffffffd960, 1, 4 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int16 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x636ca0, 0x7fffffffd960, 1, 15 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int64 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x636ca0, 0x7fffffffdb50, 1, 22 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_byte * 18
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x636ca0, 0x7fffffffd970, 1, 4 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int16 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x636ca0, 0x7fffffffd970, 1, 15 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int64 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x636ca0, 0x636690, 1, 24 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_byte * 13
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int16 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x636ca0, 0x636898, 1, 9 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x636ca0, 0x6368a0, 1, 1 ) [cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_bool * 1 [cranberry0.americas.sgi.com:42995] posting recv on tag 104 [cranberry0.americas.sgi.com:42995] QUEIENG MSG TO SERVER OF SIZE 98 [cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 ptl:base:send_handler SENDING TO PEER pmix-server:42986 tag 104 with NON-NULL msg [cranberry0.americas.sgi.com:42995] ptl:base:send_handler SENDING MSG [cranberry0.americas.sgi.com:42995] ptl:base:send_handler MSG SENT Client ns slurm.pmix.475.13 rank 1: calling fence. [cranberry1.americas.sgi.com:04409] pmix: executing fence [cranberry1.americas.sgi.com:04409] pmix: fence_nb called [cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1 [cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x6375b0, 0x7fffffffd950, 1, 14 ) [cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1 [cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1 [cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x6375b0, 0x7fffffffd960, 1, 4 ) [cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int16 * 1 [cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x6375b0, 0x7fffffffd960, 1, 15 ) [cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int64 * 1 [cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1 [cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x6375b0, 0x7fffffffdb50, 1, 22 ) [cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1 [cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_byte * 18 [cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1 [cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1 [cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x6375b0, 0x7fffffffd970, 1, 4 ) [cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int16 * 1 [cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x6375b0, 0x7fffffffd970, 1, 15 ) [cranberry1.americas.sgi.com:04409] 
pmix_bfrop_pack_int64 * 1 [cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1 [cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x6375b0, 0x636fa0, 1, 24 ) [cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1 [cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_byte * 13 [cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1 [cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int16 * 1 [cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x6375b0, 0x6371a8, 1, 9 ) [cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1 [cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x6375b0, 0x6371b0, 1, 1 ) [cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_bool * 1 [cranberry1.americas.sgi.com:04409] posting recv on tag 104 [cranberry1.americas.sgi.com:04409] QUEIENG MSG TO SERVER OF SIZE 98 [cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 ptl:base:send_handler SENDING TO PEER pmix-server:4400 tag 104 with NON-NULL msg [cranberry1.americas.sgi.com:04409] ptl:base:send_handler SENDING MSG [cranberry1.americas.sgi.com:04409] ptl:base:send_handler MSG SENT [cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 ptl:base:recv:handler called with peer pmix-server:42986 [cranberry0.americas.sgi.com:42995] ptl:base:recv:handler allocate new recv msg [cranberry0.americas.sgi.com:42995] ptl:base:recv:handler read hdr on socket 7 [cranberry0.americas.sgi.com:42995] RECVD MSG FOR TAG 104 SIZE 8 [cranberry0.americas.sgi.com:42995] ptl:base:recv:handler allocate data region of size 8 [cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 RECVD COMPLETE MESSAGE FROM SERVER OF 8 BYTES FOR TAG 104 ON PEER SOCKET 7 [cranberry0.americas.sgi.com:42995] [base/ptl_base_sendrecv.c:495] post msg [cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 message received 8 bytes for tag 104 on socket 7 [cranberry0.americas.sgi.com:42995] checking msg on tag 104 for tag 104 [cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 EXECUTE 
CALLBACK for tag 104 [cranberry0.americas.sgi.com:42995] pmix: fence_nb callback recvd [cranberry0.americas.sgi.com:42995] client:unpack fence called [cranberry0.americas.sgi.com:42995] pmix_bfrop_unpack_int32 * 1 [cranberry0.americas.sgi.com:42995] pmix_bfrop_unpack: found 1 values for 1 provided storage [cranberry0.americas.sgi.com:42995] pmix_bfrop_unpack_buffer( 0x7ffff61d0ce0, 0x7ffff61d0c70, 1, 20 ) [cranberry0.americas.sgi.com:42995] pmix_bfrop_unpack_status * 1 [cranberry0.americas.sgi.com:42995] pmix_bfrop_unpack_int32 * 1 [cranberry0.americas.sgi.com:42995] client:unpack fence received status 0 [cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 CALLBACK COMPLETE [cranberry0.americas.sgi.com:42995] pmix: fence released [cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 ptl:base:recv:handler called with peer pmix-server:4400 [cranberry1.americas.sgi.com:04409] ptl:base:recv:handler allocate new recv msg [cranberry1.americas.sgi.com:04409] ptl:base:recv:handler read hdr on socket 7 [cranberry1.americas.sgi.com:04409] RECVD MSG FOR TAG 104 SIZE 8 [cranberry1.americas.sgi.com:04409] ptl:base:recv:handler allocate data region of size 8 [cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 RECVD COMPLETE MESSAGE FROM SERVER OF 8 BYTES FOR TAG 104 ON PEER SOCKET 7 [cranberry1.americas.sgi.com:04409] [base/ptl_base_sendrecv.c:495] post msg [cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 message received 8 bytes for tag 104 on socket 7 [cranberry1.americas.sgi.com:04409] checking msg on tag 104 for tag 104 [cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 EXECUTE CALLBACK for tag 104 [cranberry1.americas.sgi.com:04409] pmix: fence_nb callback recvd [cranberry1.americas.sgi.com:04409] client:unpack fence called [cranberry1.americas.sgi.com:04409] pmix_bfrop_unpack_int32 * 1 [cranberry1.americas.sgi.com:04409] pmix_bfrop_unpack: found 1 values for 1 provided storage [cranberry1.americas.sgi.com:04409] pmix_bfrop_unpack_buffer( 
0x7ffff61d1cf0, 0x7ffff61d1c80, 1, 20 ) [cranberry1.americas.sgi.com:04409] pmix_bfrop_unpack_status * 1 [cranberry1.americas.sgi.com:04409] pmix_bfrop_unpack_int32 * 1 [cranberry1.americas.sgi.com:04409] client:unpack fence received status 0 [cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 CALLBACK COMPLETE [cranberry1.americas.sgi.com:04409] pmix: fence released Client ns slurm.pmix.475.13 rank 0: COLLECT completed Client ns slurm.pmix.475.13 rank 0: Calling get for key :slurm.pmix.475.13-0-global [cranberry0.americas.sgi.com:42995] pmix: get_nb value for proc slurm.pmix.475.13:0 key slurm.pmix.475.13-0-global [cranberry0.americas.sgi.com:42995] pmix: getnbfn value for proc slurm.pmix.475.13:0 key slurm.pmix.475.13-0-global [cranberry0.americas.sgi.com:42995] HASH:FETCH rank 0 key slurm.pmix.475.13-0-global [cranberry0.americas.sgi.com:42995] HASH:FETCH proc data for rank 0 not found [cranberry0.americas.sgi.com:42995] HASH:FETCH rank 0 key slurm.pmix.475.13-0-global [cranberry0.americas.sgi.com:42995] pmix:client get completed 0 Client ns slurm.pmix.475.13 rank 0: PMIx_Get slurm.pmix.475.13-0-global passed _Get: 0 Client ns slurm.pmix.475.13 rank 0: Calling get for key :slurm.pmix.475.13-1-global [cranberry0.americas.sgi.com:42995] pmix: get_nb value for proc slurm.pmix.475.13:0 key slurm.pmix.475.13-1-global [cranberry0.americas.sgi.com:42995] pmix: getnbfn value for proc slurm.pmix.475.13:0 key slurm.pmix.475.13-1-global [cranberry0.americas.sgi.com:42995] HASH:FETCH rank 0 key slurm.pmix.475.13-1-global [cranberry0.americas.sgi.com:42995] HASH:FETCH proc data for rank 0 not found [cranberry0.americas.sgi.com:42995] HASH:FETCH rank 0 key slurm.pmix.475.13-1-global [cranberry0.americas.sgi.com:42995] HASH:FETCH data for key slurm.pmix.475.13-1-global not found [cranberry0.americas.sgi.com:42995] pmix:client get completed -46 Client ns slurm.pmix.475.13 rank 0: PMIx_Get slurm.pmix.475.13-1-global failed: -46 Client ns slurm.pmix.475.13 rank 1: COLLECT 
completed Client ns slurm.pmix.475.13 rank 0: Finalizing [cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 pmix:client finalize called [cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1 [cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x631630, 0x7fffffffd9b0, 1, 14 ) [cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1 [cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 pmix:client sending finalize sync to server [cranberry0.americas.sgi.com:42995] posting recv on tag 105 [cranberry0.americas.sgi.com:42995] QUEIENG MSG TO SERVER OF SIZE 8 [cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 ptl:base:send_handler SENDING TO PEER pmix-server:42986 tag 105 with NON-NULL msg [cranberry0.americas.sgi.com:42995] ptl:base:send_handler SENDING MSG [cranberry0.americas.sgi.com:42995] ptl:base:send_handler MSG SENT Client ns slurm.pmix.475.13 rank 1: Finalizing [cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 pmix:client finalize called [cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1 [cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x620c50, 0x7fffffffd9b0, 1, 14 ) [cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1 [cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 pmix:client sending finalize sync to server [cranberry1.americas.sgi.com:04409] posting recv on tag 105 [cranberry1.americas.sgi.com:04409] QUEIENG MSG TO SERVER OF SIZE 8 [cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 ptl:base:send_handler SENDING TO PEER pmix-server:4400 tag 105 with NON-NULL msg [cranberry1.americas.sgi.com:04409] ptl:base:send_handler SENDING MSG [cranberry1.americas.sgi.com:04409] ptl:base:send_handler MSG SENT [cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 ptl:base:recv:handler called with peer pmix-server:4400 [cranberry1.americas.sgi.com:04409] ptl:base:recv:handler allocate new recv msg [cranberry1.americas.sgi.com:04409] ptl:base:recv:handler read hdr on socket 7 
[cranberry1.americas.sgi.com:04409] RECVD MSG FOR TAG 105 SIZE 8 [cranberry1.americas.sgi.com:04409] ptl:base:recv:handler allocate data region of size 8 [cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 RECVD COMPLETE MESSAGE FROM SERVER OF 8 BYTES FOR TAG 105 ON PEER SOCKET 7 [cranberry1.americas.sgi.com:04409] [base/ptl_base_sendrecv.c:495] post msg [cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 message received 8 bytes for tag 105 on socket 7 [cranberry1.americas.sgi.com:04409] checking msg on tag 105 for tag 105 [cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 EXECUTE CALLBACK for tag 105 [cranberry1.americas.sgi.com:04409] pmix:client finwait_cbfunc received [cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 CALLBACK COMPLETE [cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 ptl:base:recv:handler called with peer pmix-server:42986 [cranberry0.americas.sgi.com:42995] ptl:base:recv:handler allocate new recv msg [cranberry0.americas.sgi.com:42995] ptl:base:recv:handler read hdr on socket 7 [cranberry0.americas.sgi.com:42995] RECVD MSG FOR TAG 105 SIZE 8 [cranberry0.americas.sgi.com:42995] ptl:base:recv:handler allocate data region of size 8 [cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 RECVD COMPLETE MESSAGE FROM SERVER OF 8 BYTES FOR TAG 105 ON PEER SOCKET 7 [cranberry0.americas.sgi.com:42995] [base/ptl_base_sendrecv.c:495] post msg [cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 message received 8 bytes for tag 105 on socket 7 [cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 pmix:client finalize sync received [cranberry0.americas.sgi.com:42995] checking msg on tag 105 for tag 105 [cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 EXECUTE CALLBACK for tag 105 [cranberry0.americas.sgi.com:42995] pmix:client finwait_cbfunc received [cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 CALLBACK COMPLETE [cranberry1.americas.sgi.com:04409] listen_thread: shutdown [cranberry0.americas.sgi.com:42995] 
slurm.pmix.475.13:0 pmix:client finalize sync received [cranberry0.americas.sgi.com:42995] listen_thread: shutdown Client ns slurm.pmix.475.13 rank 1:PMIx_Finalize successfully completed Client ns slurm.pmix.475.13 rank 0:PMIx_Finalize successfully completed [root@cranberry0 examples]#
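For readers following the log: the keys in it have the form `<nspace>-<rank>-global` (for example `slurm.pmix.475.13-1-global`), and the failing get is logged as `get_nb value for proc slurm.pmix.475.13:0 key slurm.pmix.475.13-1-global`. In other words, it asks rank 0 for a key whose name carries rank 1. The sketch below is one way to check for that kind of mismatch before a get is issued. It is only a reading of the log, not a confirmed diagnosis. The helper names `build_global_key` and `key_belongs_to` are hypothetical and are not part of PMIx or of the attached test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format "<nspace>-<rank>-global" into buf,
 * mirroring the key names visible in the log above.
 * Returns 0 on success, -1 on error or if buf is too small. */
int build_global_key(const char *nspace, unsigned int rank,
                     char *buf, size_t len)
{
    assert(nspace != NULL && buf != NULL);
    int n = snprintf(buf, len, "%s-%u-global", nspace, rank);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: returns 1 if `key` is the name that `rank`
 * in `nspace` would have produced, 0 otherwise (including when the
 * expected key does not fit the scratch buffer). */
int key_belongs_to(const char *key, const char *nspace, unsigned int rank)
{
    char expected[512];
    assert(key != NULL);
    if (build_global_key(nspace, rank, expected, sizeof expected) != 0) {
        return 0;
    }
    return strcmp(key, expected) == 0;
}
```

With the values from the log, `key_belongs_to("slurm.pmix.475.13-1-global", "slurm.pmix.475.13", 0)` is 0, while the same call with rank 1 is 1.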
Updated the version number.
I think this is an issue with the test program. It is derived from <pmix>/examples/client.c, which is buggy: https://github.com/pmix/pmix/issues/648
This issue was resolved in the most recent PMIx master: https://github.com/pmix/pmix/pull/650
Hi Jim, Since Artem's comment 5 states that it is solved, is it OK with you if we close the issue? Thanks
https://pmix.org/support/faq/rm-provided-information/ has a list of keys that are provided by the RM. That page states that Slurm supports many of them starting with 16.05. I am using Slurm 17.11. What test case should I use to verify that the keys PMIX_LOCAL_SIZE and PMIX_LOCAL_PEERS are set by the RM? I get a return code of -46 when I try to PMIx_Get these values. A pointer to a working test case would be best, so that I can determine what I am doing incorrectly. Thanks, Jim
Created attachment 6035: test PMIX_LOCAL_SIZE and PMIX_LOCAL_PEERS keys
Hi Jim, Can you please clarify for which rank you are trying to get PMIX_LOCAL_SIZE and PMIX_LOCAL_PEERS? Also, please see the simple test in the attachment above.
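As a side note for anyone testing these keys: as far as I understand the PMIx headers, PMIX_LOCAL_SIZE is returned as a uint32_t and PMIX_LOCAL_PEERS as a comma-delimited string of the ranks on the node. A minimal sketch of turning such a string into an array of ranks follows. The function name `parse_local_peers` is hypothetical, it does not call PMIx at all, and it assumes the string contains only decimal ranks separated by single commas.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: parse a comma-delimited rank list such as "0,1,5"
 * into out[]. Returns the number of ranks parsed, 0 for an empty string,
 * or -1 on malformed input or if more than max_out ranks are present. */
int parse_local_peers(const char *peers, unsigned int *out, size_t max_out)
{
    assert(peers != NULL && out != NULL);
    size_t n = 0;
    const char *p = peers;

    if (*p == '\0') {
        return 0;
    }
    for (;;) {
        char *end = NULL;
        /* Each field must start with a digit: rejects signs, spaces
         * and empty fields such as "0,,1" or a trailing comma. */
        if (*p < '0' || *p > '9') {
            return -1;
        }
        errno = 0;
        unsigned long v = strtoul(p, &end, 10);
        if (errno != 0 || end == p || v > UINT_MAX) {
            return -1;
        }
        if (n == max_out) {
            return -1;          /* caller's array is too small */
        }
        out[n++] = (unsigned int)v;
        if (*end == '\0') {
            break;              /* consumed the whole string */
        }
        if (*end != ',') {
            return -1;          /* unexpected separator */
        }
        p = end + 1;
    }
    return (int)n;
}
```

For a two-node, one-rank-per-node job like the one in this report, each node would presumably see a single-entry list, so the parsed count can be compared against the PMIX_LOCAL_SIZE value as a quick consistency check.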
This can be closed now. I have been able to get the keys as expected.
Thank you, Boris!
Closing bug as solution provided.