Ticket 4678

Summary: pmix PMIx_Put(GLOBAL) Key not propagated to other ranks.
Product: Slurm
Reporter: jim stoffel <jim.stoffel>
Component: slurmstepd
Assignee: Felip Moll <felip.moll>
Status: RESOLVED INFOGIVEN
QA Contact:
Severity: 4 - Minor Issue
Priority: ---
CC: artpol84, karasev.b
Version: 17.11.0
Hardware: Linux
OS: Linux
Site: HPE
HPE Sites: SL-T5-IRONNINJA
Attachments: test PMIX_LOCAL_SIZE and PMIX_LOCAL_PEERS keys

Description jim stoffel 2018-01-24 09:55:44 MST
pmix-2.0.2 and slurm-17.11.0. Linux rhel 7.3.
My name is Jim Stoffel, HPE MPT devel team.  jimstoffel@hpe.com

I found that when a key is added with PMIx_Put(PMIX_GLOBAL), followed by a
PMIx_Commit and a PMIx_Fence with PMIX_COLLECT_DATA, the GLOBAL key is not
available on the remote node.
I found this with a slightly modified version of the client.c example
shipped in /opt/pmix-2.0.2/examples.
I started by asking questions in the PMIx devel Google group. Ralph Castain
asked me to create this bug report to give you a heads up.

Here is how I execute it:
srun  --mpi=pmix_v2 --nodelist cranberry0,cranberry1 --pty /bin/bash
srun -n 2 ./a.out

The following is my modified version of /opt/pmix-2.0.2/examples/client.c:
[root@cranberry0 examples]# cat t.c
/*
 * Copyright (c) 2004-2010 The Trustees of Indiana University and Indiana
 *                         University Research and Technology
 *                         Corporation.  All rights reserved.
 * Copyright (c) 2004-2011 The University of Tennessee and The University
 *                         of Tennessee Research Foundation.  All rights
 *                         reserved.
 * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
 *                         University of Stuttgart.  All rights reserved.
 * Copyright (c) 2004-2005 The Regents of the University of California.
 *                         All rights reserved.
 * Copyright (c) 2006-2013 Los Alamos National Security, LLC.
 *                         All rights reserved.
 * Copyright (c) 2009-2012 Cisco Systems, Inc.  All rights reserved.
 * Copyright (c) 2011      Oak Ridge National Labs.  All rights reserved.
 * Copyright (c) 2013-2016 Intel, Inc.  All rights reserved.
 * Copyright (c) 2015      Mellanox Technologies, Inc.  All rights reserved.
 * $COPYRIGHT$
 *
 * Additional copyrights may follow
 *
 * $HEADER$
 *
 */

#define _GNU_SOURCE
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <time.h>

#include <pmix.h>

static volatile bool waiting_for_debugger = true;
static pmix_proc_t myproc;

/* this is the event notification function we pass down below
 * when registering for general events - i.e., the default
 * handler. We don't technically need to register one, but it
 * is usually good practice to catch any events that occur */
static void notification_fn(size_t evhdlr_registration_id,
                            pmix_status_t status,
                            const pmix_proc_t *source,
                            pmix_info_t info[], size_t ninfo,
                            pmix_info_t results[], size_t nresults,
                            pmix_event_notification_cbfunc_fn_t cbfunc,
                            void *cbdata)
{
    if (NULL != cbfunc) {
        cbfunc(PMIX_EVENT_ACTION_COMPLETE, NULL, 0, NULL, NULL, cbdata);
    }
}

/* this is an event notification function that we explicitly request
 * be called when the PMIX_ERR_DEBUGGER_RELEASE notification is issued.
 * We could catch it in the general event notification function and test
 * the status to see if it was "debugger release", but it often is simpler
 * to declare a use-specific notification callback point. In this case,
 * we are asking to know when we are told the debugger released us */
static void release_fn(size_t evhdlr_registration_id,
                       pmix_status_t status,
                       const pmix_proc_t *source,
                       pmix_info_t info[], size_t ninfo,
                       pmix_info_t results[], size_t nresults,
                       pmix_event_notification_cbfunc_fn_t cbfunc,
                       void *cbdata)
{
    if (NULL != cbfunc) {
        cbfunc(PMIX_EVENT_ACTION_COMPLETE, NULL, 0, NULL, NULL, cbdata);
    }
    waiting_for_debugger = false;
}

/* event handler registration is done asynchronously because it
 * may involve the PMIx server registering with the host RM for
 * external events. So we provide a callback function that returns
 * the status of the request (success or an error), plus a numerical index
 * to the registered event. The index is used later on to deregister
 * an event handler - if we don't explicitly deregister it, then the
 * PMIx server will do so when it sees us exit */
static void evhandler_reg_callbk(pmix_status_t status,
                                 size_t evhandler_ref,
                                 void *cbdata)
{
    volatile int *active = (volatile int*)cbdata;

    if (PMIX_SUCCESS != status) {
        fprintf(stderr, "Client %s:%d EVENT HANDLER REGISTRATION FAILED WITH STATUS %d, ref=%lu\n",
                   myproc.nspace, myproc.rank, status, (unsigned long)evhandler_ref);
    }
    *active = status;
}

int main(int argc, char **argv)
{
    int rc;
    pmix_value_t value;
    pmix_value_t *val = &value;
    char *tmp;
    pmix_proc_t proc;
    uint32_t nprocs, n;
    pmix_info_t *info;
    bool flag;
    volatile int active;
    pmix_status_t dbg = PMIX_ERR_DEBUGGER_RELEASE;

    /* init us - note that the call to "init" includes the return of
     * any job-related info provided by the RM. This includes any
     * debugger flag instructing us to stop-in-init. If such a directive
     * is included, then the process will be stopped in this call until
     * the "debugger release" notification arrives */
    if (PMIX_SUCCESS != (rc = PMIx_Init(&myproc, NULL, 0))) {
        fprintf(stderr, "Client ns %s rank %d: PMIx_Init failed: %d\n", myproc.nspace, myproc.rank, rc);
        exit(0);
    }
    fprintf(stderr, "Client ns %s rank %d: Running\n", myproc.nspace, myproc.rank);


    /* register our default event handler - again, this isn't strictly
     * required, but is generally good practice */
    active = -1;
    PMIx_Register_event_handler(NULL, 0, NULL, 0,
                                notification_fn, evhandler_reg_callbk, (void*)&active);
    while (-1 == active) {
        sleep(1);
    }
    if (0 != active) {
        fprintf(stderr, "[%s:%d] Default handler registration failed\n", myproc.nspace, myproc.rank);
        exit(active);
    }

    /* job-related info is found in our nspace, assigned to the
     * wildcard rank as it doesn't relate to a specific rank. Setup
     * a name to retrieve such values */
    PMIX_PROC_CONSTRUCT(&proc);
    (void)strncpy(proc.nspace, myproc.nspace, PMIX_MAX_NSLEN);
    proc.rank = PMIX_RANK_WILDCARD;

    /* check to see if we have been instructed to wait for a debugger
     * to attach to us. We won't get both a stop-in-init AND a
     * wait-for-notify directive, so we should never stop twice. This
     * directive is provided so that something like an MPI implementation
     * can do some initial setup in MPI_Init prior to pausing for the
     * debugger */
    if (PMIX_SUCCESS == (rc = PMIx_Get(&proc, PMIX_DEBUG_WAIT_FOR_NOTIFY, NULL, 0, &val))) {
        /* register for debugger release */
        active = -1;
        PMIx_Register_event_handler(&dbg, 1, NULL, 0,
                                    release_fn, evhandler_reg_callbk, (void*)&active);
        /* wait for registration to complete */
        while (-1 == active) {
            sleep(1);
        }
        if (0 != active) {
            fprintf(stderr, "[%s:%d] Debug handler registration failed\n", myproc.nspace, myproc.rank);
            exit(active);
        }
        /* wait for debugger release */
        while (waiting_for_debugger) {
            sleep(1);
        }
    }

    /* get our universe size */
    if (PMIX_SUCCESS != (rc = PMIx_Get(&proc, PMIX_UNIV_SIZE, NULL, 0, &val))) {
        fprintf(stderr, "Client ns %s rank %d: PMIx_Get universe size failed: %d\n", myproc.nspace, myproc.rank, rc);
        goto done;
    }
    nprocs = val->data.uint32;
    PMIX_VALUE_RELEASE(val);
    fprintf(stderr, "Client %s:%d universe size %d\n", myproc.nspace, myproc.rank, nprocs);

    if (0 > asprintf(&tmp, "%s-%d-global", myproc.nspace, myproc.rank)) {
        exit(1);
    }
    value.type = PMIX_UINT64;
    value.data.uint64 = 1234;
    if (PMIX_SUCCESS != (rc = PMIx_Put(PMIX_GLOBAL, tmp, &value))) {
        fprintf(stderr, "Client ns %s rank %d: PMIx_Put global failed: %d\n", myproc.nspace, myproc.rank, rc);
        free(tmp);
        goto done;
    }
    free(tmp);


    fprintf(stderr, "Client ns %s rank %d: Calling PMIx_Commit\n", myproc.nspace, myproc.rank);
    /* push the data to our PMIx server */
    if (PMIX_SUCCESS != (rc = PMIx_Commit())) {
        fprintf(stderr, "Client ns %s rank %d: PMIx_Commit failed: %d\n", myproc.nspace, myproc.rank, rc);
        goto done;
    }
    fprintf(stderr, "Client ns %s rank %d: PMIx_Commit completed \n", myproc.nspace, myproc.rank);

    sleep(1);
    fprintf(stderr, "\n \n Client ns %s rank %d: calling fence. \n", myproc.nspace, myproc.rank);
    /* call fence to synchronize with our peers - instruct
     * the fence operation to collect and return all "put"
     * data from our peers */
    PMIX_INFO_CREATE(info, 1);
    flag = true;
    PMIX_INFO_LOAD(info, PMIX_COLLECT_DATA, &flag, PMIX_BOOL);
    if (PMIX_SUCCESS != (rc = PMIx_Fence(&proc, 1, info, 1))) {
        fprintf(stderr, "Client ns %s rank %d: PMIx_Fence failed: %d\n", myproc.nspace, myproc.rank, rc);
        PMIX_INFO_FREE(info, 1);
        goto done;
    }
    PMIX_INFO_FREE(info, 1);

    sleep(1);
    fprintf(stderr, "\n \n Client ns %s rank %d: COLLECT completed \n", myproc.nspace, myproc.rank);

    if (myproc.rank == 0) {
        /* check the returned data; note that every lookup is issued
         * against myproc (this rank), whatever rank n appears in the key */
        for (n = 0; n < nprocs; n++) {
            if (0 > asprintf(&tmp, "%s-%d-global", myproc.nspace, n)) {
                exit(1);
            }
            fprintf(stderr, "\n \n Client ns %s rank %d: Calling get for key :%s\n", myproc.nspace, myproc.rank, tmp);
            if (PMIX_SUCCESS != (rc = PMIx_Get(&myproc, tmp, NULL, 0, &val))) {
                fprintf(stderr, "Client ns %s rank %d: PMIx_Get %s failed: %d\n", myproc.nspace, myproc.rank, tmp, rc);
                //goto done;
            } else {
                fprintf(stderr, "Client ns %s rank %d: PMIx_Get %s passed _Get: %d\n", myproc.nspace, myproc.rank, tmp, rc);
                PMIX_VALUE_RELEASE(val);
            }
            free(tmp);
        }
    }

 done:
    sleep(60);
    /* finalize us */
    fprintf(stderr, "Client ns %s rank %d: Finalizing\n", myproc.nspace, myproc.rank);
    if (PMIX_SUCCESS != (rc = PMIx_Finalize(NULL, 0))) {
        fprintf(stderr, "Client ns %s rank %d:PMIx_Finalize failed: %d\n", myproc.nspace, myproc.rank, rc);
    } else {
        fprintf(stderr, "Client ns %s rank %d:PMIx_Finalize successfully completed\n", myproc.nspace, myproc.rank);
    }
    fflush(stderr);
    return(0);
}
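
A note on the lookup loop in t.c: every PMIx_Get there is issued against &myproc, so rank 0 asks for the key "<nspace>-1-global" among its own data. The HASH:STORE lines in the trace below suggest that a put value is filed under the rank that published it, so it may be worth retrying the lookup with a proc whose rank is set to n before concluding that the fence did not propagate the data. The following is a toy model of that idea only, not PMIx code: the toy_* names, the fixed-size table, and the assumption that lookups are scoped to the publishing rank are all illustrative.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_MAX_ENTRIES 16
#define TOY_MAX_KEY     64

/* One published value, filed under the rank that published it
 * (assumption made for this toy model). */
struct toy_entry {
    unsigned      rank;
    char          key[TOY_MAX_KEY];
    unsigned long value;
};

struct toy_store {
    struct toy_entry entries[TOY_MAX_ENTRIES];
    size_t           count;
};

/* Build "<nspace>-<rank>-global", the key naming used by t.c.
 * Returns 0 on success, -1 if the key does not fit in buf. */
static int toy_make_key(char *buf, size_t len, const char *nspace, unsigned rank)
{
    int n = snprintf(buf, len, "%s-%u-global", nspace, rank);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* File a value under the publishing rank.
 * Returns 0 on success, -1 if the table is full or the key is too long. */
static int toy_put(struct toy_store *s, unsigned rank, const char *key,
                   unsigned long value)
{
    assert(s != NULL && key != NULL);
    if (s->count >= TOY_MAX_ENTRIES || strlen(key) >= TOY_MAX_KEY) {
        return -1;
    }
    struct toy_entry *e = &s->entries[s->count++];
    e->rank = rank;
    strcpy(e->key, key);
    e->value = value;
    return 0;
}

/* Look up key among the values published by owner_rank only.
 * Returns 0 and fills *out on success, -1 if nothing matches. */
static int toy_get(const struct toy_store *s, unsigned owner_rank,
                   const char *key, unsigned long *out)
{
    assert(s != NULL && key != NULL && out != NULL);
    for (size_t i = 0; i < s->count; i++) {
        if (s->entries[i].rank == owner_rank &&
            0 == strcmp(s->entries[i].key, key)) {
            *out = s->entries[i].value;
            return 0;
        }
    }
    return -1;
}
```

In this model, once ranks 0 and 1 have each put their own key, looking up rank 1's key with owner_rank 0 fails while the same lookup with owner_rank 1 succeeds. That mirrors the difference between passing &myproc and passing a proc whose rank is n.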


PMIX trace.
[root@cranberry0 examples]# cat /tmp/o
[cranberry0.americas.sgi.com:42995] posting notification recv on tag 0
[cranberry1.americas.sgi.com:04409] posting notification recv on tag 0
[cranberry0.americas.sgi.com:42995] pmix: init called
[cranberry1.americas.sgi.com:04409] pmix: init called
[cranberry0.americas.sgi.com:42995] ptl_base_connect: attempting to connect to server
[cranberry1.americas.sgi.com:04409] ptl_base_connect: attempting to connect to server
[cranberry0.americas.sgi.com:42995] usock_peer_try_connect: attempting to connect to server on socket 7
[cranberry1.americas.sgi.com:04409] usock_peer_try_connect: attempting to connect to server on socket 7
[cranberry0.americas.sgi.com:42995] send blocking of 64 bytes to socket 7
[cranberry0.americas.sgi.com:42995] blocking send complete to socket 7
[cranberry0.americas.sgi.com:42995] waiting for blocking recv of 4 bytes
[cranberry1.americas.sgi.com:04409] send blocking of 64 bytes to socket 7
[cranberry0.americas.sgi.com:42995] blocking receive complete from remote
[cranberry0.americas.sgi.com:42995] waiting for blocking recv of 4 bytes
[cranberry0.americas.sgi.com:42995] blocking receive complete from remote
[cranberry0.americas.sgi.com:42995] sock_peer_try_connect: Connection across to server succeeded
[cranberry1.americas.sgi.com:04409] blocking send complete to socket 7
[cranberry1.americas.sgi.com:04409] waiting for blocking recv of 4 bytes
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x620370, 0x7fffffffd410, 1, 14 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry1.americas.sgi.com:04409] blocking receive complete from remote
[cranberry1.americas.sgi.com:04409] waiting for blocking recv of 4 bytes
[cranberry1.americas.sgi.com:04409] blocking receive complete from remote
[cranberry1.americas.sgi.com:04409] sock_peer_try_connect: Connection across to server succeeded
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 ptl:base:recv:handler called with peer pmix-server:42986
[cranberry0.americas.sgi.com:42995] ptl:base:recv:handler allocate new recv msg
[cranberry0.americas.sgi.com:42995] ptl:base:recv:handler read hdr on socket 7
[cranberry0.americas.sgi.com:42995] posting recv on tag 101
[cranberry0.americas.sgi.com:42995] QUEIENG MSG TO SERVER OF SIZE 8
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 ptl:base:send_handler SENDING TO PEER pmix-server:42986 tag 101 with NON-NULL msg
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x620c50, 0x7fffffffd410, 1, 14 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 ptl:base:recv:handler called with peer pmix-server:4400
[cranberry1.americas.sgi.com:04409] ptl:base:recv:handler allocate new recv msg
[cranberry1.americas.sgi.com:04409] ptl:base:recv:handler read hdr on socket 7
[cranberry1.americas.sgi.com:04409] posting recv on tag 101
[cranberry1.americas.sgi.com:04409] QUEIENG MSG TO SERVER OF SIZE 8
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 ptl:base:send_handler SENDING TO PEER pmix-server:4400 tag 101 with NON-NULL msg
[cranberry1.americas.sgi.com:04409] ptl:base:send_handler SENDING MSG
[cranberry1.americas.sgi.com:04409] ptl:base:send_handler MSG SENT
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 ptl:base:recv:handler called with peer pmix-server:4400
[cranberry0.americas.sgi.com:42995] ptl:base:send_handler SENDING MSG
[cranberry0.americas.sgi.com:42995] ptl:base:send_handler MSG SENT
[cranberry1.americas.sgi.com:04409] ptl:base:recv:handler read hdr on socket 7
[cranberry1.americas.sgi.com:04409] RECVD MSG FOR TAG 101 SIZE 26
[cranberry1.americas.sgi.com:04409] ptl:base:recv:handler allocate data region of size 26
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 RECVD COMPLETE MESSAGE FROM SERVER OF 26 BYTES FOR TAG 101 ON PEER SOCKET 7
[cranberry1.americas.sgi.com:04409] [base/ptl_base_sendrecv.c:495] post msg
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 message received 26 bytes for tag 101 on socket 7
[cranberry1.americas.sgi.com:04409] checking msg on tag 101 for tag 101
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 EXECUTE CALLBACK for tag 101
[cranberry1.americas.sgi.com:04409] pmix_bfrop_unpack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_unpack: found 1 values for 1 provided storage
[cranberry1.americas.sgi.com:04409] pmix_bfrop_unpack_buffer( 0x7ffff61d1cf0, 0x7ffff61d1c90, 1, 3 )
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 ptl:base:recv:handler called with peer pmix-server:42986
[cranberry0.americas.sgi.com:42995] ptl:base:recv:handler read hdr on socket 7
[cranberry1.americas.sgi.com:04409] pmix_bfrop_unpack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_unpack_byte * 18
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 CALLBACK COMPLETE
[cranberry1.americas.sgi.com:04409] pmix: get_nb value for proc slurm.pmix.475.13:4294967294 key pmix.dbg.init
[cranberry1.americas.sgi.com:04409] pmix: getnbfn value for proc slurm.pmix.475.13:-2 key pmix.dbg.init
[cranberry1.americas.sgi.com:04409] HASH:FETCH rank -2 key pmix.dbg.init
[cranberry1.americas.sgi.com:04409] HASH:FETCH proc data for rank -2 not found
[cranberry1.americas.sgi.com:04409] pmix:client get completed -46
Client ns slurm.pmix.475.13 rank 1: Running
[cranberry1.americas.sgi.com:04409] pmix_register_event_hdlr shifting to progress thread
[cranberry1.americas.sgi.com:04409] pmix: register event_hdlr with 0 infos
[cranberry1.americas.sgi.com:04409] pmix: _add_hdlr
[cranberry1.americas.sgi.com:04409] pmix: _add_hdlr sending to server
[cranberry0.americas.sgi.com:42995] RECVD MSG FOR TAG 101 SIZE 26
[cranberry0.americas.sgi.com:42995] ptl:base:recv:handler allocate data region of size 26
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 RECVD COMPLETE MESSAGE FROM SERVER OF 26 BYTES FOR TAG 101 ON PEER SOCKET 7
[cranberry0.americas.sgi.com:42995] [base/ptl_base_sendrecv.c:495] post msg
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x7ffff0000a00, 0x7ffff61d1d10, 1, 14 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x7ffff0000a00, 0x636e10, 1, 4 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int16 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x7ffff0000a00, 0x636e10, 1, 15 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int64 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x7ffff0000a00, 0x7ffff0002d60, 1, 4 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int16 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x7ffff0000a00, 0x7ffff0002d60, 1, 15 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int64 * 1
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 message received 26 bytes for tag 101 on socket 7
[cranberry0.americas.sgi.com:42995] checking msg on tag 101 for tag 101
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 EXECUTE CALLBACK for tag 101
[cranberry0.americas.sgi.com:42995] pmix_bfrop_unpack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_unpack: found 1 values for 1 provided storage
[cranberry1.americas.sgi.com:04409] posting recv on tag 102
[cranberry1.americas.sgi.com:04409] QUEIENG MSG TO SERVER OF SIZE 36
[cranberry0.americas.sgi.com:42995] pmix_bfrop_unpack_buffer( 0x7ffff61d0ce0, 0x7ffff61d0c80, 1, 3 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_unpack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_unpack_byte * 18
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 CALLBACK COMPLETE
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 ptl:base:send_handler SENDING TO PEER pmix-server:4400 tag 102 with NON-NULL msg
[cranberry1.americas.sgi.com:04409] ptl:base:send_handler SENDING MSG
[cranberry1.americas.sgi.com:04409] ptl:base:send_handler MSG SENT
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 ptl:base:recv:handler called with peer pmix-server:4400
[cranberry1.americas.sgi.com:04409] ptl:base:recv:handler allocate new recv msg
[cranberry1.americas.sgi.com:04409] ptl:base:recv:handler read hdr on socket 7
[cranberry1.americas.sgi.com:04409] RECVD MSG FOR TAG 102 SIZE 8
[cranberry1.americas.sgi.com:04409] ptl:base:recv:handler allocate data region of size 8
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 RECVD COMPLETE MESSAGE FROM SERVER OF 8 BYTES FOR TAG 102 ON PEER SOCKET 7
[cranberry1.americas.sgi.com:04409] [base/ptl_base_sendrecv.c:495] post msg
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 message received 8 bytes for tag 102 on socket 7
[cranberry0.americas.sgi.com:42995] pmix: get_nb value for proc slurm.pmix.475.13:4294967294 key pmix.dbg.init
[cranberry1.americas.sgi.com:04409] checking msg on tag 102 for tag 102
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 EXECUTE CALLBACK for tag 102
[cranberry1.americas.sgi.com:04409] pmix: regevents callback recvd
[cranberry1.americas.sgi.com:04409] pmix_bfrop_unpack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_unpack: found 1 values for 1 provided storage
[cranberry1.americas.sgi.com:04409] pmix_bfrop_unpack_buffer( 0x7ffff61d1cf0, 0x7ffff61d1c60, 1, 20 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_unpack_status * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_unpack_int32 * 1
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 CALLBACK COMPLETE
[cranberry0.americas.sgi.com:42995] pmix: getnbfn value for proc slurm.pmix.475.13:-2 key pmix.dbg.init
[cranberry0.americas.sgi.com:42995] HASH:FETCH rank -2 key pmix.dbg.init
[cranberry0.americas.sgi.com:42995] HASH:FETCH proc data for rank -2 not found
[cranberry0.americas.sgi.com:42995] pmix:client get completed -46
Client ns slurm.pmix.475.13 rank 0: Running
[cranberry0.americas.sgi.com:42995] pmix_register_event_hdlr shifting to progress thread
[cranberry0.americas.sgi.com:42995] pmix: register event_hdlr with 0 infos
[cranberry0.americas.sgi.com:42995] pmix: _add_hdlr
[cranberry0.americas.sgi.com:42995] pmix: _add_hdlr sending to server
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x7ffff0000a00, 0x7ffff61d0d00, 1, 14 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x7ffff0000a00, 0x636500, 1, 4 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int16 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x7ffff0000a00, 0x636500, 1, 15 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int64 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x7ffff0000a00, 0x7ffff0002d60, 1, 4 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int16 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x7ffff0000a00, 0x7ffff0002d60, 1, 15 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int64 * 1
[cranberry0.americas.sgi.com:42995] posting recv on tag 102
[cranberry0.americas.sgi.com:42995] QUEIENG MSG TO SERVER OF SIZE 36
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 ptl:base:send_handler SENDING TO PEER pmix-server:42986 tag 102 with NON-NULL msg
[cranberry0.americas.sgi.com:42995] ptl:base:send_handler SENDING MSG
[cranberry0.americas.sgi.com:42995] ptl:base:send_handler MSG SENT
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 ptl:base:recv:handler called with peer pmix-server:42986
[cranberry0.americas.sgi.com:42995] ptl:base:recv:handler allocate new recv msg
[cranberry0.americas.sgi.com:42995] ptl:base:recv:handler read hdr on socket 7
[cranberry0.americas.sgi.com:42995] RECVD MSG FOR TAG 102 SIZE 8
[cranberry0.americas.sgi.com:42995] ptl:base:recv:handler allocate data region of size 8
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 RECVD COMPLETE MESSAGE FROM SERVER OF 8 BYTES FOR TAG 102 ON PEER SOCKET 7
[cranberry0.americas.sgi.com:42995] [base/ptl_base_sendrecv.c:495] post msg
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 message received 8 bytes for tag 102 on socket 7
[cranberry0.americas.sgi.com:42995] checking msg on tag 102 for tag 102
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 EXECUTE CALLBACK for tag 102
[cranberry0.americas.sgi.com:42995] pmix: regevents callback recvd
[cranberry0.americas.sgi.com:42995] pmix_bfrop_unpack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_unpack: found 1 values for 1 provided storage
[cranberry0.americas.sgi.com:42995] pmix_bfrop_unpack_buffer( 0x7ffff61d0ce0, 0x7ffff61d0c50, 1, 20 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_unpack_status * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_unpack_int32 * 1
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 CALLBACK COMPLETE
[cranberry0.americas.sgi.com:42995] pmix: get_nb value for proc slurm.pmix.475.13:4294967294 key pmix.dbg.notify
[cranberry0.americas.sgi.com:42995] pmix: getnbfn value for proc slurm.pmix.475.13:-2 key pmix.dbg.notify
[cranberry0.americas.sgi.com:42995] HASH:FETCH rank -2 key pmix.dbg.notify
[cranberry0.americas.sgi.com:42995] HASH:FETCH proc data for rank -2 not found
[cranberry0.americas.sgi.com:42995] pmix:client get completed -46
[cranberry0.americas.sgi.com:42995] pmix: get_nb value for proc slurm.pmix.475.13:4294967294 key pmix.univ.size
[cranberry0.americas.sgi.com:42995] pmix: getnbfn value for proc slurm.pmix.475.13:-2 key pmix.univ.size
[cranberry0.americas.sgi.com:42995] HASH:FETCH rank -2 key pmix.univ.size
[cranberry0.americas.sgi.com:42995] HASH:FETCH proc data for rank -2 not found
[cranberry0.americas.sgi.com:42995] pmix_bfrop_unpack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_unpack: found 1 values for 1 provided storage
[cranberry0.americas.sgi.com:42995] pmix_bfrop_unpack_buffer( 0x7ffff61d0cb0, 0x7ffff61d0c90, 1, 21 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_unpack_int16 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_unpack_buffer( 0x7ffff61d0cb0, 0x7ffff61d0c98, 1, 14 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_unpack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix:client get completed 0
Client slurm.pmix.475.13:0 universe size 2
[cranberry0.americas.sgi.com:42995] pmix: executing put for key slurm.pmix.475.13-0-global type 15
[cranberry0.americas.sgi.com:42995] HASH:STORE rank 0 key slurm.pmix.475.13-0-global
[cranberry0.americas.sgi.com:42995] pmix: put slurm.pmix.475.13-0-global data for key global in local cache
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x7ffff0002cf0, 0x7ffff0000a20, 1, 28 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_byte * 27
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int16 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x7ffff0002cf0, 0x7ffff0000a08, 1, 15 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int64 * 1
[cranberry0.americas.sgi.com:42995] pmix: put slurm.pmix.475.13-0-global data for key global in remote cache
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x7ffff0003550, 0x7ffff0000a20, 1, 28 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_byte * 27
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int16 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x7ffff0003550, 0x7ffff0000a08, 1, 15 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int64 * 1
Client ns slurm.pmix.475.13 rank 0: Calling PMIx_Commit
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x7ffff0003db0, 0x7ffff61d0de0, 1, 14 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x7ffff0003db0, 0x7ffff61d0dd0, 1, 32 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_byte * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x7ffff0003db0, 0x7ffff7dd8840, 1, 26 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int16 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x7ffff0003db0, 0x7ffff0002d28, 1, 15 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int64 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_byte * 45
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x7ffff0003db0, 0x7ffff61d0dd0, 1, 32 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_byte * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x7ffff0003db0, 0x7ffff7dd8848, 1, 26 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int16 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x7ffff0003db0, 0x7ffff0003588, 1, 15 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int64 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_byte * 45
[cranberry0.americas.sgi.com:42995] posting recv on tag 103
[cranberry0.americas.sgi.com:42995] QUEIENG MSG TO SERVER OF SIZE 136
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 ptl:base:send_handler SENDING TO PEER pmix-server:42986 tag 103 with NON-NULL msg
[cranberry0.americas.sgi.com:42995] ptl:base:send_handler SENDING MSG
[cranberry0.americas.sgi.com:42995] ptl:base:send_handler MSG SENT
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 ptl:base:recv:handler called with peer pmix-server:42986
[cranberry0.americas.sgi.com:42995] ptl:base:recv:handler allocate new recv msg
[cranberry0.americas.sgi.com:42995] ptl:base:recv:handler read hdr on socket 7
[cranberry0.americas.sgi.com:42995] RECVD MSG FOR TAG 103 SIZE 8
[cranberry0.americas.sgi.com:42995] ptl:base:recv:handler allocate data region of size 8
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 RECVD COMPLETE MESSAGE FROM SERVER OF 8 BYTES FOR TAG 103 ON PEER SOCKET 7
[cranberry0.americas.sgi.com:42995] [base/ptl_base_sendrecv.c:495] post msg
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 message received 8 bytes for tag 103 on socket 7
[cranberry0.americas.sgi.com:42995] checking msg on tag 103 for tag 103
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 EXECUTE CALLBACK for tag 103
[cranberry0.americas.sgi.com:42995] pmix:client wait_cbfunc received
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 CALLBACK COMPLETE
Client ns slurm.pmix.475.13 rank 0: PMIx_Commit completed
[cranberry1.americas.sgi.com:04409] pmix: get_nb value for proc slurm.pmix.475.13:4294967294 key pmix.dbg.notify
[cranberry1.americas.sgi.com:04409] pmix: getnbfn value for proc slurm.pmix.475.13:-2 key pmix.dbg.notify
[cranberry1.americas.sgi.com:04409] HASH:FETCH rank -2 key pmix.dbg.notify
[cranberry1.americas.sgi.com:04409] HASH:FETCH proc data for rank -2 not found
[cranberry1.americas.sgi.com:04409] pmix:client get completed -46
[cranberry1.americas.sgi.com:04409] pmix: get_nb value for proc slurm.pmix.475.13:4294967294 key pmix.univ.size
[cranberry1.americas.sgi.com:04409] pmix: getnbfn value for proc slurm.pmix.475.13:-2 key pmix.univ.size
[cranberry1.americas.sgi.com:04409] HASH:FETCH rank -2 key pmix.univ.size
[cranberry1.americas.sgi.com:04409] HASH:FETCH proc data for rank -2 not found
[cranberry1.americas.sgi.com:04409] pmix_bfrop_unpack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_unpack: found 1 values for 1 provided storage
[cranberry1.americas.sgi.com:04409] pmix_bfrop_unpack_buffer( 0x7ffff61d1cc0, 0x7ffff61d1ca0, 1, 21 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_unpack_int16 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_unpack_buffer( 0x7ffff61d1cc0, 0x7ffff61d1ca8, 1, 14 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_unpack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix:client get completed 0
Client slurm.pmix.475.13:1 universe size 2
[cranberry1.americas.sgi.com:04409] pmix: executing put for key slurm.pmix.475.13-1-global type 15
[cranberry1.americas.sgi.com:04409] HASH:STORE rank 1 key slurm.pmix.475.13-1-global
[cranberry1.americas.sgi.com:04409] pmix: put slurm.pmix.475.13-1-global data for key global in local cache
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x7ffff0002cf0, 0x7ffff0000a20, 1, 28 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_byte * 27
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int16 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x7ffff0002cf0, 0x7ffff0000a08, 1, 15 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int64 * 1
[cranberry1.americas.sgi.com:04409] pmix: put slurm.pmix.475.13-1-global data for key global in remote cache
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x7ffff0003550, 0x7ffff0000a20, 1, 28 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_byte * 27
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int16 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x7ffff0003550, 0x7ffff0000a08, 1, 15 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int64 * 1
Client ns slurm.pmix.475.13 rank 1: Calling PMIx_Commit
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x7ffff0003db0, 0x7ffff61d1df0, 1, 14 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x7ffff0003db0, 0x7ffff61d1de0, 1, 32 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_byte * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x7ffff0003db0, 0x7ffff7dd8840, 1, 26 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int16 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x7ffff0003db0, 0x7ffff0002d28, 1, 15 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int64 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_byte * 45
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x7ffff0003db0, 0x7ffff61d1de0, 1, 32 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_byte * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x7ffff0003db0, 0x7ffff7dd8848, 1, 26 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int16 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x7ffff0003db0, 0x7ffff0003588, 1, 15 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int64 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_byte * 45
[cranberry1.americas.sgi.com:04409] posting recv on tag 103
[cranberry1.americas.sgi.com:04409] QUEIENG MSG TO SERVER OF SIZE 136
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 ptl:base:send_handler SENDING TO PEER pmix-server:4400 tag 103 with NON-NULL msg
[cranberry1.americas.sgi.com:04409] ptl:base:send_handler SENDING MSG
[cranberry1.americas.sgi.com:04409] ptl:base:send_handler MSG SENT
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 ptl:base:recv:handler called with peer pmix-server:4400
[cranberry1.americas.sgi.com:04409] ptl:base:recv:handler allocate new recv msg
[cranberry1.americas.sgi.com:04409] ptl:base:recv:handler read hdr on socket 7
[cranberry1.americas.sgi.com:04409] RECVD MSG FOR TAG 103 SIZE 8
[cranberry1.americas.sgi.com:04409] ptl:base:recv:handler allocate data region of size 8
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 RECVD COMPLETE MESSAGE FROM SERVER OF 8 BYTES FOR TAG 103 ON PEER SOCKET 7
[cranberry1.americas.sgi.com:04409] [base/ptl_base_sendrecv.c:495] post msg
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 message received 8 bytes for tag 103 on socket 7
[cranberry1.americas.sgi.com:04409] checking msg on tag 103 for tag 103
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 EXECUTE CALLBACK for tag 103
[cranberry1.americas.sgi.com:04409] pmix:client wait_cbfunc received
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 CALLBACK COMPLETE
Client ns slurm.pmix.475.13 rank 1: PMIx_Commit completed


 Client ns slurm.pmix.475.13 rank 0: calling fence.
[cranberry0.americas.sgi.com:42995] pmix: executing fence
[cranberry0.americas.sgi.com:42995] pmix: fence_nb called
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x636ca0, 0x7fffffffd950, 1, 14 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x636ca0, 0x7fffffffd960, 1, 4 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int16 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x636ca0, 0x7fffffffd960, 1, 15 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int64 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x636ca0, 0x7fffffffdb50, 1, 22 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_byte * 18
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x636ca0, 0x7fffffffd970, 1, 4 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int16 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x636ca0, 0x7fffffffd970, 1, 15 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int64 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x636ca0, 0x636690, 1, 24 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_byte * 13
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int16 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x636ca0, 0x636898, 1, 9 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x636ca0, 0x6368a0, 1, 1 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_bool * 1
[cranberry0.americas.sgi.com:42995] posting recv on tag 104
[cranberry0.americas.sgi.com:42995] QUEIENG MSG TO SERVER OF SIZE 98
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 ptl:base:send_handler SENDING TO PEER pmix-server:42986 tag 104 with NON-NULL msg
[cranberry0.americas.sgi.com:42995] ptl:base:send_handler SENDING MSG
[cranberry0.americas.sgi.com:42995] ptl:base:send_handler MSG SENT


 Client ns slurm.pmix.475.13 rank 1: calling fence.
[cranberry1.americas.sgi.com:04409] pmix: executing fence
[cranberry1.americas.sgi.com:04409] pmix: fence_nb called
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x6375b0, 0x7fffffffd950, 1, 14 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x6375b0, 0x7fffffffd960, 1, 4 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int16 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x6375b0, 0x7fffffffd960, 1, 15 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int64 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x6375b0, 0x7fffffffdb50, 1, 22 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_byte * 18
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x6375b0, 0x7fffffffd970, 1, 4 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int16 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x6375b0, 0x7fffffffd970, 1, 15 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int64 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x6375b0, 0x636fa0, 1, 24 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_byte * 13
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int16 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x6375b0, 0x6371a8, 1, 9 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x6375b0, 0x6371b0, 1, 1 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_bool * 1
[cranberry1.americas.sgi.com:04409] posting recv on tag 104
[cranberry1.americas.sgi.com:04409] QUEIENG MSG TO SERVER OF SIZE 98
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 ptl:base:send_handler SENDING TO PEER pmix-server:4400 tag 104 with NON-NULL msg
[cranberry1.americas.sgi.com:04409] ptl:base:send_handler SENDING MSG
[cranberry1.americas.sgi.com:04409] ptl:base:send_handler MSG SENT
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 ptl:base:recv:handler called with peer pmix-server:42986
[cranberry0.americas.sgi.com:42995] ptl:base:recv:handler allocate new recv msg
[cranberry0.americas.sgi.com:42995] ptl:base:recv:handler read hdr on socket 7
[cranberry0.americas.sgi.com:42995] RECVD MSG FOR TAG 104 SIZE 8
[cranberry0.americas.sgi.com:42995] ptl:base:recv:handler allocate data region of size 8
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 RECVD COMPLETE MESSAGE FROM SERVER OF 8 BYTES FOR TAG 104 ON PEER SOCKET 7
[cranberry0.americas.sgi.com:42995] [base/ptl_base_sendrecv.c:495] post msg
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 message received 8 bytes for tag 104 on socket 7
[cranberry0.americas.sgi.com:42995] checking msg on tag 104 for tag 104
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 EXECUTE CALLBACK for tag 104
[cranberry0.americas.sgi.com:42995] pmix: fence_nb callback recvd
[cranberry0.americas.sgi.com:42995] client:unpack fence called
[cranberry0.americas.sgi.com:42995] pmix_bfrop_unpack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_unpack: found 1 values for 1 provided storage
[cranberry0.americas.sgi.com:42995] pmix_bfrop_unpack_buffer( 0x7ffff61d0ce0, 0x7ffff61d0c70, 1, 20 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_unpack_status * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_unpack_int32 * 1
[cranberry0.americas.sgi.com:42995] client:unpack fence received status 0
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 CALLBACK COMPLETE
[cranberry0.americas.sgi.com:42995] pmix: fence released
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 ptl:base:recv:handler called with peer pmix-server:4400
[cranberry1.americas.sgi.com:04409] ptl:base:recv:handler allocate new recv msg
[cranberry1.americas.sgi.com:04409] ptl:base:recv:handler read hdr on socket 7
[cranberry1.americas.sgi.com:04409] RECVD MSG FOR TAG 104 SIZE 8
[cranberry1.americas.sgi.com:04409] ptl:base:recv:handler allocate data region of size 8
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 RECVD COMPLETE MESSAGE FROM SERVER OF 8 BYTES FOR TAG 104 ON PEER SOCKET 7
[cranberry1.americas.sgi.com:04409] [base/ptl_base_sendrecv.c:495] post msg
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 message received 8 bytes for tag 104 on socket 7
[cranberry1.americas.sgi.com:04409] checking msg on tag 104 for tag 104
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 EXECUTE CALLBACK for tag 104
[cranberry1.americas.sgi.com:04409] pmix: fence_nb callback recvd
[cranberry1.americas.sgi.com:04409] client:unpack fence called
[cranberry1.americas.sgi.com:04409] pmix_bfrop_unpack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_unpack: found 1 values for 1 provided storage
[cranberry1.americas.sgi.com:04409] pmix_bfrop_unpack_buffer( 0x7ffff61d1cf0, 0x7ffff61d1c80, 1, 20 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_unpack_status * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_unpack_int32 * 1
[cranberry1.americas.sgi.com:04409] client:unpack fence received status 0
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 CALLBACK COMPLETE
[cranberry1.americas.sgi.com:04409] pmix: fence released


 Client ns slurm.pmix.475.13 rank 0: COLLECT completed


 Client ns slurm.pmix.475.13 rank 0: Calling get for key :slurm.pmix.475.13-0-global
[cranberry0.americas.sgi.com:42995] pmix: get_nb value for proc slurm.pmix.475.13:0 key slurm.pmix.475.13-0-global
[cranberry0.americas.sgi.com:42995] pmix: getnbfn value for proc slurm.pmix.475.13:0 key slurm.pmix.475.13-0-global
[cranberry0.americas.sgi.com:42995] HASH:FETCH rank 0 key slurm.pmix.475.13-0-global
[cranberry0.americas.sgi.com:42995] HASH:FETCH proc data for rank 0 not found
[cranberry0.americas.sgi.com:42995] HASH:FETCH rank 0 key slurm.pmix.475.13-0-global
[cranberry0.americas.sgi.com:42995] pmix:client get completed 0
Client ns slurm.pmix.475.13 rank 0: PMIx_Get slurm.pmix.475.13-0-global passed _Get: 0


 Client ns slurm.pmix.475.13 rank 0: Calling get for key :slurm.pmix.475.13-1-global
[cranberry0.americas.sgi.com:42995] pmix: get_nb value for proc slurm.pmix.475.13:0 key slurm.pmix.475.13-1-global
[cranberry0.americas.sgi.com:42995] pmix: getnbfn value for proc slurm.pmix.475.13:0 key slurm.pmix.475.13-1-global
[cranberry0.americas.sgi.com:42995] HASH:FETCH rank 0 key slurm.pmix.475.13-1-global
[cranberry0.americas.sgi.com:42995] HASH:FETCH proc data for rank 0 not found
[cranberry0.americas.sgi.com:42995] HASH:FETCH rank 0 key slurm.pmix.475.13-1-global
[cranberry0.americas.sgi.com:42995] HASH:FETCH data for key slurm.pmix.475.13-1-global not found
[cranberry0.americas.sgi.com:42995] pmix:client get completed -46
Client ns slurm.pmix.475.13 rank 0: PMIx_Get slurm.pmix.475.13-1-global failed: -46


 Client ns slurm.pmix.475.13 rank 1: COLLECT completed
Client ns slurm.pmix.475.13 rank 0: Finalizing
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 pmix:client finalize called
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_buffer( 0x631630, 0x7fffffffd9b0, 1, 14 )
[cranberry0.americas.sgi.com:42995] pmix_bfrop_pack_int32 * 1
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 pmix:client sending finalize sync to server
[cranberry0.americas.sgi.com:42995] posting recv on tag 105
[cranberry0.americas.sgi.com:42995] QUEIENG MSG TO SERVER OF SIZE 8
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 ptl:base:send_handler SENDING TO PEER pmix-server:42986 tag 105 with NON-NULL msg
[cranberry0.americas.sgi.com:42995] ptl:base:send_handler SENDING MSG
[cranberry0.americas.sgi.com:42995] ptl:base:send_handler MSG SENT
Client ns slurm.pmix.475.13 rank 1: Finalizing
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 pmix:client finalize called
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_buffer( 0x620c50, 0x7fffffffd9b0, 1, 14 )
[cranberry1.americas.sgi.com:04409] pmix_bfrop_pack_int32 * 1
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 pmix:client sending finalize sync to server
[cranberry1.americas.sgi.com:04409] posting recv on tag 105
[cranberry1.americas.sgi.com:04409] QUEIENG MSG TO SERVER OF SIZE 8
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 ptl:base:send_handler SENDING TO PEER pmix-server:4400 tag 105 with NON-NULL msg
[cranberry1.americas.sgi.com:04409] ptl:base:send_handler SENDING MSG
[cranberry1.americas.sgi.com:04409] ptl:base:send_handler MSG SENT
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 ptl:base:recv:handler called with peer pmix-server:4400
[cranberry1.americas.sgi.com:04409] ptl:base:recv:handler allocate new recv msg
[cranberry1.americas.sgi.com:04409] ptl:base:recv:handler read hdr on socket 7
[cranberry1.americas.sgi.com:04409] RECVD MSG FOR TAG 105 SIZE 8
[cranberry1.americas.sgi.com:04409] ptl:base:recv:handler allocate data region of size 8
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 RECVD COMPLETE MESSAGE FROM SERVER OF 8 BYTES FOR TAG 105 ON PEER SOCKET 7
[cranberry1.americas.sgi.com:04409] [base/ptl_base_sendrecv.c:495] post msg
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 message received 8 bytes for tag 105 on socket 7
[cranberry1.americas.sgi.com:04409] checking msg on tag 105 for tag 105
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 EXECUTE CALLBACK for tag 105
[cranberry1.americas.sgi.com:04409] pmix:client finwait_cbfunc received
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 CALLBACK COMPLETE
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 ptl:base:recv:handler called with peer pmix-server:42986
[cranberry0.americas.sgi.com:42995] ptl:base:recv:handler allocate new recv msg
[cranberry0.americas.sgi.com:42995] ptl:base:recv:handler read hdr on socket 7
[cranberry0.americas.sgi.com:42995] RECVD MSG FOR TAG 105 SIZE 8
[cranberry0.americas.sgi.com:42995] ptl:base:recv:handler allocate data region of size 8
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 RECVD COMPLETE MESSAGE FROM SERVER OF 8 BYTES FOR TAG 105 ON PEER SOCKET 7
[cranberry0.americas.sgi.com:42995] [base/ptl_base_sendrecv.c:495] post msg
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 message received 8 bytes for tag 105 on socket 7
[cranberry1.americas.sgi.com:04409] slurm.pmix.475.13:1 pmix:client finalize sync received
[cranberry0.americas.sgi.com:42995] checking msg on tag 105 for tag 105
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 EXECUTE CALLBACK for tag 105
[cranberry0.americas.sgi.com:42995] pmix:client finwait_cbfunc received
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 CALLBACK COMPLETE
[cranberry1.americas.sgi.com:04409] listen_thread: shutdown
[cranberry0.americas.sgi.com:42995] slurm.pmix.475.13:0 pmix:client finalize sync received
[cranberry0.americas.sgi.com:42995] listen_thread: shutdown
Client ns slurm.pmix.475.13 rank 1:PMIx_Finalize successfully completed
Client ns slurm.pmix.475.13 rank 0:PMIx_Finalize successfully completed
[root@cranberry0 examples]#
Comment 1 jim stoffel 2018-01-24 09:58:02 MST
Updated the version number.
Comment 3 Artem Polyakov 2018-01-29 09:23:07 MST
I think this is an issue with the test program. It is derived from <pmix>/examples/client.c, which is buggy:
https://github.com/pmix/pmix/issues/648
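For illustration, the failing lookup near the end of the log above can be reproduced with a toy model. In the log, rank 1 stores slurm.pmix.475.13-1-global under its own rank (HASH:STORE rank 1), while rank 0 later fetches that key for proc slurm.pmix.475.13:0 (HASH:FETCH rank 0) and gets -46. The sketch below assumes a store indexed by (rank, key). The toy_* names and the fixed-size table are hypothetical and are not PMIx API, and the sketch does not claim to reflect the actual upstream change.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model only: every name here is hypothetical, none of it is PMIx API. */
#define TOY_SUCCESS        0
#define TOY_ERR_NOT_FOUND  (-46)  /* same numeric code as seen in the log */
#define TOY_ERR_FULL       (-1)   /* arbitrary code for a full table */
#define TOY_MAX_ENTRIES    16

struct toy_entry {
    uint32_t rank;      /* rank that published the key */
    const char *key;    /* key string, owned by the caller */
    uint64_t value;
};

struct toy_store {
    struct toy_entry entries[TOY_MAX_ENTRIES];
    size_t count;
};

/* Record a value under the publishing rank, as a put by that rank would. */
int toy_put(struct toy_store *s, uint32_t rank, const char *key, uint64_t value)
{
    if (s->count >= TOY_MAX_ENTRIES)
        return TOY_ERR_FULL;
    s->entries[s->count].rank = rank;
    s->entries[s->count].key = key;
    s->entries[s->count].value = value;
    s->count++;
    return TOY_SUCCESS;
}

/* Look up a key for one specific rank; a key published by another rank
 * is not visible through this rank's slot. */
int toy_get(const struct toy_store *s, uint32_t rank, const char *key,
            uint64_t *value)
{
    for (size_t i = 0; i < s->count; i++) {
        if (s->entries[i].rank == rank && strcmp(s->entries[i].key, key) == 0) {
            *value = s->entries[i].value;
            return TOY_SUCCESS;
        }
    }
    return TOY_ERR_NOT_FOUND;
}
```

In this model, once rank 1 has published "ns-1-global", toy_get(&s, 0, "ns-1-global", &v) returns -46 while toy_get(&s, 1, "ns-1-global", &v) succeeds, which suggests the lookup should name the rank that published the key rather than the caller's own rank.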
Comment 5 Artem Polyakov 2018-01-29 14:07:50 MST
This issue was resolved in the most recent PMIx master: https://github.com/pmix/pmix/pull/650
Comment 6 Felip Moll 2018-01-30 04:35:03 MST
Hi Jim,

Since Artem's comment 5 states that this is solved, is it OK with you if we close the issue?

Thanks
Comment 7 jim stoffel 2018-01-30 06:25:46 MST
https://pmix.org/support/faq/rm-provided-information/ has a list of keys that
are RM-provided. This page states that Slurm supports many of them starting
with 16.05. I am using Slurm 17.11. What test case should I use to verify that
the keys PMIX_LOCAL_SIZE and PMIX_LOCAL_PEERS are set by the RM? I am getting a
return code of -46 when I try to PMIx_Get these values. A pointer to a working test case would be best, so that I can determine what I am doing incorrectly.
Thanks,
Jim
Comment 8 Boris Karasev 2018-01-30 08:12:55 MST
Created attachment 6035 [details]
test PMIX_LOCAL_SIZE and PMIX_LOCAL_PEERS keys
Comment 9 Boris Karasev 2018-01-30 08:16:04 MST
Hi Jim,
Can you please clarify for which rank you are trying to get PMIX_LOCAL_SIZE and PMIX_LOCAL_PEERS?

Also, please see the simple test in the attachment above.
Comment 10 jim stoffel 2018-01-30 10:20:48 MST
This can be closed now. I have been able to get the keys as expected.
Comment 11 Artem Polyakov 2018-01-30 10:41:08 MST
Thank you, Boris!
Comment 12 Felip Moll 2018-01-31 02:34:44 MST
Closing the bug, as a solution was provided.