Ticket 4494

Summary: gcc clz/ctz builtins not working as expected on big endian arches
Product: Slurm Reporter: Philip Kovacs <pkdevel>
Component: ContributionsAssignee: Felip Moll <felip.moll>
Status: RESOLVED FIXED QA Contact:
Severity: 2 - High Impact    
Priority: --- CC: brian, felip.moll, tim
Version: 17.11.0   
Hardware: Linux   
OS: Linux   
See Also: https://bugs.schedmd.com/show_bug.cgi?id=4469
Site: -Other- Alineos Sites: ---
Atos/Eviden Sites: --- Confidential Site: ---
Coreweave sites: --- Cray Sites: ---
DS9 clusters: --- HPCnow Sites: ---
HPE Sites: --- IBM Sites: ---
NOAA SIte: --- OCF Sites: ---
Recursion Pharma Sites: --- SFW Sites: ---
SNIC sites: --- Linux Distro: ---
Machine Name: CLE Version:
Version Fixed: 17.11.1 Target Release: ---
DevPrio: --- Emory-Cloud Sites: ---
Attachments: slurm_gcc_builtins

Description Philip Kovacs 2017-12-08 19:02:34 MST
Created attachment 5708 [details]
slurm_gcc_builtins

For 17.11, the unit test `bitstring-test` fails on big endian certain architecture, e.g. ppc64 and s390x. 

I note the new use of certain gcc builtins in src/common/bitstring.c

__builtin_clzll : count leading zeroes
__builtin_ctzll : count trailing zeroes
__builtin_popcountll : count one bits.

The first two builtins clz/ctz are not working properly on big endian arches and I don;t know why.  It could be a gcc bug (i am using 7.2.1), an emulation problem when using kvm/qemu, or some other problem related to the way in which they are used in this codebase.

The workaround for me is to modify the configure.ac and avoid running the clz and ctz checks when the arch is big endian.   This causes the alternate code to be used and clears the problem for me.

I've attached that patch.

You might want to investigate further.

Phil
Comment 2 Felip Moll 2017-12-11 12:09:26 MST
(In reply to Philip Kovacs from comment #0)
> Created attachment 5708 [details]
> slurm_gcc_builtins
> 
> For 17.11, the unit test `bitstring-test` fails on big endian certain
> architecture, e.g. ppc64 and s390x. 
> 
> I note the new use of certain gcc builtins in src/common/bitstring.c
> 
> __builtin_clzll : count leading zeroes
> __builtin_ctzll : count trailing zeroes
> __builtin_popcountll : count one bits.
> 
> The first two builtins clz/ctz are not working properly on big endian arches
> and I don;t know why.  It could be a gcc bug (i am using 7.2.1), an
> emulation problem when using kvm/qemu, or some other problem related to the
> way in which they are used in this codebase.
> 
> The workaround for me is to modify the configure.ac and avoid running the
> clz and ctz checks when the arch is big endian.   This causes the alternate
> code to be used and clears the problem for me.
> 
> I've attached that patch.
> 
> You might want to investigate further.
> 
> Phil

Hi Phil,

In order to see if it is system issue vs slurm issue;

Could you please compile and run this code in your big endian machines?:
------------8<-------------
#include <stdio.h>
#include <stdlib.h>

int main()
{
    int num = 16;
    int clz = 0;
    int ctz = 0; 
    int ones = 0;

    clz = __builtin_clz(num);
    printf("Number of leading zero's in %d is %d\n", num, clz);  

    clz = __builtin_clz(-num);
    printf("Number of leading zero's in %d is %d\n", -num, clz);  

    ctz = __builtin_ctz(num);  
    printf("Number of trailing zero's in %d is %d\n", num, ctz);

    ones = __builtin_popcount(num);  
    printf("Number of one's in %d is %d\n", num, ones);    

    return 0;
}
-------------->8----------

My output is:

Number of leading zero's in 16 is 27
Number of leading zero's in -16 is 0
Number of trailing zero's in 16 is 4
Number of one's in 16 is 1
Comment 3 Philip Kovacs 2017-12-11 13:41:16 MST
Same output on little/big endian with num = 16.   With num = 0, however:

little endian (num = 0)
-----------------------
Number of leading zero's in 0 is 31
Number of leading zero's in 0 is 31
Number of trailing zero's in 0 is 32
Number of one's in 0 is 0

big endian (num = 0)
--------------------
Number of leading zero's in 0 is 32
Number of leading zero's in 0 is 32
Number of trailing zero's in 0 is -1
Number of one's in 0 is 0

The gcc docs says undefined result for ctz/ctl if input is zero.  Is there any chance at all that zero might be used as input?
Comment 4 Philip Kovacs 2017-12-12 10:11:26 MST
I think at least part of the problem is the way slurm is trying to completely reverse the bits of the words within bitstrings on big endian:

/* mask for the bit within its word */
#ifdef SLURM_BIGENDIAN
#define _bit_mask(bit) ((bitstr_t)1 << (BITSTR_MAXPOS - ((bit)&BITSTR_MAXPOS)))
#else
#define _bit_mask(bit) ((bitstr_t)1 << ((bit)&BITSTR_MAXPOS))
#endif

I wrote this small program:

#include <stdio.h>
#include <stdint.h>

int main ()
{
        int i;
        union _u
        {
                uint64_t a;
                uint8_t b[8];
        } u;

        u.a = 9879879876;
        printf("bytes(%lu):\n", u.a);
        for (i=0; i<8; i++) {
                printf("b[%u] = %u\n", i, u.b[i]);
        }
        printf("ctzll(%lu) = %d\n", u.a, __builtin_ctzll(u.a));
        printf("clzll(%lu) = %d\n", u.a, __builtin_clzll(u.a));
        return 0;
}

The output on x86_64 (little endian) is:

bytes(9879879876):
u.b[0] = 196
u.b[1] = 0
u.b[2] = 227
u.b[3] = 76
u.b[4] = 2
u.b[5] = 0
u.b[6] = 0
u.b[7] = 0
ctzll(9879879876) = 2
clzll(9879879876) = 30

and the output on ppc64 (big endian) is:

bytes(9879879876):
b[0] = 0
b[1] = 0
b[2] = 0
b[3] = 2
b[4] = 76
b[5] = 227
b[6] = 0
b[7] = 196
ctzll(9879879876) = 2
clzll(9879879876) = 30

Notice how the ctz/clz apis are agnostic to the endianness of the machine.  Also notice that only the byte order changes, there is not a complete reversal of all bits on the word.
Comment 5 Felip Moll 2017-12-12 13:04:39 MST
(In reply to Philip Kovacs from comment #4)
> I think at least part of the problem is the way slurm is trying to
> completely reverse the bits of the words within bitstrings on big endian:
> 
> /* mask for the bit within its word */
> #ifdef SLURM_BIGENDIAN
> #define _bit_mask(bit) ((bitstr_t)1 << (BITSTR_MAXPOS -
> ((bit)&BITSTR_MAXPOS)))
> #else
> #define _bit_mask(bit) ((bitstr_t)1 << ((bit)&BITSTR_MAXPOS))
> #endif
> 
> I wrote this small program:
> 
> #include <stdio.h>
> #include <stdint.h>
> 
> int main ()
> {
>         int i;
>         union _u
>         {
>                 uint64_t a;
>                 uint8_t b[8];
>         } u;
> 
>         u.a = 9879879876;
>         printf("bytes(%lu):\n", u.a);
>         for (i=0; i<8; i++) {
>                 printf("b[%u] = %u\n", i, u.b[i]);
>         }
>         printf("ctzll(%lu) = %d\n", u.a, __builtin_ctzll(u.a));
>         printf("clzll(%lu) = %d\n", u.a, __builtin_clzll(u.a));
>         return 0;
> }
> 
> The output on x86_64 (little endian) is:
> 
> bytes(9879879876):
> u.b[0] = 196
> u.b[1] = 0
> u.b[2] = 227
> u.b[3] = 76
> u.b[4] = 2
> u.b[5] = 0
> u.b[6] = 0
> u.b[7] = 0
> ctzll(9879879876) = 2
> clzll(9879879876) = 30
> 
> and the output on ppc64 (big endian) is:
> 
> bytes(9879879876):
> b[0] = 0
> b[1] = 0
> b[2] = 0
> b[3] = 2
> b[4] = 76
> b[5] = 227
> b[6] = 0
> b[7] = 196
> ctzll(9879879876) = 2
> clzll(9879879876) = 30
> 
> Notice how the ctz/clz apis are agnostic to the endianness of the machine. 
> Also notice that only the byte order changes, there is not a complete
> reversal of all bits on the word.

Yes, I had just arrived at the same conclusion some hours ago.

I am working on it, thanks for looking into that.
Comment 7 Felip Moll 2017-12-14 03:26:14 MST
Philip, we have fully identified the issues and are working on that internally.

Just a question, have you been able to run Slurm 17.11 successfully on a ppc64 or s390x machine?

Due to the bitstring problem it is possible that you receive errors on execution, so by now I don't recommend to do so.

I will keep you informed of the progress.
Comment 8 Philip Kovacs 2017-12-14 12:07:51 MST
> Just a question, have you been able to run Slurm 17.11 successfully on a 
> ppc64 or s390x machine?

I don't personally perform operational tests for every architecture I package, no.  I don't have the time or resources.  When your unit tests alert me to problems, I try to look into them and file bugs here.  If there is basic testing beyond the unit tests you want me to check on ppc64 after you're done here,  I can try to help you verify correctness.

Also note that there are many free resources to POWER architectures you might want to try:

https://developer.ibm.com/linuxonpower/cloud-resources/
Comment 15 Felip Moll 2017-12-16 12:28:39 MST
Philip,

We have commited a fix for this issue in 5a23c48adde45862ce65ce5bcd19b57b5fac1298. It turned out to be a wider problem than we though at first time that finally made Slurm not to work on big-endian.

This fix will be in 17.11.1 and later.

Thanks for reporting.
Comment 16 Philip Kovacs 2017-12-16 15:30:11 MST
Looks good now on ppc64, thanks.

./bitstring-test
	NOTE: Testing static decl
	PASSED: bit 9 set
	PASSED: bit 12 not set
	PASSED: bit 14 set
	NOTE: Testing basic vixie functions
	PASSED: bit 9 set
	PASSED: bit 12 not set
	PASSED: bit 14 set
	PASSED: first bit set = 9 
	PASSED: last bit set = 14
	PASSED: bitstring
	PASSED: bitstring
	PASSED: bitstring
	PASSED: bitstring
	PASSED: bitstring
	PASSED: bitstring
	PASSED: bitstring
	PASSED: bitstring
	PASSED: bitstring
	PASSED: bitstring
	PASSED: bitstring
	PASSED: ffs
	PASSED: ffc
	PASSED: ffc
	NOTE: Testing and/or/not
	PASSED: and
	PASSED: and
	PASSED: or
	PASSED: or
	PASSED: or
	PASSED: or
	PASSED: not
	PASSED: not
	NOTE: testing bit selection
	PASSED: pick
	PASSED: pick
	PASSED: pick
	NOTE: Testing realloc
	PASSED: bitstring
	PASSED: bitstring
	PASSED: bitstring
	PASSED: bitstring
	PASSED: bitstring
	PASSED: bitstring
	PASSED: bitstring
	NOTE: Testing bit_fmt
	PASSED: bitstring
	PASSED: bitstring
	PASSED: bitstring
	PASSED: bitstring
	NOTE: Testing bit_nffc/bit_nffs
	PASSED: bitstring
	PASSED: bitstring
	PASSED: bitstring
	PASSED: bitstring
	PASSED: bitstring
	PASSED: bitstring
	PASSED: bitstring
	PASSED: bitstring
	PASSED: bitstring
	NOTE: Testing bit_unfmt
	PASSED: bitstring
	PASSED: bitstring

Totals:
	#passed:		55
	#real failed:		0