Hello,

We are trying to run an application built using MPICH 3.4.1 and are encountering the issues below when running on Azure CycleCloud.

CycleCloud version: CycleCloud 8.2.2-1902

Loading mpi version 2021.2.0
****************************************************************************
* hwloc 2.0.3rc2-git received invalid information from the operating system.
*
* Group0 (cpuset 0x00ffffff,0xfc000000,,0x0) intersects with Package (P#1 cpuset 0x0fffffff,0xf0000000,0x0) without inclusion!
* Error occurred in topology.c line 1386
*
* The following FAQ entry in the hwloc documentation may help:
*   What should I do when hwloc reports "operating system" warnings?
* Otherwise please report this error message to the hwloc user's mailing list,
* along with the files generated by the hwloc-gather-topology script.
*
* hwloc will now ignore this invalid topology information and continue.
thread_monitor Resource temporarily unavailable in pthread_create
thread_monitor Resource temporarily unavailable in pthread_create
thread_monitor Resource temporarily unavailable in pthread_create
thread_monitor Resource temporarily unavailable in pthread_create
thread_monitor Resource temporarily unavailable in pthread_create
thread_monitor Resource temporarily unavailable in pthread_create
*** Error in `./ginestra-core-sim.run': double free or corruption (!prev): 0x000055ff3e109e00 ***
*** Error in `./ginestra-core-sim.run': double free or corruption (!prev): 0x000055ff3dfe3cd0 ***
thread_monitor Resource temporarily unavailable in pthread_create
thread_monitor Resource temporarily unavailable in pthread_create

Please guide.

Thank you,
Shraddha
Hello,

Please let me know if any more logs or data need to be shared to investigate this further.

Thank you,
Shraddha
Hi Shraddha,

SchedMD must put the cloud nodes/GPUs under AMAT's support agreement to assign these support tickets to the Slurm engineers. May I get the annual average total of instances + GPUs being consumed in AWS?

-Jess Arrington