Ticket 178 - HTC Broken
Summary: HTC Broken
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: Bluegene select plugin
Version: 2.4.x
Hardware: IBM BlueGene Linux
Severity: 2 - High Impact
Assignee: Danny Auble
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2012-11-27 10:10 MST by Don Lipari
Modified: 2012-11-27 10:52 MST

See Also:
Site: LLNL


Description Don Lipari 2012-11-27 10:10:37 MST
I'm not sure whether this applies to BG/Q, but it certainly applies to our BG/P machines.  When rzdawndev was updated to v2.4, --conn-type=HTC_? stopped working.  The block that was provided reported "ConnType=Small" no matter which of the HTC_ options was requested.

I have located what I believe is the cause of the problem in bg_job_place.c, lines 2026-2032:

	if (jobinfo->conn_type[0] != SELECT_NAV) {
		for (dim=0; dim<SYSTEM_DIMENSIONS;
		     dim++)
			jobinfo->conn_type[dim] =
				bg_record->conn_type[
					dim];
	}

Before this section runs, jobinfo->conn_type[0] == 4 (SELECT_HTC_S).  After it runs, the value has been overwritten with 3 (SELECT_SMALL).

If I create a build with the above lines commented out, the problem goes away!  But that can't be the right solution.  Your thoughts?
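
The overwrite described above can be reproduced outside Slurm with a small mock. This is a minimal sketch under stated assumptions, not the code from bg_job_place.c: the demo_* types, DEMO_DIMENSIONS, and both helper functions are hypothetical stand-ins. Only the values 3 (SELECT_SMALL) and 4 (SELECT_HTC_S) come from this ticket; the value used for DEMO_NAV is assumed. The guarded variant shows one possible way to preserve an HTC request and is not necessarily what the eventual patch does.

```c
#include <stdio.h>

/* Mock connection types. Only SMALL == 3 and HTC_S == 4 are taken from
 * the ticket; the value of DEMO_NAV is an assumption for illustration. */
enum demo_conn_type {
	DEMO_NAV   = 2,	/* assumed value, "not specified" */
	DEMO_SMALL = 3,	/* reported as SELECT_SMALL in the ticket */
	DEMO_HTC_S = 4	/* reported as SELECT_HTC_S in the ticket */
};

#define DEMO_DIMENSIONS 3	/* hypothetical dimension count */

struct demo_jobinfo { int conn_type[DEMO_DIMENSIONS]; };
struct demo_record  { int conn_type[DEMO_DIMENSIONS]; };

/* Mirrors the shape of the quoted loop: any request other than NAV is
 * replaced by the block's connection type, including HTC requests. */
static void demo_copy_unconditional(struct demo_jobinfo *job,
				    const struct demo_record *rec)
{
	int dim;

	if (job->conn_type[0] != DEMO_NAV) {
		for (dim = 0; dim < DEMO_DIMENSIONS; dim++)
			job->conn_type[dim] = rec->conn_type[dim];
	}
}

/* Hypothetical guard: copy only when the request is below SMALL, so a
 * SMALL or HTC request is left as the user specified it. */
static void demo_copy_guarded(struct demo_jobinfo *job,
			      const struct demo_record *rec)
{
	int dim;

	if (job->conn_type[0] != DEMO_NAV &&
	    job->conn_type[0] < DEMO_SMALL) {
		for (dim = 0; dim < DEMO_DIMENSIONS; dim++)
			job->conn_type[dim] = rec->conn_type[dim];
	}
}

/* Prints the first-dimension connection type of a mock job. */
static void demo_print(const char *label, const struct demo_jobinfo *job)
{
	printf("%s: conn_type[0] = %d\n", label, job->conn_type[0]);
}
```

Calling demo_copy_unconditional() on a job that requested DEMO_HTC_S against a record whose type is DEMO_SMALL leaves the job with DEMO_SMALL, which matches the symptom. demo_copy_guarded() leaves the HTC request untouched.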
Comment 1 Danny Auble 2012-11-27 10:52:12 MST
This only applies to BGP.  Your fix was in the right area.  A safer patch has been added to 2.4 (https://github.com/SchedMD/slurm/commit/27e7b048baefe02d1a72eba03faab1d2e25a43a9).  Thanks for reporting.