| Summary: | Bus error (core dumped) | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Alex Mamach <alex.mamach> |
| Component: | slurmd | Assignee: | Tim McMullan <mcmullan> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | CC: | bart |
| Version: | 20.02.6 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| See Also: | https://bugs.schedmd.com/show_bug.cgi?id=10492 | ||
| Site: | Northwestern | Slinky Site: | --- |
| Attachments: | syslog, dmesg, slurmd.log |
||
|
Description
Alex Mamach
2020-12-21 13:21:20 MST
Alex - can you send us the slurmd.log, syslog, and the output of dmesg from that system?

Created attachment 17239: syslog

Created attachment 17240: dmesg

Created attachment 17241: slurmd.log

I've uploaded the requested files. For reference, the job in question ran from 2020-12-20T14:25:34 to 2020-12-20T14:27:30.

Hi Alex,

I'm not seeing anything too telling in these logs as far as the bus error goes. It may be helpful to run with logging at least at the "debug" level to see a little more about what is going on here from the Slurm perspective.

Where is this bus error showing up? Is it in the output of the user's job? Bus errors usually point to a bad memory access rather than a failure to allocate memory.

Thanks!
--Tim

Hi Alex!

I just wanted to check in and see if you were able to get some of the additional logs!

Thanks!
--Tim

Hi Tim,

The error was showing up in the users' job output, but we haven't been able to replicate it after asking them to allocate more memory to their jobs. If that changes I can open another ticket, but for now I think we're good to close this one. Thanks for your help!

Thanks,
Alex |
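Tim's question about where the bus error shows up can be checked from outside the job. The following is a minimal sketch, not taken from the ticket: on POSIX systems a process terminated by SIGBUS reports a negative return code to a Python parent, while a failed allocation surfaced as a Python exception produces an ordinary non-zero exit status. The names `run_and_classify`, `BUS_CHILD`, and `OOM_CHILD` are hypothetical, and the child commands are stand-ins for a real job step.

```python
import signal
import subprocess
import sys


def run_and_classify(argv):
    """Run a command and report whether a signal terminated it."""
    proc = subprocess.run(argv, stderr=subprocess.DEVNULL)
    if proc.returncode < 0:
        # On POSIX, a negative return code -N means the child was killed by signal N.
        return "killed by " + signal.Signals(-proc.returncode).name
    return "exited with status " + str(proc.returncode)


# Hypothetical stand-in for a failing job step: a child that sends itself SIGBUS.
BUS_CHILD = [sys.executable, "-c",
             "import os, signal; os.kill(os.getpid(), signal.SIGBUS)"]

# Hypothetical stand-in for an allocation failure reported as an exception.
OOM_CHILD = [sys.executable, "-c", "raise MemoryError"]

if __name__ == "__main__":
    print(run_and_classify(BUS_CHILD))
    print(run_and_classify(OOM_CHILD))
```

A wrapper like this only distinguishes how the process ended; it does not explain why the bad access happened, which is what the additional debug logs were meant to help with.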