| Summary: | Slurm-gcp v5 errors when enable_reconfigure is enabled | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Simon Gao <simon.gao> |
| Component: | GCP | Assignee: | Jacob Jenson <jacob> |
| Status: | RESOLVED INVALID | QA Contact: | |
| Severity: | 6 - No support contract | ||
| Priority: | --- | ||
| Version: | 22.05.0 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | -Other- | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA SIte: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | CLE Version: | ||
| Version Fixed: | Target Release: | --- | |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
| Attachments: | Terraform error log; terraform var-file | ||
Created attachment 26319 [details]
terraform var-file
|
Created attachment 26318 [details]
Terraform error log

When "enable_reconfigure" is set to true, deploying the cloud example full cluster on GCP fails with the following error (for detailed error information, see the attached files):

Error: local-exec provisioner error
│
│ with module.slurm_cluster.module.slurm_controller_instance[0].module.reconfigure_notify[0].null_resource.notify_cluster,
│ on ../../../../../slurm_cluster/modules/slurm_notify_cluster/main.tf line 51, in resource "null_resource" "notify_cluster":
│ 51: provisioner "local-exec" {
│
│ Error running command '/home/luser/Documents/git/test/slurm-gcp/scripts/notify_cluster.py --type='reconfig' 'g2-slurm-events-WrPqCf9D'': exit status 1. Output:
│ Traceback (most recent call last):
│   File "/usr/local/lib/python3.9/site-packages/google/api_core/grpc_helpers.py", line 50, in error_remapped_callable
│     return callable_(*args, **kwargs)
│   File "/usr/local/lib64/python3.9/site-packages/grpc/_channel.py", line 946, in __call__
│     return _end_unary_response_blocking(state, call, False, None)
│   File "/usr/local/lib64/python3.9/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
│     raise _InactiveRpcError(state)
│ grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
│   status = StatusCode.NOT_FOUND
│   details = "Resource not found (resource=g2-slurm-events-WrPqCf9D)."
│   debug_error_string = "{"created":"@1660343082.509674303","description":"Error received from peer ipv4:142.250.217.74:443","file":"src/core/lib/surface/call.cc","file_line":966,"grpc_message":"Resource not found (resource=g2-slurm-events-WrPqCf9D).","grpc_status":5}"
│ >

When "enable_reconfigure" is set to false, the same terraform command completes successfully. The terraformuser has all the suggested permissions.
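
For triage, it may help to confirm that the resource reported as missing is the same name Terraform passed to notify_cluster.py. The following is a minimal sketch using only the Python standard library. The helper names `command_target` and `grpc_failure` are hypothetical and are not part of slurm-gcp, and the two input strings are fragments copied from the error output in this report. It only parses log text; it does not query GCP or explain why the resource is missing.

```python
import re

# Two fragments copied from the error output in this report.
COMMAND_LINE = (
    "Error running command '/home/luser/Documents/git/test/slurm-gcp/scripts/"
    "notify_cluster.py --type='reconfig' 'g2-slurm-events-WrPqCf9D'': exit status 1."
)
GRPC_ERROR = (
    "status = StatusCode.NOT_FOUND\n"
    'details = "Resource not found (resource=g2-slurm-events-WrPqCf9D)."\n'
)


def command_target(text):
    """Return the quoted argument that follows --type=..., or None if absent."""
    match = re.search(r"notify_cluster\.py\s+--type='[^']*'\s+'([^']+)'", text)
    return match.group(1) if match else None


def grpc_failure(text):
    """Return (status_name, resource_name) from a gRPC error dump.

    Either element is None when the corresponding field is not found.
    """
    status = re.search(r"StatusCode\.([A-Z_]+)", text)
    resource = re.search(r"\(resource=([^)]+)\)", text)
    return (
        status.group(1) if status else None,
        resource.group(1) if resource else None,
    )


if __name__ == "__main__":
    target = command_target(COMMAND_LINE)
    status, resource = grpc_failure(GRPC_ERROR)
    print(f"script argument:  {target}")
    print(f"gRPC status:      {status}")
    print(f"missing resource: {resource}")
    print(f"names match:      {target == resource}")
```

If the names match, the failure is about the resource the script was told to notify, which points at that resource not existing (or not being visible) when the local-exec provisioner ran, rather than at a malformed argument.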