The only indication that a cloud node is in this state is a single nondescript log message, "debug3: problems with <NodeName>". Apart from finding this log message, the only way to discover the problem is to notice that jobs are not being scheduled on the affected node.
We got into this state because our ResumeProgram took too long to start the cloud node. We did not notice immediately because the node did not appear in sinfo at all, and scontrol reported it as "not found".
Additionally, what would be considered best practices for monitoring to detect this situation in the future?
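As a stopgap until there is a better answer, the log message quoted above could be scanned for automatically. The following is only a minimal sketch under assumptions: it assumes the message appears in the slurmctld log exactly as "debug3: problems with <NodeName>", it makes no assumption about the rest of the line, and the function name and sample log lines are hypothetical. It also only helps if the controller is logging at debug3 verbosity, since the message is not written at lower levels.

```python
import re
from typing import Iterable, Set

# Assumption: the message text matches what we observed in our log,
# "debug3: problems with <NodeName>". Timestamps or other prefixes on
# the line are ignored because we search rather than match from the start.
PROBLEM_RE = re.compile(r"debug3: problems with (\S+)")


def find_problem_nodes(log_lines: Iterable[str]) -> Set[str]:
    """Return the set of node names mentioned in 'problems with' messages."""
    nodes: Set[str] = set()
    for line in log_lines:
        match = PROBLEM_RE.search(line)
        if match:
            nodes.add(match.group(1))
    return nodes


if __name__ == "__main__":
    # Hypothetical sample lines; a real check would read the slurmctld log file.
    sample = [
        "slurmctld: debug3: problems with cloud-node-1",
        "slurmctld: debug2: some unrelated message",
    ]
    for node in sorted(find_problem_nodes(sample)):
        print(f"node flagged in log: {node}")
```

This only catches nodes that have already reached the bad state, so it does not replace a proper monitoring recommendation.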