While updating Proxmox from version 7.3 to 7.4, I found that the monitor had unexpectedly gone into an unknown status. I tried several ways to restart it, but none of them worked. In the end I updated all the nodes, and that brought the monitor back online.

However, during this process, one of the nodes suddenly went down. Fortunately, since the VMs have High Availability (HA) enabled, we were able to minimize the downtime. I have attached the syslog for reference. Could the problem have been caused by a network surge that occurred during the update? Any advice on this matter would be appreciated.

1 Answer

The exact cause cannot be determined from the description alone; the syslog would need to be examined. Several factors could explain both the monitor going into an unknown status and the node going down during the update.

Possible reasons for the monitor issue and node failure could include:

  1. Software or Configuration Errors: There might have been errors during the update process, leading to issues with the monitor and node stability.

  2. Hardware Failure: A hardware problem on the affected node could be responsible for the unexpected shutdown.

  3. Network Surge or Connectivity Issues: A sudden network surge or connectivity problem during the update could have disrupted the communication between nodes.

  4. Resource Exhaustion: The update process could have caused resource exhaustion on the node, leading to instability.

To pinpoint the cause, review the syslog for error messages, warnings, or critical events logged around the time the monitor changed status and the node went down. If you're unsure how to interpret the entries, consider asking experienced system administrators or the Proxmox support team for help.
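As a starting point for that review, here is a minimal Python sketch that filters a syslog for lines worth a closer look. The pattern list is only an assumption loosely based on the possible causes listed above (errors, link problems, memory exhaustion, fencing), and the function names and the default path `/var/log/syslog` are hypothetical choices for illustration, so adjust them to your setup.

```python
import re
from typing import Iterable, List

# Assumed patterns, not an official list: tune them to the services on your nodes.
SUSPECT_PATTERNS = [
    r"\berror\b",
    r"\bfail(ed|ure)?\b",
    r"\bwatchdog\b",
    r"\bfenc(e|ing)\b",
    r"link down",
    r"out of memory",
]


def find_suspect_lines(lines: Iterable[str],
                       patterns: List[str] = SUSPECT_PATTERNS) -> List[str]:
    """Return the lines that match any pattern, case-insensitively."""
    regex = re.compile("|".join(patterns), re.IGNORECASE)
    return [line.rstrip("\n") for line in lines if regex.search(line)]


def scan_file(path: str = "/var/log/syslog") -> List[str]:
    """Scan a log file on disk; the default path is an assumption."""
    with open(path, errors="replace") as handle:
        return find_suspect_lines(handle)
```

Calling `scan_file()` on the node that went down, and then again on a node that stayed up, gives two short lists to compare. Lines that appear only on the failed node shortly before it went down are usually the best place to start reading. A keyword filter like this only narrows the search; it does not diagnose anything by itself.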

Before future major updates, take proper backups, confirm system compatibility, and test the update in a controlled environment first to reduce the risk of disrupting production systems.

...