Welcome back to my series on maintenance operations with Nutanix! In Part 1, I reviewed some scenarios for using maintenance mode with Nutanix AHV, using both the Maintenance Mode functions within Prism Element and the CLI.

As a quick recap from Part 1, you can use either the GUI functionality in AOS 6.x+ or the CLI commands to place a host in maintenance mode, for a variety of reasons including, but not limited to, the following:

  • AOS or LCM operations (maintenance mode is automated during upgrade operations)
  • Node hardware replacement (PSU, NIC, etc.)
  • Physically moving hardware

In Part 2, we will look at maintenance mode when the Nutanix cluster is vSphere-based, and the different options available there.

Maintenance Mode with vSphere

With vSphere, integrating Nutanix and VMware is extremely beneficial for maintenance mode operations such as AOS and LCM updates or general host maintenance. Honestly, the biggest benefit of the integration is automated maintenance mode for activities like hypervisor upgrades (say, upgrading the hosts from 7.0.3g to 7.0.3n, which I'll walk through in Part 3).

When we stop to look at maintenance mode operations with Nutanix + vSphere, there are a few different scenarios we can look at:

  1. LCM patching operations
    1. AOS upgrade operations, where the CVM enters and exits maintenance mode during the upgrade process; the host itself is not impacted.
    2. Firmware operations, where the CVM and the host are placed into maintenance mode, including host reboots.
  2. vSphere Hypervisor upgrade
  3. General host maintenance

Integrating Nutanix with vSphere

Unlike AHV, which is tightly integrated with Prism Element, vSphere has an external management source: the vCenter instance. We need to integrate Nutanix and vSphere so that Prism can communicate with vCenter through its APIs, enabling automation activities without additional administrative overhead.

To ensure we have integration between Prism and vCenter, once the Nutanix hosts are onboarded into a Nutanix cluster, we will want to register Prism Element to vCenter, which is done under Settings > vCenter Registration.

In my cluster, I've added the nodes to an existing cluster within vCenter, and now I can register Prism Element to the vCenter instance. This requires credentials that can initiate maintenance mode activities in vCenter. I generally create a local vCenter user (prismsvc) in the vsphere.local domain and either add it to the Administrators group or assign it a custom role.

Pro Tip: You can also script this using the sample command below:

ms register ip-address=10.100.15.10 admin-username=administrator@vsphere.local admin-password=xyz port=443

vSphere Licensing and Maintenance Mode

DRS – you have it, right? To provide the best level of automation between Nutanix and vCenter, the hosts should ideally be licensed with vSphere Enterprise+ so that we can leverage DRS to automate workload migrations (vMotion) and host maintenance mode.

If you don't have DRS enabled at the cluster level, whether because you have workloads whose placement you don't want automated or because you don't have Enterprise+ licensing (perhaps only vSphere Standard, or even Essentials/Essentials Plus), you lose some of the automation between Nutanix and vSphere, and upgrades require a bit more hand-holding.

Leveraging DRS with Nutanix will give you fully automated operations for host maintenance, software, firmware, and hypervisor upgrades.
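To make the licensing dependency concrete, here is a small Python sketch that maps a vSphere edition and the cluster's DRS setting to whether guest VMs are evacuated automatically during Prism-driven maintenance. The enum and function names are hypothetical (not part of any Nutanix or VMware API), and it assumes that when DRS is enabled it is set to fully automated.

```python
from enum import Enum


class VSphereEdition(Enum):
    """vSphere editions mentioned in this post (hypothetical helper type)."""
    ESSENTIALS = "Essentials"
    ESSENTIALS_PLUS = "Essentials Plus"
    STANDARD = "Standard"
    ENTERPRISE_PLUS = "Enterprise Plus"


# Of the editions listed above, only Enterprise Plus includes DRS.
EDITIONS_WITH_DRS = frozenset({VSphereEdition.ENTERPRISE_PLUS})


def vm_evacuation_mode(edition: VSphereEdition, drs_enabled: bool) -> str:
    """Return 'automatic' when DRS can vMotion guest VMs off a host during
    Prism-driven maintenance, otherwise 'manual'.

    Assumes DRS, when enabled, is set to fully automated.
    """
    if edition in EDITIONS_WITH_DRS and drs_enabled:
        return "automatic"  # DRS evacuates VMs; Prism handles the CVM
    return "manual"  # admin migrates VMs; Prism still handles the CVM


if __name__ == "__main__":
    for edition in VSphereEdition:
        print(f"{edition.value:16} DRS on -> {vm_evacuation_mode(edition, True)}")
```

Note that the CVM shutdown is handled by Prism in both cases; only the guest VM evacuation depends on DRS.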

Native Maintenance Mode vs Nutanix-Integrated

With AHV, using Prism makes sense since everything is managed from within Prism. But if you've been around VMware long enough, you may ask yourself why you would use Prism to place a host into maintenance mode rather than the normal VMware maintenance mode operation. Great question!

You can still use the traditional VMware maintenance mode function by right-clicking the host and entering maintenance mode. However, in a Nutanix environment, we also have to shut down the CVM running on that host manually. Remember that vCenter's host maintenance operation migrates workloads to other hosts through vMotion if DRS is enabled (remember the earlier DRS licensing comment…), and the operation will stall if there are pinned VMs that cannot be migrated. Since each CVM is pinned to the host it runs on and cannot be migrated, the admin must intervene and shut down the CVM on that node with the proper cvm_shutdown command to ensure that the Nutanix cluster remains stable.

Using the Nutanix-integrated maintenance mode, Prism initiates the maintenance mode operations on the host automatically, migrating workloads AND shutting down the CVM gracefully, so the Nutanix cluster remains stable.

Environment Validation

Just like in Part 1, this article will focus on features available in AOS 6.5+, which, at the time of this writing (1/27/24), is the latest LTS release from Nutanix. For vSphere, on the host side, we will start with 7.0.3g and eventually upgrade the hosts to 7.0.3n, the latest release for ESXi – stay tuned for Part 3. A pre-existing vCenter 8 instance, which lives outside the 4-node Nutanix cluster, will be used.

Node Maintenance with vSphere

Part 2 in this series covers node maintenance with vSphere. Much like with AHV, when running a vSphere-based Nutanix cluster, we have a few ways to get an ESXi host into maintenance mode.

I’m not covering placing the CVM into maintenance mode in this part, as it’s the same process for the CVM between AHV and ESXi environments, using the ncli host edit commands for the CVM we want to target.

So for Part 2, let’s focus on the following activities:

  1. Placing a single host into maintenance mode.
    1. Using automated mechanisms
    2. Manually
  2. Performing a rolling reboot for hosts.

Friendly Reminder

Note: You must exit the node from maintenance mode using the same method that you have used to put the node into maintenance mode. For example, if you use the CLI to put the node into maintenance mode, you must use the CLI to exit the node from maintenance mode. Similarly, if you use the web console to put the node into maintenance mode, you must use the web console to exit the node from maintenance mode.

What does this mean? Later in this post, we'll use vCenter to place the host into maintenance mode, so follow the instructions and be cool: use the same workflow to bring the host out of maintenance mode!
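If you script or track maintenance activities yourself, one way to honor the "same method in, same method out" rule is to record how each node entered maintenance mode and reject a mismatched exit. The Python sketch below is a hypothetical bookkeeping helper, not a Nutanix feature; it only enforces the rule from the note above.

```python
from enum import Enum
from typing import Optional


class MaintenanceMethod(Enum):
    PRISM = "Prism web console"
    CLI = "CLI"
    VCENTER = "vCenter"


class NodeMaintenanceRecord:
    """Hypothetical helper: remember how a node entered maintenance mode
    and refuse to take it out any other way."""

    def __init__(self, node: str) -> None:
        self.node = node
        self.entered_via: Optional[MaintenanceMethod] = None

    def enter(self, method: MaintenanceMethod) -> None:
        if self.entered_via is not None:
            raise RuntimeError(
                f"{self.node} is already in maintenance mode via {self.entered_via.value}"
            )
        self.entered_via = method

    def exit(self, method: MaintenanceMethod) -> None:
        if self.entered_via is None:
            raise RuntimeError(f"{self.node} is not in maintenance mode")
        if method is not self.entered_via:
            # Same method in, same method out.
            raise ValueError(
                f"{self.node} entered maintenance mode via {self.entered_via.value}; "
                f"exit via {self.entered_via.value}, not {method.value}"
            )
        self.entered_via = None
```

For example, a node entered via vCenter raises a ValueError if you try to exit it from Prism, and clears cleanly when you exit via vCenter.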

Single Host Maintenance Mode

Fully Automated Maintenance Mode

Getting a single host into maintenance mode is very easy via Prism Element, provided you have the proper vSphere licensing for the hosts.

In this example, I will place Host 3 into maintenance mode from Prism, which will again migrate workloads, properly shut down the CVM, and finally place the host into maintenance mode.

Remember that we previously integrated Prism Element with vCenter, yet when I click Enter Maintenance Mode in Prism, I'm prompted to enter the vCenter credentials again. Why is that? Great question, and one I've yet to get a good answer for: if we've already integrated Prism with vCenter, why do I have to do this again…

After entering the vCenter information again, we can see in vCenter that DRS operations kick off; ultimately, the CVM is shut down, and the ESXi host enters maintenance mode.

We can also validate the state of the CVM using the command ncli host ls, and we can see that the CVM for Host 3 is in maintenance mode.

To remove the host from maintenance mode, we can also use Prism: after clicking on the host, we use the Exit Maintenance Mode link and again enter our vCenter information. We can then see the host come out of maintenance mode, the CVM power on, and the CVM exit maintenance mode.

We can also see that via NCLI, the CVM has exited maintenance mode and will start participating in the Nutanix cluster again after all services are restored.
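If you want to check the ncli host ls output from a script rather than by eye, a simple parser works on its "Key : Value" layout. This Python sketch assumes one block per host separated by blank lines, and assumes the maintenance field is labeled "Under Maintenance Mode"; both are assumptions, so confirm the exact labels against the output on your AOS version before relying on it.

```python
from typing import Dict, List

# Assumption: field label used by `ncli host ls`; verify on your AOS version.
MAINTENANCE_FIELD = "Under Maintenance Mode"


def parse_ncli_blocks(text: str) -> List[Dict[str, str]]:
    """Split 'Key : Value' output into one dict per host (blank-line separated)."""
    hosts: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            if current:
                hosts.append(current)
                current = {}
            continue
        # Split on the first colon only, so values like 'abc::3' stay intact.
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        hosts.append(current)
    return hosts


def hosts_in_maintenance(text: str) -> List[str]:
    """Return the Name of every host whose maintenance field starts with 'true'."""
    return [
        host.get("Name", "?")
        for host in parse_ncli_blocks(text)
        if host.get(MAINTENANCE_FIELD, "").lower().startswith("true")
    ]
```

You would feed it the captured text of ncli host ls (for example, from an SSH session to a CVM); the sample used to exercise it here is made up.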

CVM Maintenance

If you need to place a CVM into maintenance mode (but not reboot it), you can use the ncli commands from Part 1, which will not impact the host or the workloads running on it. Pretty handy!
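The Part 1 commands follow the ncli host edit pattern with the enable-maintenance-mode parameter, using the host ID reported by ncli host ls. As a small convenience, this hypothetical Python helper just assembles that command string for a given host ID; you would still run the result yourself on a CVM.

```python
import shlex


def ncli_cvm_maintenance_cmd(host_id: str, enable: bool) -> str:
    """Build the ncli command that places a CVM into (or takes it out of)
    maintenance mode. The host ID comes from `ncli host ls`."""
    state = "true" if enable else "false"
    # shlex.quote guards against stray spaces or shell metacharacters in the ID.
    return f"ncli host edit id={shlex.quote(host_id)} enable-maintenance-mode={state}"


if __name__ == "__main__":
    print(ncli_cvm_maintenance_cmd("12", enable=True))
    print(ncli_cvm_maintenance_cmd("12", enable=False))
```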

Maintenance Mode without DRS – but from Prism

Now that we've evaluated the fully automated method for entering a single host into maintenance mode, let's look at another scenario where DRS is disabled. This might be because DRS cannot be enabled in a cluster due to either licensing or workload requirements.

Rather than changing licensing, I will disable DRS and enter hosts into maintenance mode.

Now, suppose I use Prism to do this. In that case, Prism will initiate the communication with vCenter to enter maintenance mode, the CVM will still shut down correctly, and cluster stability will be maintained. However, none of the workloads will be migrated automatically with vMotion, requiring admin intervention.

Maintenance Mode using vCenter

You can still use vCenter to place a host into maintenance mode (with or without DRS); however, with this method, the admin must intervene and shut down the CVM properly, keeping the cluster stable, before the host can enter maintenance mode. Note: If DRS is not enabled, the admin will also need to migrate the other workloads to alternate hosts manually.

Without properly shutting down the CVM, the host maintenance mode operation will stall at 17% and eventually time out, since the CVM is still running.

So let’s fix that…

  1. SSH into the CVM on the host entering maintenance mode and run the command cvm_shutdown -P now. This command makes the CVM notify the other CVMs in the cluster that it is going offline and redirects I/O from the hypervisor to alternate CVMs in the cluster. This is the same operation performed during AOS and LCM upgrades, and even during the fully automated maintenance mode actions.
  2. Eventually, the CVM will be shut down, and the ESXi host will enter maintenance mode.
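The manual vCenter path boils down to a short, ordered checklist. This Python sketch (a hypothetical helper that only builds the list of steps; it does not touch vCenter or the CVM) captures the order described above, including the extra manual migration step when DRS is off. It uses cvm_shutdown -P now, the form shown in Nutanix's documentation.

```python
from typing import List


def manual_vcenter_maintenance_steps(host: str, drs_enabled: bool) -> List[str]:
    """Ordered checklist for entering maintenance mode from vCenter on a
    Nutanix node (hypothetical helper; it lists steps, it runs nothing)."""
    steps: List[str] = []
    if not drs_enabled:
        # Without DRS, vCenter will not vMotion the guest VMs for us.
        steps.append(f"Migrate guest VMs off {host} manually (vMotion)")
    steps.append(f"Right-click {host} in vCenter and choose Enter Maintenance Mode")
    # The CVM is pinned to the host, so the task stalls until the CVM is down.
    steps.append(f"SSH to the CVM on {host} and run: cvm_shutdown -P now")
    steps.append(f"Wait for the CVM to power off and {host} to finish entering maintenance mode")
    return steps


if __name__ == "__main__":
    for number, step in enumerate(
        manual_vcenter_maintenance_steps("Host 3", drs_enabled=False), start=1
    ):
        print(f"{number}. {step}")
```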

Now, the return trip…

When we take the host out of maintenance mode via vCenter, the CVM will not power back on automatically; it must be powered on manually. Once the CVM is back online, we can check the status with cluster status to ensure it's participating in the cluster again, and also check within Prism that data resiliency shows OK. One thing to remember when shutting down the CVM manually is patience! It will initially show a Maintenance status, but will eventually return to full operation!
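Since patience is the key on the return trip, a polling loop with a timeout is a reasonable way to wait in a script. This Python sketch is generic: the `check` callable is a placeholder you'd write yourself (for example, something that runs cluster status on a CVM and confirms the returning CVM's services are up), and the default 30-second interval is an arbitrary choice.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout_s: float, interval_s: float = 30.0) -> bool:
    """Call `check` every `interval_s` seconds until it returns True.

    Returns True on success, or False once `timeout_s` seconds have passed.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

A call might look like wait_until(lambda: cvm_services_up("host3"), timeout_s=1800), where cvm_services_up is a hypothetical function of your own; a False result means it's time to look at the CVM rather than keep waiting.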

Automated vs Manual

Hopefully, you can see the benefits of the automated workflow for host maintenance mode versus the manual one. There's a time and a place for both, especially since the choice depends on licensing. When available, definitely leverage the integration between Prism and vCenter for a simple, automated host maintenance process.

Rolling Host Reboot

I will touch on this item briefly, as this isn’t specific to ESXi but a feature of Prism and, therefore, also pertains to AHV environments.

In Prism, you will find the Reboot menu item under Settings. This allows you to reboot each host systematically in an automated fashion while keeping the underlying cluster stable. As we've touched on before, if you don't have DRS enabled, you will need to migrate workloads manually from each host as it is placed into maintenance mode and rebooted, so yet another win for Enterprise+ licensing and DRS!

In my cluster, I’ve elected to reboot all 4 hosts.

After selecting Reboot, we're presented yet again with the vCenter information window (AHV won't show this, of course), and after we enter our vCenter information, the host reboot process begins.

If you’ve followed along with this post, you can decipher what actions will be taken as part of this reboot process, especially when fully automated. This process will go host by host, evacuating the workloads, shutting down the CVM, and ultimately rebooting the hosts. Super easy and fully automated!
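To make the host-by-host ordering concrete, here is a hedged Python sketch of the loop described above; it is not Prism's implementation. `run` is a hypothetical callable that performs (or, in a dry run, just logs) one step, and `cluster_resilient` is a hypothetical check you'd supply so that the next host is only taken down once the cluster can tolerate it.

```python
from typing import Callable, Sequence


def rolling_reboot(hosts: Sequence[str], drs_enabled: bool,
                   run: Callable[[str], None],
                   cluster_resilient: Callable[[], bool]) -> None:
    """Reboot hosts one at a time, never taking a host down unless the
    cluster reports it can tolerate a node outage."""
    for host in hosts:
        # Checked before each host, i.e. after the previous host has returned.
        if not cluster_resilient():
            raise RuntimeError(f"cluster not resilient; not taking {host} down")
        how = "DRS vMotion" if drs_enabled else "manual migration required"
        run(f"{host}: evacuate guest VMs ({how})")
        run(f"{host}: shut down the CVM gracefully")
        run(f"{host}: enter maintenance mode and reboot")
        run(f"{host}: exit maintenance mode and wait for the CVM to rejoin")


if __name__ == "__main__":
    # Dry run: print the steps instead of executing them.
    rolling_reboot(["Host 1", "Host 2", "Host 3", "Host 4"], True, print, lambda: True)
```

The resilience gate between hosts is the important design choice: rebooting a second host before the first CVM has rejoined is exactly what the automated workflow is meant to prevent.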

Hey, remember that time I brought up no DRS? If you don’t have DRS enabled on the cluster (either on purpose or due to licensing), this fully automated reboot fun isn’t for you…

Wrap-up

Much like Part 1's coverage of AHV, a vSphere-based Nutanix cluster provides great integration, here between Prism and vCenter. While there are some nuances to the full functionality around licensing and DRS, the maintenance operations give you the flexibility of using Prism, the CLI, or vCenter to achieve the desired outcome!

Thanks for reading, and please let me know if you have any thoughts or feedback on these articles. There is always more than one way to achieve some of these activities!

Mastering Maintenance Mode Operations in Nutanix: A Guide for AHV and ESXi: Part 2