I recently had the opportunity to deploy 12 Nutanix nodes for a customer across 2 sites (Primary and DR), 6 of which were 3055-G5 nodes with dual NVIDIA M60 GPU cards installed and dedicated to running the customer’s Horizon View desktop VMs. This was my first experience doing a Nutanix deployment using NVIDIA GPU cards with VMware, and thankfully there is plenty of documentation out there on the process.

A Nutanix deployment with GPU cards installed is no different from one without; you still go through the process of imaging the nodes with Foundation just as you would without GPU cards. In this case, each site was configured with 2 Nutanix clusters: one for server VMs and a second cluster dedicated to VDI. The VDI cluster consisted of 3 NX-3055-G5 nodes running Horizon View 7.2.0.

I’ll touch on some details of the M60 card below, then get into a few places where I hit issues with the deployment and how I fixed them, and finally cover some host/VM configuration and validation commands.

NVIDIA M60 Card and Requirements

The NVIDIA M60 cards are workhorses. The 3055-G5 nodes were installed with 2 cards each, and we used the M60-1B profile to maximize density while still providing the application performance required.

M60 GPU Profiles

As you can see, the M60 card provides many different profiles that can be chosen based on the type of end user, and the profile can vary from VM to VM and Desktop Pool to Desktop Pool.

There were a few items regarding deployment with the M60 card (and the M10, for that matter) that differ from NVIDIA’s older K1 cards.

First, the M60 card requires that users connecting to VMs with vGPU capabilities obtain a license. Accompanying the M60 card in an installation is the NVIDIA License Server, a low-overhead VM (Windows or Linux) that runs the NVIDIA licensing component. Within the VDI VMs, the NVIDIA Control Panel software is configured to point at the License Server (in my case, a backup License Server was also installed to provide a highly available licensing solution). Downloading the License Server software and getting it installed and configured on a VM is pretty easy; nothing unexpected there.
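For reference, on a Linux VM the same licensing settings live in /etc/nvidia/gridd.conf instead of the NVIDIA Control Panel (Windows keeps the equivalent values in the Control Panel/registry). A minimal sketch, using placeholder server names, looks something like this; check the NVIDIA licensing guide for your driver version:

# /etc/nvidia/gridd.conf - point the guest at the primary and backup License Servers
ServerAddress=license1.example.local
BackupServerAddress=license2.example.local
ServerPort=7070
# FeatureType 1 = GRID vGPU license
FeatureType=1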

The second deployment item is that the M60 has 2 different operating modes, compute mode and graphics mode, and ships in compute mode by default. The different modes exist for different configuration options, and the mode of the GPU is established at power-on. Graphics mode is what’s typically used in scenarios where graphics are the primary requirement, as opposed to compute mode, which is aimed at certain HPC scenarios. Per NVIDIA, graphics mode should be used in the following scenarios:

  • GPU passthrough with hypervisors that do not support large BARs. At the time of publication, this includes Citrix XenServer 6.2, 6.5, VMware ESXi 5.1, 5.5, 6.0, Red Hat Enterprise Linux 7.0, 7.1.
  • GPU passthrough to Windows VMs on Xen and KVM hypervisors.
  • GRID Virtual GPU deployments.
  • VMware vSGA deployments.

To change the operating mode of the M60 card, NVIDIA provides a utility called gpumodeswitch that switches the card from compute to graphics mode. The utility can be installed directly on an ESXi host using the NVIDIA-provided .vib file, making the change pretty easy to do.
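Once the gpumodeswitch .vib is installed, it’s worth checking which mode the cards are currently in before changing anything. A quick sanity check from the ESXi shell looks like this:

gpumodeswitch --listgpumodes

This should list each physical GPU on the M60 cards along with its current mode (compute or graphics).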

Modifying the Operating Mode

One of the nice things about the Nutanix imaging process is that it detects the model of the nodes being imaged and installs all of the components, including VMware ESXi and the associated .vib files, which also include the NVIDIA Host Driver and GpuModeSwitch driver .vibs, GREAT!

When I went to run the gpumodeswitch utility, I got the results below. Odd, since I knew the M60 cards were in there, yet the command wasn’t showing them.

GPUMODESWITCH Utility

Enter NVIDIA Support’s KB entry… The reason behind this is that while Nutanix is awesome at taking care of installing the associated .vib files for us, you cannot run the gpumodeswitch utility while the NVIDIA driver is loaded; once the OS has taken control of the card, the utility cannot modify the GPU mode.
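If you want to confirm exactly which NVIDIA .vibs Foundation laid down before removing anything, a quick check from the ESXi shell is:

esxcli software vib list | grep -i nvidia

This should show both the NVIDIA Host Driver and GpuModeSwitch driver .vibs that Foundation installed.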

Since Nutanix took care of the imaging for us, the VMware NVIDIA Host Driver was already installed, and that is why the GPU mode couldn’t be changed. So to fix this, we now have to:

  • Remove the NVIDIA Host Driver
  • Reboot
  • Run the gpumodeswitch utility to change the operating mode
  • Reboot
  • Reinstall the NVIDIA Host Driver
  • Reboot

I’ve included the commands I ran over SSH on each of the VMware hosts to get the utility to run and change the operating mode.

vim-cmd hostsvc/maintenance_mode_enter
esxcli software vib remove -n NVIDIA-VMware_ESXi_6.5_Host_Driver
esxcli software vib remove -n NVIDIA-VMware_ESXi_6.0_GpuModeSwitch_Driver
reboot
esxcli software vib install -v /tmp/NVIDIA-GpuModeSwitch-1OEM.600.0.0.2494585.x86_64.vib --no-sig-check
gpumodeswitch --gpumode graphics --auto
reboot
esxcli software vib install -v /tmp/NVIDIA-VMware_ESXi_6.5_Host_Driver_384.73-1OEM.650.0.0.4598673.vib
reboot
vim-cmd hostsvc/maintenance_mode_exit
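Once the host is back up from the final reboot, it’s worth verifying that the host driver loaded and can see the cards before taking the host out of maintenance mode. Something along these lines (a quick sanity check, not an exhaustive validation):

vmkload_mod -l | grep nvidia
nvidia-smi

If the driver reinstall went cleanly, the nvidia module should show as loaded and nvidia-smi should list the M60 GPUs.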

vSphere Host Modification

Now that we’ve got the host prepared and ready to be used (Host Driver reinstalled, correct GPU mode), there were still a few steps we needed to take on the host itself to ensure that the M60 cards can be fully utilized.

To take advantage of our M60 GPUs, we need to change the graphics configuration on each host from Shared to Shared Direct, and then either reboot the host or restart the xorg service by issuing the command /etc/init.d/xorg restart.

Host Graphics Config
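If you’d rather make this change from the command line (or script it across hosts), ESXi 6.5 exposes the same setting through esxcli. A sketch of the idea, assuming the esxcli graphics namespace is available on your build (SharedPassthru is the CLI name for what the vSphere client shows as Shared Direct):

esxcli graphics host get
esxcli graphics host set --default-type SharedPassthru
/etc/init.d/xorg restart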

VM Configuration for GPU

Now that we have our GPU cards in the proper mode and our hosts configured to allow GPU passthrough, the final step was to prepare the master images with GPU capabilities. These steps are pretty simple; however, there were 2 items that caught me off guard with the VM configuration requirements.

To provide vGPU capabilities to our VMs, it’s as simple as adding a Shared PCI device to the VM, and then selecting the NVIDIA Grid vGPU and profile we want to use.

VM PCI Options

The first item that caught me off guard was the fact that to enable the GPU to support NVIDIA Grid vGPU, you must reserve 100% of the memory for the VM. Simple to do, yet I just wasn’t expecting this requirement.
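For reference, the Shared PCI device and vGPU profile end up as plain entries in the VM’s .vmx file. A rough sketch of what that looks like for the M60-1B profile we used (the device index and profile string may differ in your environment):

pciPassthru0.present = "TRUE"
pciPassthru0.virtualDev = "vmiop"
pciPassthru0.vgpu = "grid_m60-1b"

The 100% memory reservation itself is just the standard "Reserve all guest memory (All locked)" checkbox in the VM’s memory settings.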

The second item that caught me off guard was the fact that once the VM is powered on and using the NVIDIA vGPU capabilities, you can no longer connect to the VM console through vCenter; all you get is a black screen. The workaround is to also install the Horizon View Direct Connect Agent on the VMs that will be using the vGPU capabilities (think your Master Image here!).

Validating our GPU Usage

One of the benefits of the M60 card is being able to get the utilization of the card (memory and processes) via the CLI, and the biggest benefit I see from this capability is being able to see the individual VMs using the GPU resources, as shown below via the nvidia-smi command.

NVIDIA-SMI Results
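A couple of variations worth knowing, run from the ESXi shell (the vgpu subcommand depends on your GRID driver build including it):

nvidia-smi        (one-shot view of GPU utilization, memory, and the VM processes on each GPU)
nvidia-smi -l 5   (refresh the same view every 5 seconds)
nvidia-smi vgpu   (per-vGPU summary, if your driver build supports it)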

Nutanix has a great table of troubleshooting commands when dealing with vGPU installations.

GPU Troubleshooting Commands
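Beyond nvidia-smi, a few host-side checks along these lines (not necessarily the exact set in Nutanix’s table, and assuming the esxcli graphics namespace is present on your ESXi build) can help confirm the GPU and driver state:

esxcli graphics device list   (shows the physical GPUs the host sees)
esxcli graphics vm list       (shows which running VMs are consuming graphics devices)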

Lessons Learned

To recap the above: those are a few lessons learned while deploying Nutanix nodes with VMware and the NVIDIA GRID cards.

Thanks for reading! As this was my first time running through an NVIDIA M60 installation with Nutanix, I might not have done this the most efficient way, so feel free to drop me a note if you’ve got any feedback or questions!
