Prism Central Backup Best Practices and Gotchas

Overview

Introduction

Prism Central is the command and control plane for your Nutanix infrastructure, managing multiple clusters, providing unified visibility, and orchestrating critical operations across your environment. Given its central role, protecting Prism Central is not just important—it's essential. A failure or data loss event affecting PC can impact your ability to manage your entire infrastructure.

Here's the thing: I've lost count of how many times I've had conversations with customers who insist on backing up their Prism Central VM with their existing backup solution—Veeam, Cohesity, Rubrik, you name it. "We back up everything with our backup tool," they say, "and Prism Central is just another VM, right?" Wrong. And I get it—it's a perfectly natural impulse. You've invested in enterprise backup software, your operations teams are trained on it, and you want consistency across your environment.

But here's the reality check: Prism Central is not "just another VM." It's more like vCenter, Cisco Firepower Management Center, or any number of appliance-based management VMs that have their own specialized backup mechanisms built in. You wouldn't try to back up vCenter with a third-party tool and expect a clean restore (at least, not anymore after learning that lesson the hard way). The same principle applies here.

In this post, I'll explain why you need to trust Prism Central's native backup capabilities, walk you through the supported methods, and highlight the gotchas that can leave you with a corrupted management plane if you go rogue. Consider this your friendly intervention before you make a mistake that'll have you opening a support case at 2 AM.

Let's dig into each of these in a bit more detail.

The Golden Rule: Use Native Backup Methods Only

Critical Warning: Backup and restore of Prism Central using third-party backup software (such as HYCU, Veeam, Cohesity, or Rubrik), Protection Domains, or VM snapshot-based approaches is not supported and may lead to an inconsistent Prism Central instance post-recovery.

This is perhaps the most important gotcha to understand. While it may be tempting to include your Prism Central VM in your existing backup infrastructure alongside other workloads, doing so can result in a corrupted management plane that may appear functional but contain inconsistent data or configurations.

Why the restriction? Prism Central manages a complex database of configuration, performance metrics, and state information across multiple clusters. The Insights Data Fabric (IDF) requires consistent point-in-time snapshots coordinated across multiple components. Third-party backup tools and Protection Domains don't have the necessary integration to ensure this consistency.

Supported Backup Methods

Nutanix provides two native backup solutions for Prism Central, both designed to ensure data consistency and reliable recovery:

1. Continuous Backup (Cluster-Based)

Continuous backup replicates your Prism Central configuration and data to up to three registered Nutanix clusters as backup targets.

Key Characteristics:

  • RPO: 30 minutes, matching the replication frequency below
  • RTO: Approximately 2 hours
  • Replication Frequency: Every 30 minutes over TCP port 9440
  • Backup Targets: Up to three Nutanix clusters (AHV or ESXi)

Requirements:

  • Backup clusters must run AOS 6.0 or later
  • At least one backup cluster must run AOS 6.5.3.1 or later
  • Clusters must be registered with Prism Central
  • NTP must be configured to synchronize time between PC and registered clusters
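
If you want to sanity-check a planned set of backup targets against the version requirements above, a few lines of Python will do. Treat this as a rough sketch rather than an official tool: the cluster names and the targets dictionary are made up for illustration, you'd have to collect the AOS versions yourself, and it doesn't call any Nutanix API. It also doesn't check registration or NTP.

```python
from typing import Dict, List, Tuple

MAX_TARGETS = 3             # up to three backup clusters
MIN_EVERY = (6, 0, 0, 0)    # every backup cluster: AOS 6.0 or later
MIN_ANY = (6, 5, 3, 1)      # at least one backup cluster: AOS 6.5.3.1 or later


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '6.5.3.1' into (6, 5, 3, 1), zero-padded to four parts."""
    parts = [int(p) for p in text.strip().split(".")]
    return tuple(parts + [0] * (4 - len(parts)))


def check_backup_targets(targets: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the version rules are met.

    `targets` maps a cluster name to its AOS version string (hypothetical input).
    """
    problems = []
    if not 1 <= len(targets) <= MAX_TARGETS:
        problems.append(
            f"expected 1 to {MAX_TARGETS} backup clusters, got {len(targets)}"
        )
    versions = {name: parse_version(v) for name, v in targets.items()}
    for name, version in versions.items():
        if version < MIN_EVERY:
            problems.append(f"{name} runs AOS {targets[name]}, older than 6.0")
    if versions and not any(v >= MIN_ANY for v in versions.values()):
        problems.append("no backup cluster runs AOS 6.5.3.1 or later")
    return problems
```

Calling check_backup_targets({"pe-a": "6.5.3.1", "pe-b": "6.0"}) returns an empty list, while a set of targets that are all older than 6.5.3.1 gets flagged.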

How It Works: The Insights Data Fabric (IDF) from Prism Central synchronizes to designated Prism Element clusters every 30 minutes. When recovery is needed, Prism Central can be rebuilt on one of the backup clusters and re-seeded with data from the IDF backup.

2. Point-in-Time Backup (Object Storage-Based)

Point-in-time backup keeps multiple backups in object storage (off-site or, with Nutanix Objects, on-premises), allowing you to restore from backups up to one month old.

Key Characteristics:

  • RPO: Configurable—1, 2, 4, 6, 8, 12, or 24 hours (as of NCI 7.3)
  • RTO: Approximately 2 hours
  • Retention: Up to one month of recovery points

Supported Targets:

  • AWS S3
  • S3-compatible Nutanix Objects Storage (introduced in NCI 7.3)

Why This Matters: The addition of Nutanix Objects as a target is particularly valuable for air-gapped or "dark site" deployments where external cloud connectivity isn't permitted. You can now maintain enterprise-grade backup and restore capabilities entirely within your own infrastructure.

Best Practices and Recommendations

1. Implement a Layered Resilience Strategy

Don't rely on a single protection mechanism. Organizations should consider enabling both high availability and backup/restore capabilities for Prism Central. These are complementary resilience options that address different failure scenarios:

  • High Availability: Protects against VM or host failures within the same datacenter
  • Continuous Backup: Protects against cluster-level failures or site disasters
  • Point-in-Time Backup: Protects against logical corruption, ransomware, or the need to recover to a specific point in time

2. Choose the Right RPO for Your Needs

With the configurable RPO options introduced in NCI 7.3, you can balance recovery objectives against backup frequency and storage consumption. Consider:

  • 1-2 hour RPO: For mission-critical environments where only minimal data loss is acceptable
  • 4-8 hour RPO: For most production environments balancing protection and resource usage
  • 12-24 hour RPO: For less critical environments or where regulatory compliance requires extended retention with less frequent updates
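
To get a feel for the trade-off, here's a back-of-the-envelope sketch of how many recovery points each RPO option could produce over the retention window. Two assumptions of mine, not from Nutanix: "one month" is treated as 30 days, and every scheduled backup is kept for the whole window. Read the result as an upper bound, not a statement of how retention actually behaves.

```python
ALLOWED_RPO_HOURS = (1, 2, 4, 6, 8, 12, 24)  # options listed for NCI 7.3
RETENTION_DAYS = 30  # assumption: "one month" treated as 30 days


def recovery_points_in_window(rpo_hours: int,
                              retention_days: int = RETENTION_DAYS) -> int:
    """Upper bound on recovery points if every scheduled backup is retained."""
    if rpo_hours not in ALLOWED_RPO_HOURS:
        raise ValueError(f"RPO must be one of {ALLOWED_RPO_HOURS}, got {rpo_hours}")
    return retention_days * 24 // rpo_hours


if __name__ == "__main__":
    for rpo in ALLOWED_RPO_HOURS:
        points = recovery_points_in_window(rpo)
        print(f"{rpo:>2}h RPO -> up to {points} recovery points")
```

Under those assumptions, a 1-hour RPO means up to 720 recovery points in the window and a 24-hour RPO means up to 30, which is a useful starting point for a storage-consumption conversation.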

3. Test Your Recovery Process

A backup is only as good as your ability to restore from it. Periodically test your recovery procedures in a non-production environment to:

  • Validate backup integrity
  • Verify RTO meets business requirements
  • Train staff on recovery procedures
  • Identify any gaps or issues before a real disaster occurs

4. Monitor Backup Status and Alerts

Regularly verify that backups are completing successfully. Pay attention to:

  • Backup job completion status
  • Available recovery points
  • Storage capacity on backup targets
  • Connectivity between PC and backup destinations
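
Two of these checks are easy to script from any admin box. The sketch below is illustrative only: it uses nothing but the Python standard library, the 10-minute grace period is an arbitrary value I picked, and you'd have to supply the timestamp of the last successful backup yourself (this doesn't query Prism Central). Port 9440 is the one continuous backup replicates over.

```python
import socket
from datetime import datetime, timedelta


def backup_is_stale(last_success: datetime, now: datetime, rpo: timedelta,
                    grace: timedelta = timedelta(minutes=10)) -> bool:
    """True when the newest successful backup is older than RPO plus a grace period."""
    return now - last_success > rpo + grace


def port_reachable(host: str, port: int = 9440, timeout: float = 3.0) -> bool:
    """True when a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

A successful TCP connection only proves the port is open, not that replication is healthy, so pair it with the staleness check rather than relying on it alone.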

5. Document Your Recovery Plan

Create a runbook that includes:

  • Step-by-step recovery procedures
  • Contact information for key personnel
  • Backup target locations and credentials
  • Decision criteria for choosing between recovery points
  • Post-recovery validation checklist

Migration Scenarios: Moving PC to a New Cluster

The backup and restore functionality also enables migration scenarios where you need to move Prism Central to a different cluster. This might be necessary for:

  • Hardware refresh cycles
  • Datacenter consolidation or relocation
  • Separating PC onto dedicated infrastructure
  • Disaster recovery site activation

Migration Process:

  1. Configure the target cluster as a backup destination
  2. Allow backup replication to complete
  3. Initiate recovery on the target cluster
  4. Validate the recovered PC instance
  5. Critical: Shut down or delete the original Prism Central VM to prevent conflicts

Important Gotcha: If the original cluster comes back online and you haven't shut down or deleted the original PC VM, you'll have two instances of Prism Central trying to manage the same infrastructure. This will cause conflicts, data inconsistency, and operational issues. Always ensure the old instance is properly decommissioned.
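
If your team tracks migrations in scripts or tickets, one way to make that last step hard to forget is to model the process as an ordered checklist that isn't "finished" until the old VM is gone. This is a toy sketch of the idea; the class and step strings are my own invention and nothing here talks to a real cluster.

```python
MIGRATION_STEPS = (
    "configure target cluster as backup destination",
    "allow backup replication to complete",
    "initiate recovery on target cluster",
    "validate recovered PC instance",
    "shut down or delete original PC VM",
)


class MigrationChecklist:
    """Tracks the migration steps in order and refuses to skip ahead."""

    def __init__(self) -> None:
        self.done = []

    @property
    def finished(self) -> bool:
        """Only true once the original PC VM has been decommissioned."""
        return len(self.done) == len(MIGRATION_STEPS)

    def complete(self, step: str) -> None:
        if self.finished:
            raise ValueError("all steps are already complete")
        expected = MIGRATION_STEPS[len(self.done)]
        if step != expected:
            raise ValueError(f"next step is {expected!r}, not {step!r}")
        self.done.append(step)
```

After the first four steps, finished is still False. It only flips to True once the decommissioning step has been recorded.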

Common Gotchas and Unsupported Activities

1. Third-Party Backup Solutions

The Problem: Tools like Veeam, Cohesity, Rubrik, and HYCU are excellent for backing up workload VMs, but they're explicitly unsupported for Prism Central backups.

Why It Fails: These tools take VM-level snapshots that don't account for the distributed nature of Prism Central's data. The Insights Data Fabric spans multiple services and databases that must be captured in a consistent state. A VM snapshot alone cannot guarantee this consistency.

What Happens: You may successfully restore the VM, but the recovered Prism Central instance could have:

  • Inconsistent database states
  • Mismatched configuration data
  • Corruption in the IDF
  • Loss of historical performance data

The Solution: Use native PC backup methods exclusively. If you're using Veeam or another solution for your workload VMs, that's fine—just exclude Prism Central from those backup jobs.

2. Protection Domains

The Problem: Protection Domains are designed to replicate application VMs and their data between Nutanix clusters. However, they are not supported for Prism Central protection.

Why It Fails: Similar to third-party backup tools, Protection Domains don't provide the specialized logic needed to ensure consistency across Prism Central's distributed data architecture.

The Solution: Use continuous backup to other clusters instead, which provides similar functionality but with PC-aware consistency mechanisms.

3. VM Snapshots

The Problem: Taking manual VM snapshots of Prism Central through Prism Element is not a supported backup method.

Why It Fails: VM snapshots capture only the virtual disks at a point in time. They don't coordinate with the IDF synchronization process or ensure that all in-flight transactions are properly captured.

The Solution: Rely on the native backup methods, which handle snapshot coordination internally as part of the backup process.

4. Cluster Unavailability Impact

The Gotcha: If the cluster hosting Prism Central becomes unavailable, PC itself becomes unavailable until recovery completes, even though you have backups on other clusters.

The Reality: This is a fundamental limitation of the architecture. Prism Central must be running on a functional cluster to operate. The 2-hour RTO for recovery reflects the time needed to deploy a new PC instance on a backup cluster and restore the data.

Mitigation: This is where implementing PC high availability alongside backup becomes valuable. HA protects against VM or host failures without requiring a full restore, significantly reducing downtime for common failure scenarios.

New Features in Recent Releases

Nutanix continues to enhance Prism Central resilience capabilities. Recent improvements include:

NCI 7.3 Enhancements

  • Nutanix Objects Support: Point-in-time backups can now target S3-compatible Nutanix Objects, enabling fully on-premises backup solutions
  • Configurable RPO: Choice of 1, 2, 4, 6, 8, 12, or 24-hour RPO for point-in-time backups (previously fixed at 2 hours)
  • Enhanced Ransomware Protection: Improved protection against ransomware, data loss, and availability-zone failures

Conclusion

Protecting Prism Central requires a different approach than typical VM backup strategies. The key takeaways:

  1. Use only native backup methods—continuous backup, point-in-time backup, or both
  2. Never use third-party backup tools, Protection Domains, or VM snapshots for PC backup
  3. Implement layered resilience with both HA and backup capabilities
  4. Choose appropriate RPO/RTO based on your business requirements
  5. Test recovery procedures regularly to ensure they work when needed
  6. Remember to decommission the old PC instance when migrating or recovering

While the restrictions on backup methods may seem limiting, they exist to ensure that your recovered Prism Central instance is consistent, functional, and reliable. Following these best practices and avoiding the documented gotchas will ensure that your Nutanix infrastructure management plane remains protected and recoverable.

Additional Resources