VCF 9 Power Management Deep Dive: DRS, vSphere DPM, and the Hidden Watt Savings

The Most Powerful Green Button You Haven't Clicked

I have spent over 25 years in IT, and in that time I have seen a lot of features get shipped, celebrated at a conference keynote, and then quietly forgotten in production. vSphere Distributed Power Management — DPM — is one of the most egregious examples of a brilliant capability that collects dust in almost every enterprise I have ever worked with.

Here in Dubai, where data centre cooling costs are a very real operational concern and sustainability commitments are increasingly tied to national and corporate mandates, leaving DPM disabled is not just a missed technical opportunity. It is leaving watts — and dirhams — on the table, every single night.

This post is a practical deep dive into the full power management stack available in VCF 9: how DRS and DPM work together, where the real watt savings hide, and exactly how to configure it without waking up to a capacity crisis at 3 AM.

Understanding the Stack: DRS First, Then DPM

Before we talk about DPM, we need to talk about DRS (vSphere Distributed Resource Scheduler) — because DPM is not a standalone feature. It is an extension of DRS logic, and it cannot operate without DRS being fully enabled and in Fully Automated mode.

Think of it this way:

  • DRS is the brain that continuously evaluates CPU and memory demand across all hosts in a cluster and uses vMotion to rebalance VMs for optimal utilization.
  • DPM is the next logical step: once DRS has consolidated workloads onto fewer hosts, it asks the question — “Do we actually need all these hosts powered on right now?”

If the answer is no, DPM migrates the remaining VMs off the underutilised host using vMotion, and then powers it off via its out-of-band management interface — iDRAC, iLO, or IPMI. When demand rises again, DPM reverses the process: it powers the host back on, waits for it to rejoin vCenter, and rebalances.

In VCF 9, this entire workflow operates within each Workload Domain, which has its own vCenter instance. DPM is therefore cluster-scoped, which is exactly the right scope for this kind of automation.

The Numbers That Should Get Your Attention

Let me give you a real-world frame of reference. A typical modern server in an enterprise data centre — say a Dell PowerEdge R750 or an HPE ProLiant DL380 Gen10 — consumes anywhere between 300W and 500W at idle, and upwards of 700W–900W under load.

If your cluster has 8 hosts and overnight utilisation drops to a level where only 4 are genuinely needed, DPM can put 4 hosts into standby. At an average idle draw of 400W per host, over 8 hours that is:

4 hosts × 400W × 8 hours = 12,800 Wh = 12.8 kWh saved per night

Scale that across 365 nights, and you are looking at over 4,600 kWh per year — just from one cluster, just from enabling a feature that ships with vSphere and sits switched off by default.

Now factor in your data centre’s PUE (Power Usage Effectiveness). In many regional data centres, a PUE of 1.5 or higher means that for every watt saved at the server, you are also saving approximately 0.5W in cooling overhead. The real savings are closer to 6,900 kWh per year from that single cluster.
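
To plug in your own numbers, here is a quick back-of-envelope calculator in Python. The host count, idle wattage, standby window, and PUE are the illustrative figures used above, not measurements from any particular environment; swap in your own values from iDRAC/iLO power history.

# Back-of-envelope DPM savings estimate for one cluster. All inputs are assumptions.
HOSTS_IN_STANDBY = 4          # hosts DPM can power off overnight
IDLE_WATTS_PER_HOST = 400     # measured idle draw per host, in watts
STANDBY_HOURS_PER_NIGHT = 8
NIGHTS_PER_YEAR = 365
PUE = 1.5                     # data centre Power Usage Effectiveness

kwh_per_night = HOSTS_IN_STANDBY * IDLE_WATTS_PER_HOST * STANDBY_HOURS_PER_NIGHT / 1000
kwh_per_year_it = kwh_per_night * NIGHTS_PER_YEAR     # IT load only: roughly 4,700 kWh
kwh_per_year_total = kwh_per_year_it * PUE            # with cooling overhead: roughly 6,900-7,000 kWh

print(f"Per night: {kwh_per_night:.1f} kWh")
print(f"Per year (IT load): {kwh_per_year_it:,.0f} kWh")
print(f"Per year (with PUE {PUE}): {kwh_per_year_total:,.0f} kWh")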

This is what I mean by “hidden watts.”

How DPM Decides When to Act

DPM’s intelligence is governed by the DemandCapacityRatio — an internal calculation that determines whether the cluster has excess capacity beyond what its workloads actually need.

By default, DPM targets a utilisation band of 45% to 81%:

  • If CPU and memory utilisation drops below 45% across all powered-on hosts, DPM will consider powering one off.
  • If utilisation climbs above 81% on all hosts, DPM powers a standby host back on.

The evaluation windows are deliberately asymmetric — and this is important to understand:

  • Power-on decisions are evaluated over a 5-minute window — fast response to rising demand.
  • Power-off decisions are evaluated over a 40-minute window — slow and conservative, to avoid flapping.

This asymmetry is intentional. VMware designed DPM to be quick to respond to capacity needs but cautious about removing capacity. In practice, this means DPM will rarely cause a performance incident — it will, however, leave a host powered on longer than you might expect before shutting it down.
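
To make the asymmetry concrete, here is a deliberately simplified sketch of the two evaluation windows. This is my own illustration of the behaviour described above, not VMware's actual algorithm: a power-off recommendation requires demand to stay low across the whole long window, while a short burst above the upper threshold is enough to bring capacity back.

# Illustrative model of the asymmetric DPM evaluation windows (not the real DRS code).
POWER_ON_WINDOW_MIN = 5      # rising demand is evaluated over roughly 5 minutes
POWER_OFF_WINDOW_MIN = 40    # falling demand is evaluated over roughly 40 minutes
LOW, HIGH = 0.45, 0.81       # default utilisation band

def recommendation(utilisation_per_minute):
    """utilisation_per_minute: cluster utilisation samples, one per minute, newest last."""
    recent = utilisation_per_minute[-POWER_ON_WINDOW_MIN:]
    long_view = utilisation_per_minute[-POWER_OFF_WINDOW_MIN:]
    if all(sample > HIGH for sample in recent):
        return "power on a standby host"          # quick to add capacity
    if all(sample < LOW for sample in long_view):
        return "consider powering off a host"     # slow and conservative to remove it
    return "no change"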

The key advanced parameters you can tune in vSphere DRS settings are:

  • DemandCapacityRatioTarget (default: 63%): the utilisation target DPM aims for
  • DemandCapacityRatioToleranceHost (default: 18%): the tolerance band (±) around the target
  • MinPoweredOnCpuCapacity (default: 1 MHz): minimum CPU capacity that must stay online
  • MinPoweredOnMemCapacity (default: 1 MB): minimum memory capacity that must stay online
  • PercentIdleMBInMemDemand (default: 25): how aggressively idle memory is counted in demand

The last one deserves special attention. DPM does not just look at active memory — it accounts for a percentage of idle consumed memory too. At the default value of 25, DPM calculates memory demand as:

Active Memory + 25% of Idle Consumed Memory

Setting this lower makes DPM more aggressive in consolidating hosts. Setting it higher makes it more conservative. The right value depends on your workload profile.
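
As a sanity check on those defaults, the 45%–81% band and the memory-demand figure fall straight out of the parameters. The host memory numbers below are made up purely for illustration.

# Derive the DPM utilisation band from the advanced options (defaults shown).
target = 63         # DemandCapacityRatioTarget, in percent
tolerance = 18      # DemandCapacityRatioToleranceHost, in percent
low, high = target - tolerance, target + tolerance        # 45% to 81%

# Memory demand as DPM counts it, with PercentIdleMBInMemDemand at its default of 25.
percent_idle = 25
active_mb = 48_000            # hypothetical active memory on a host
idle_consumed_mb = 80_000     # hypothetical consumed-but-idle memory
memory_demand_mb = active_mb + idle_consumed_mb * percent_idle / 100   # 68,000 MB

print(f"DPM target band: {low}%-{high}%")
print(f"Memory demand counted by DPM: {memory_demand_mb:,.0f} MB")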

Hardware Prerequisites: The Part People Skip

DPM is only as reliable as its ability to wake a host back up. This requires out-of-band management — and this is where many DPM deployments quietly fail.

Before you enable DPM in production, verify the following on every host in the cluster:

1. BMC/iDRAC/iLO is configured and reachable. vCenter needs network access to the host’s management controller. This must be on a dedicated management network — not the same interface as your vSphere management traffic.

2. BMC credentials are entered in vCenter. Go to Host > Configure > Power Management in vCenter and enter the IPMI/iLO/iDRAC credentials. Without this, DPM cannot issue power-on commands.

3. Run the standby mode test. Before trusting DPM in production, manually test that a host can enter and exit standby correctly. Navigate to the host in vCenter, right-click, and select “Enter Standby Mode.” Verify the host goes to standby and then powers back on via a DPM test recommendation. If the exit from standby fails, vCenter will flag the host as “DPM Disabled Due to Failed Exit Standby Mode Testing” — and you do not want to discover this during a capacity event.

4. Redfish vs IPMI. Modern servers support Redfish, which provides significantly faster and more reliable power-on responses than legacy IPMI. If your hardware supports Redfish, prioritise it. The practical difference in wake time can be 2–3 minutes versus 6–8 minutes on old IPMI — which matters when DPM is trying to respond to a demand spike. (A quick Redfish reachability check follows this list.)
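
A quick way to validate points 1 and 4 together is to query each BMC’s Redfish endpoint and confirm you can read the power state. This is a minimal sketch using Python’s requests library; the BMC address and service account are placeholders, and in production you should validate certificates rather than disabling verification.

# Minimal Redfish reachability and power-state check against a host's BMC.
import requests

BMC = "https://10.0.0.50"              # placeholder iDRAC/iLO address
AUTH = ("svc-dpm-check", "********")   # placeholder BMC service account

systems = requests.get(f"{BMC}/redfish/v1/Systems", auth=AUTH, verify=False, timeout=10)
systems.raise_for_status()
for member in systems.json()["Members"]:
    system = requests.get(f"{BMC}{member['@odata.id']}", auth=AUTH, verify=False, timeout=10).json()
    print(system.get("Id"), system.get("PowerState"))   # expect "On" for a running host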

The VCF 9 Context: Where to Configure This

In a VCF 9 environment, all compute clusters are managed through individual vCenter instances per Workload Domain. DRS and DPM are configured at the cluster level within vSphere, not through SDDC Manager directly.

Here is the recommended configuration path in VCF 9 (a scripted equivalent follows the list):

  1. Log in to the vCenter Server managing your target Workload Domain.
  2. Navigate to Cluster > Configure > vSphere DRS.
  3. Ensure DRS is set to Fully Automated — DPM will not function correctly in Manual or Partially Automated mode.
  4. Under Power Management, change the setting from Off to Automatic.
  5. Start with the DPM Threshold slider at position 3 (default/middle) — this balances responsiveness with conservatism.
  6. Add your advanced parameters under Advanced Options if you want to tune the target utilisation ranges.
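
If you manage more than a handful of clusters, the same change can be scripted. Here is a hedged pyVmomi sketch that sets DRS to fully automated and DPM to automatic on a single cluster; the vCenter address, credentials, and cluster name are placeholders, so test it against a lab cluster before pointing it at production.

# Sketch: enable fully automated DRS and automatic DPM on one cluster via pyVmomi.
import ssl
from pyVim.connect import SmartConnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()   # lab only; validate certificates in production
si = SmartConnect(host="vcenter.wld01.example.local",
                  user="administrator@vsphere.local", pwd="********", sslContext=ctx)
content = si.RetrieveContent()

# Locate the target cluster by name (simplified inventory walk).
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.ClusterComputeResource], True)
cluster = next(c for c in view.view if c.name == "wld01-cluster01")
view.DestroyView()

spec = vim.cluster.ConfigSpecEx()
spec.drsConfig = vim.cluster.DrsConfigInfo(
    enabled=True, defaultVmBehavior="fullyAutomated")       # step 3
spec.dpmConfig = vim.cluster.DpmConfigInfo(
    enabled=True, defaultDpmBehavior="automated",            # step 4
    hostPowerActionRate=3)                                   # step 5: middle of the threshold slider
task = cluster.ReconfigureComputeResource_Task(spec, modify=True)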

For monitoring DPM activity in VCF 9, Aria Operations (VCF Operations) tracks DPM actions in the vCenter events log. You can build a custom dashboard view to track how many host-hours of standby your clusters have accumulated — which is a genuinely useful sustainability metric to report on.
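
If you want to pull the raw standby events yourself rather than build the dashboard first, the vCenter event log can be queried directly. The sketch below uses pyVmomi with the same placeholder connection details as the previous snippet; the event type names are the standard standby-mode events, but verify them against what your vCenter actually logs before relying on the numbers.

# Sketch: list DPM/standby events from the last 30 days to derive standby hours.
import ssl
from datetime import datetime, timedelta
from pyVim.connect import SmartConnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()   # lab only
si = SmartConnect(host="vcenter.wld01.example.local",
                  user="administrator@vsphere.local", pwd="********", sslContext=ctx)

spec = vim.event.EventFilterSpec(
    time=vim.event.EventFilterSpec.ByTime(beginTime=datetime.utcnow() - timedelta(days=30)),
    eventTypeId=["DrsEnteredStandbyModeEvent", "EnteredStandbyModeEvent",
                 "DrsExitedStandbyModeEvent", "ExitedStandbyModeEvent"])

for event in si.RetrieveContent().eventManager.QueryEvents(spec):
    host_name = event.host.name if event.host else "unknown"
    print(event.createdTime, type(event).__name__, host_name)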

Where DPM Makes the Most Sense — and Where to Be Careful

Not every cluster is a good DPM candidate, and part of good VCF Operations is knowing where to apply which tool.

High-value DPM targets:

  • Dev/Test clusters — workloads are non-critical, utilisation varies hugely between business hours and off-hours, and SLAs are tolerant of a few minutes of capacity lag.
  • VDI clusters with predictable demand patterns — if your VDI peak is 8 AM–6 PM, DPM can safely consolidate overnight.
  • Batch/analytics clusters — if jobs run in defined windows, you can even combine DPM with a scheduled scale-up/scale-down approach using Aria Operations automation.

Where to proceed with caution:

  • Production clusters with strict latency SLAs — the 5–8 minute host wake time may not be acceptable for burst scenarios.
  • vSAN clusters — DPM works with vSAN, but you must ensure your vSAN fault domains and minimum host count requirements are respected. NEVER allow DPM to power off hosts below your vSAN minimum redundancy threshold. Use MinPoweredOnCpuCapacity and MinPoweredOnMemCapacity to enforce this.
  • Clusters with HA enabled — and all clusters should have HA enabled — review your HA admission control settings. HA needs reserved failover capacity; DPM should not be allowed to power off hosts that HA is relying on for failover headroom.

The good news is that vSphere HA and DPM are designed to be aware of each other. HA admission control will prevent DPM from powering off hosts if doing so would violate HA’s reserved capacity requirements. But always verify this in your specific configuration — never assume.

Closing the Loop: Measuring Your Watt Savings

One of my goals with every GreenOps initiative is to make the sustainability impact measurable — because what gets measured gets managed, and what gets reported gets funded.

For DPM, here is a simple framework for calculating and reporting your savings (a small worked calculation follows the list):

  1. Baseline your host idle power draw — pull this from iLO/iDRAC power history or from Aria Operations’ host power metrics.
  2. Track DPM standby hours — export vCenter events for DPM power-off and power-on actions. The delta is your standby duration.
  3. Calculate energy saved — multiply standby hours by idle watt draw, then apply your PUE factor.
  4. Convert to carbon — use your data centre’s energy mix carbon intensity (for UAE grid, this is approximately 400–500g CO₂/kWh).
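
Pulling the framework together, the reporting calculation itself is tiny. The defaults below are the example figures from this post (400W idle, PUE 1.5, roughly 450g CO₂/kWh for the UAE grid) and should be replaced with your own measured values.

# Convert measured standby host-hours into energy and carbon figures for reporting.
def dpm_savings(standby_host_hours, idle_watts=400, pue=1.5, grid_gco2_per_kwh=450):
    it_kwh = standby_host_hours * idle_watts / 1000       # energy saved at the server
    facility_kwh = it_kwh * pue                           # including cooling overhead
    kg_co2 = facility_kwh * grid_gco2_per_kwh / 1000      # grid carbon intensity applied
    return it_kwh, facility_kwh, kg_co2

# Example: 4 hosts in standby for 8 hours a night, 30 nights in the month.
it_kwh, facility_kwh, kg_co2 = dpm_savings(4 * 8 * 30)
print(f"{it_kwh:.0f} kWh IT load, {facility_kwh:.0f} kWh with PUE, {kg_co2:.0f} kg CO₂ avoided")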

This gives you a per-cluster, per-month sustainability figure you can roll up into your organisation’s carbon reporting. It also gives you the data to justify enabling DPM on additional clusters — because the numbers speak louder than any whitepaper.

Summary: The Hidden Watts Are Hiding in Plain Sight

DPM has been part of vSphere since version 4. It works well. It is well-tested. It ships with every VCF 9 licence you already own.

And in most enterprise environments, it is sitting there disabled, silently burning kilowatt-hours through the night.

Enabling DPM is not a complex project. It does not require a change request the size of a phone book. It requires a BMC configuration check, a standby mode test, and a toggle in vCenter.

For GreenOps practitioners, this is one of the fastest paths to measurable energy reduction in a private cloud — no hardware refresh required, no new tool to purchase, no new process to invent.

Turn it on. Measure it. Report it. Then come back and tell me how many watt-hours you saved.
