Fixing cluster failover error: The system could not find the environment option that was entered

Not the error message you want to see on your database cluster.

A problem with your database server is never pleasant. On a high-availability cluster, which exists precisely to prevent downtime, it is a disaster. We recently ran into this cluster problem twice, at two different customers with seemingly the same issue. That's why we think more people will benefit from reading about our solution.

Failover fails: “The system could not find the environment option that was entered”

It started the same way with both customers. We had just patched the secondary node of the cluster and wanted to move on to the primary node. To free up the primary, we first perform a failover. But the failover fails.

The cluster role won’t come online on the secondary node. No matter what we try. The cluster log shows a specific error message:

Cluster resource ‘Cluster Name’ of type ‘Network Name’ in clustered role ‘Cluster Group’ failed. The error code was ‘0xcb’ (‘The system could not find the environment option that was entered.’)
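The hexadecimal code in that message is the ordinary Win32 error 203 (ERROR_ENVVAR_NOT_FOUND), which is useful to know when searching in decimal. As a minimal sketch, the code can be pulled out of a log line and converted like this. The function name `extract_error_code` is our own, and the sample line is the message above retyped with plain quotes:

```python
import re
from typing import Optional

# Win32 error 203 is ERROR_ENVVAR_NOT_FOUND; the cluster log prints it as 0xcb.
ERROR_ENVVAR_NOT_FOUND = 203


def extract_error_code(log_line: str) -> Optional[int]:
    """Return the first hex error code found in a log line, or None."""
    match = re.search(r"0x[0-9a-fA-F]+", log_line)
    return int(match.group(0), 16) if match else None


# Sample line: the cluster log message from this post, with plain quotes.
line = (
    "Cluster resource 'Cluster Name' of type 'Network Name' in clustered "
    "role 'Cluster Group' failed. The error code was '0xcb' ('The system "
    "could not find the environment option that was entered.')"
)

code = extract_error_code(line)
print(code)                            # prints 203
print(code == ERROR_ENVVAR_NOT_FOUND)  # prints True
```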

Searching online doesn’t get you far. You’ll find several reports of this error message, but none of them are related to failover clustering:

A missing windir system environment variable
A corrupt Windows installation
A 10-year-old Reddit post from someone with the exact same problem, but unfortunately no solution

After extensive investigation with Process Monitor (Procmon), we discovered that two crucial registry values were empty on the secondary node:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Domain
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\NV Domain

After we entered the domain name in both values and rebooted the secondary node, everything worked properly again.
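To check a node for the same condition before a failover, the two values can be read with Python’s standard `winreg` module. This is only a sketch of such a check, not the procedure we used at the customers. The helper names `read_domain_values` and `find_empty` are our own, and the script only reads the registry, it changes nothing:

```python
import sys

# Registry key (under HKEY_LOCAL_MACHINE) and value names from this post.
TCPIP_PARAMETERS = r"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
VALUE_NAMES = ("Domain", "NV Domain")


def read_domain_values():
    """Read both values from the local node's registry (Windows only)."""
    import winreg  # imported here so the file still loads on other platforms

    values = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, TCPIP_PARAMETERS) as key:
        for name in VALUE_NAMES:
            try:
                data, _value_type = winreg.QueryValueEx(key, name)
            except FileNotFoundError:
                data = None  # the value does not exist at all
            values[name] = data
    return values


def find_empty(values):
    """Return the value names that are missing, empty, or whitespace only."""
    return [
        name for name in VALUE_NAMES
        if not str(values.get(name) or "").strip()
    ]


if __name__ == "__main__" and sys.platform == "win32":
    empty = find_empty(read_domain_values())
    print("Empty values:", ", ".join(empty) if empty else "none")
```

Run it on each cluster node. On a healthy domain member both values should contain the domain name, so any name listed as empty deserves a closer look.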

Root cause: “The system could not find the environment option that was entered”

It’s obviously not normal behavior for these registry values to suddenly be empty on a domain member server. We’re therefore still working on finding a scenario where we can reproduce this exactly.

With customer A, it appears to have occurred after a migration from Hyper-V to vSphere, which resulted in the NIC hardware of the cluster nodes being changed.

With customer B, it appears to have occurred after they replaced their DNS servers. As a result, they had to adjust the preferred and alternate DNS server settings on the NICs of the cluster nodes. The problem seems to have emerged after that.

With both customers, the values were only cleared on one node.

We suspect a bug, but the investigation is still ongoing. We’ll provide an update when the root cause is found.

