In the first part of this article we introduced the various Exchange Cluster Resource DLLs. We also looked at a figure illustrating the interactions between the various components. Today we continue from this figure which I reproduce here for your convenience.
FIGURE 1.1 - Cluster Components and Exchange Resource DLLs
The figure above shows the functions defined in EXRES.DLL. The Exchange-specific functions are mapped to the cluster-specific functions. For example, the cluster's IsAlive and LooksAlive functions are mapped to ExchangeIsAlive and ExchangeLooksAlive respectively. These mappings are not defined statically within EXRES.DLL; EXRES.DLL itself knows which function to execute for each cluster call. The other Exchange functions are mapped to their related cluster functions in the same way, as shown in Figure 1.1.
EXRES.DLL executes the ExchangeIsAlive and ExchangeLooksAlive functions at a predefined interval. Most of the monitoring is done by the ExchangeIsAlive query, which is implemented so that it performs all the checks for Exchange resources. It checks to make sure that:
Exchange resources are online. (It also checks the Exchange services shown in the services.msc snap-in.)
Exchange resources are configured with the correct dependencies.
Dependent Exchange resources are online.
The registry entries for Exchange resources are configured correctly.
ExchangeLooksAlive is generally not used. Please note that the Resource Monitor executes IsAlive and LooksAlive queries against the whole Cluster Group. It is the responsibility of the Resource DLL (EXRES.DLL) to execute ExchangeIsAlive and ExchangeLooksAlive against its Exchange resources. The IsAlive and LooksAlive checks are performed every 10 seconds.
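To make the four ExchangeIsAlive checks listed above more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the `Resource` class, its fields and the `exchange_is_alive` function are hypothetical names invented for this example, and the real checks inside EXRES.DLL are not shown in this article.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical in-memory stand-in for one Exchange cluster resource."""
    name: str
    online: bool = True
    service_running: bool = True
    # Dependencies the resource should have versus those actually configured.
    required_deps: frozenset = frozenset()
    configured_deps: frozenset = frozenset()
    registry_ok: bool = True


def exchange_is_alive(resource, group):
    """Run the four checks from the list above; return (alive, failures)."""
    failures = []
    # Check 1: the resource and its Windows service are online.
    if not (resource.online and resource.service_running):
        failures.append("resource or its service is not online")
    # Check 2: the configured dependencies match the expected ones.
    if resource.configured_deps != resource.required_deps:
        failures.append("dependencies are misconfigured")
    # Check 3: every configured dependency is itself online.
    for dep in sorted(resource.configured_deps):
        if dep not in group or not group[dep].online:
            failures.append(f"dependency {dep!r} is not online")
    # Check 4: the registry entries are correct.
    if not resource.registry_ok:
        failures.append("registry entries are incorrect")
    return (not failures, failures)


group = {
    "System Attendant": Resource("System Attendant"),
    "Information Store": Resource(
        "Information Store",
        required_deps=frozenset({"System Attendant"}),
        configured_deps=frozenset({"System Attendant"}),
    ),
}
print(exchange_is_alive(group["Information Store"], group))
group["System Attendant"].online = False
print(exchange_is_alive(group["Information Store"], group))
```

In this sketch the second call reports a failure for the Information Store because the resource it depends on has gone offline, which corresponds to the third check in the list.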
When you set up an Exchange cluster for the first time, the Cluster Service running on the node takes a snapshot of the cluster configuration and saves it in the HKLM\Cluster key. This key contains the cluster configuration, such as the resource names, their GUIDs, the node holding each resource and the resource status. It is generally called the cluster configuration database. As an example, for Exchange it includes the following resources:
| Resource Name | GUID | Node Name | Status | Flags |
|---|---|---|---|---|
| Exchange POP3 | {GUID1} | Node1 | Online | 1 |
| Exchange SMTP | {GUID2} | Node1 | Online | 1 |
| Exchange IMAP4 | {GUID3} | Node1 | Online | 1 |
| Exchange System Attendant | {GUID4} | Node1 | Online | 1 |
| Exchange Information Store | {GUID5} | Node1 | Offline | 0 |
Before the Resource Monitor executes any cluster function against the Exchange Cluster Group, it looks at the cluster configuration database to check the status of all resources and their GUIDs. For example, let's say we have a cluster group named "ExchangeVS" and all the Exchange resources reside in this group. When the IsAlive interval expires, the Resource Monitor executes the IsAlive call against the "ExchangeVS" Cluster Group. It hands over the resource GUID and status to the Exchange Resource DLL (EXRES.DLL). EXRES.DLL in turn executes the ExchangeIsAlive call to check the resource availability. Please note that EXRES.DLL doesn't really know the status of the Exchange resources. It is the Resource Monitor that supplies this information to EXRES.DLL.
Next we look at ExchangeOpen, ExchangeClose, ExchangeOnline and ExchangeOffline. These functions are called whenever the Exchange resources are moved, taken offline or brought online, or whenever else they are needed. For example, you might want to take the Exchange resources on a node offline for maintenance. In that case, the Resource Monitor executes the Offline function and in turn EXRES.DLL executes the ExchangeOffline function to take the resource offline. We will discuss these functions later in this article. As a whole, these functions are executed by the Cluster Service and supported by the Exchange Resource DLL. That's why Exchange 2000 and later versions are known as pure cluster-aware applications!
The Resource Monitor determines the state of resources by checking the flag value in the registry. This value is either 1 (Online) or 0 (Offline). For example, if you stop an Exchange service on a cluster node through the cluster, the value 0 is set for that service or resource in the registry. If you stop the service from the command line or with any other tool, the value is not changed, because the operation took place outside the cluster. Operations that occur outside the cluster are not reflected in the cluster configuration database. In this case, the IsAlive query may not function correctly: the value supplied by the Resource Monitor still indicates that the Exchange service is running, so ExchangeIsAlive takes no action against the stopped service.
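The difference between stopping a service through the cluster and stopping it outside the cluster can be illustrated with a small Python model. This is a sketch only: the two dictionaries and the function names are hypothetical and simply mimic the flag behaviour described above (1 for Online, 0 for Offline). They do not read the real HKLM\Cluster key.

```python
# Hypothetical model: the flag in the cluster configuration database and the
# real state of the Windows service are tracked separately.
cluster_db = {"Exchange SMTP": 1}            # 1 = Online, 0 = Offline
service_running = {"Exchange SMTP": True}


def stop_via_cluster(name):
    """Stop the service through the cluster: the flag is updated as well."""
    service_running[name] = False
    cluster_db[name] = 0


def stop_outside_cluster(name):
    """Stop the service from the command line: the flag is left intact."""
    service_running[name] = False


def flag_is_stale(name):
    """True when the database says Online but the service is stopped."""
    return cluster_db[name] == 1 and not service_running[name]


stop_outside_cluster("Exchange SMTP")
print(cluster_db["Exchange SMTP"], flag_is_stale("Exchange SMTP"))  # prints: 1 True
```

After the out-of-cluster stop, the flag still reads 1, which is the situation in which the status supplied by the Resource Monitor no longer matches reality.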
Status Messages and the Resource Monitor
The status messages shown above are generated through IsAlive calls. When the IsAlive interval expires, the Resource Monitor executes the cluster IsAlive call. The Exchange Resource DLL in turn executes ExchangeIsAlive against all Exchange resources. Each of these calls returns one of the following messages:
Online/Offline
Online/Offline Pending
Failed
These status messages are passed back to the Resource Monitor, which in turn tells the Cluster Service whether any action needs to be taken.
As shown in Figure 1.1, the Resource Monitor sits between the Exchange Resource DLLs and the Cluster Service. Any call made to the Exchange resources has to go through EXRES.DLL first. For example, if the Cluster Service needs to check the availability of the Exchange resources, it makes a call to the Resource Monitor, which in turn asks EXRES.DLL to check the status of the Exchange resources and report back. If the Resource Monitor doesn't receive any response from EXRES.DLL, or it cannot determine whether the resources are available, it passes that status back to the Cluster Service. The Cluster Service then passes the status message to the related Managers shown in the figure above. The Managers take action according to the status passed up by the lower-layer components. The status message could indicate a failure of Exchange resources or could be a simple status report. These messages and the resulting cluster actions are discussed later in this article with an example.
In addition, if a function executed by the Resource Monitor does not exist in the Resource DLL, the request is simply discarded and no operation is carried out.
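The way a cluster call is handed to the matching Exchange function, and discarded when no such function exists, can be sketched as follows. This is an assumption-laden illustration, not how EXRES.DLL is actually written: the class `ExchangeResourceDll`, the helper `resource_monitor_call`, the lookup by an "Exchange" name prefix and the cluster call name "Rebalance" are all hypothetical and exist only for this example.

```python
class ExchangeResourceDll:
    """Hypothetical stand-in for EXRES.DLL with a few of its functions."""

    def ExchangeIsAlive(self, guid, status):
        # A real check would probe the resource; here we echo the status.
        return status

    def ExchangeOnline(self, guid, status):
        return "Online"

    def ExchangeOffline(self, guid, status):
        return "Offline"


def resource_monitor_call(dll, cluster_function, guid, status):
    """Pass a cluster call to the matching Exchange function, if there is one.

    The function is looked up by name at call time (an assumption made for
    this sketch). If it does not exist, the request is discarded.
    """
    handler = getattr(dll, "Exchange" + cluster_function, None)
    if handler is None:
        return None  # request discarded, no operation carried out
    return handler(guid, status)


dll = ExchangeResourceDll()
print(resource_monitor_call(dll, "IsAlive", "{GUID1}", "Online"))
print(resource_monitor_call(dll, "Offline", "{GUID1}", "Online"))
print(resource_monitor_call(dll, "Rebalance", "{GUID1}", "Online"))  # made-up call
```

The last call uses a function name that the sketch's DLL does not provide, so it returns nothing and no operation is carried out.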
How do Exchange Resource DLLs help in the failover process?
Exchange Server doesn't really use its own mechanism to fail over the resources to the surviving node. Instead, the Resource DLLs are written to "support" the failover process. The following figure shows a simple failover process:
FIGURE 1.2 - EXRES.DLL and Status Messages in Exchange Cluster Failover Process
1. After the IsAlive interval expires (that is, every 10 seconds), the Cluster Service asks the Resource Monitor to report the status of the Exchange resources.
2. The Resource Monitor checks the status of the Exchange resources in the cluster configuration database (HKLM\Cluster). It provides EXRES.DLL with the GUIDs of the Exchange resources and their current status.
3. When EXRES.DLL receives the signal from the Resource Monitor to check the Exchange resources, it executes its own function (ExchangeIsAlive) and reports the status back to the Resource Monitor. EXRES.DLL reports one of the following status messages:
   - Online/Offline
   - Online/Offline Pending
   - Failed
4. After the Resource Monitor receives the status, it compares the status message received from EXRES.DLL with the one stored in the cluster configuration database. It then acts on the result of the comparison:
   - If the comparison is successful, no action is taken. For example, the status message received in step 2 is "Online" and the ExchangeIsAlive query reports the same status.
   - If the comparison is unsuccessful, the following action is taken:
     - If the status message received in step 2 is "Online" and the ExchangeIsAlive query reports "Offline", the Resource Monitor executes an "Online" function. EXRES.DLL receives this message and executes the ExchangeOnline function to bring the Exchange resource online.
   Note: The Resource Monitor takes no action when the reported status matches the stored one, even if that status is Offline, because an administrator might have stopped the resource for maintenance. In that case the change must already be reflected in the cluster configuration database before IsAlive is called. The Resource Monitor only takes action when the comparison is unsuccessful, as stated above.
   Furthermore, there shouldn't be any inconsistencies in the cluster configuration database. If there were any, they wouldn't last longer than 10 seconds, since IsAlive calls always update the status in the cluster configuration database.
5. The mechanism isn't entirely straightforward. EXRES.DLL can return one more message: "Failed". In this case the Resource Monitor sends a Restart message back to EXRES.DLL to restart the resource. EXRES.DLL in turn executes the ExchangeOnline function to bring the failed resource online.
   Note: EXRES.DLL doesn't really implement a separate Restart function. Instead it always uses its own ExchangeOnline function. If a resource doesn't come online within the specified interval or after a few attempts, the resource is considered to have failed.
6. After a resource has failed, the message is passed back to the Resource Monitor. The Cluster Service receives this message from the Resource Monitor and starts the failover process with the help of the Failover Manager.
7. If the resource is started successfully after a few attempts, the failover process doesn't occur.
Thus, if there were no Resource DLL for Exchange Server, the failover process could take longer to move the resources from one node to a surviving node. Because the Exchange Resource DLL is able to handle the cluster functions executed by the clustering software, it doesn't need to wait to decide which action to take. As stated above, the cluster-aware functions are mapped to Exchange-specific functions, so an Exchange Resource DLL can execute these functions as soon as they are called by the Resource Monitor.
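The failover steps described above can be condensed into a short Python sketch of a single IsAlive cycle for one resource. It is a simplified model under stated assumptions: the function `is_alive_cycle`, its parameters and the default of three restart attempts are hypothetical (the text above only says "a few attempts"), and the real Resource Monitor logic is not reproduced here.

```python
def is_alive_cycle(stored_status, probe, bring_online, max_restarts=3):
    """Model one IsAlive cycle for a single Exchange resource.

    stored_status: status read from the cluster configuration database.
    probe:         callable returning the status ExchangeIsAlive reports.
    bring_online:  callable standing in for ExchangeOnline; True on success.
    max_restarts:  assumed number of attempts before the resource is failed.
    Returns the outcome: "none", "online" or "failover".
    """
    reported = probe()
    # ExchangeOnline is needed when the resource reports "Failed", or when
    # the database says Online but the resource reports Offline.
    needs_online = reported == "Failed" or (
        stored_status == "Online" and reported == "Offline"
    )
    if not needs_online:
        # Matching statuses and pending states need no action in this model;
        # other mismatches are not covered by the steps above.
        return "none"
    for _ in range(max_restarts):
        if bring_online():
            return "online"      # resource restarted, no failover
    return "failover"            # handed to the Failover Manager


attempts = iter([False, False, True])
print(is_alive_cycle("Online", lambda: "Failed", lambda: next(attempts)))
print(is_alive_cycle("Online", lambda: "Failed", lambda: False))
```

In the first call the resource comes back on the third attempt, so no failover occurs. In the second call every attempt fails and the outcome is a failover.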
Conclusion
To summarize, Exchange Server 5.5 was not a fully cluster-aware application. Instead it used a generic DLL provided by the Windows clustering software. Exchange 2000 Server and Exchange Server 2003 are cluster-aware applications because they ship with a cluster Resource DLL.
We saw that the Cluster Service doesn't talk to EXRES.DLL directly. Instead, it goes through its Resource Monitor. The Resource Monitor receives the status messages passed by the Exchange Resource DLL and uses them to take any appropriate action.
Finally we also saw how the Exchange Resource DLL plays an important role. Resource DLLs allow Exchange Server to be a fully cluster-aware messaging application. The functions executed by the Resource Monitor on behalf of the Cluster Service are supported by the Exchange Resource DLLs. This makes the failover process faster.
References
Exchange Cluster Resource DLLs and the Failover Process (Part 1)