Off topic: When Windows NLB won’t NLB (Quick Tip)…
One of the best things about working on this blog is that, over time, it has become a way to make friends with people I would not normally have had the opportunity to speak with. One such friend is a chap called Josh Andrews. Josh and I speak every now and then, and the other day he dropped me a line about a problem he was having load balancing a pair of Windows 2008-based Terminal Servers.
Josh’s scenario was essentially as follows:
- He had a pair of Windows 2008 Terminal Servers, each with a pair of Broadcom Gigabit network adapters.
- Each Terminal Server was assigned a unique IP address for the LAN (i.e. the non-NLB interface) and a unique IP address for the NLB adapter.
- When Josh created the NLB cluster (using the NLB interface in each server), every interface converged correctly, and he could ping the LAN addresses of the member servers and connect to them using RDP. However, if he tried to ping or RDP to the clustered NLB addresses, he got “Request Timed Out” (ping) or “Cannot Connect” (RDP).
Below is a simple diagram which illustrates a typical NLB configuration. (In the example I have used fake IP addresses to protect Josh’s details; you should assume that all IP addresses are in the same subnet and all connections are on the same switch.) When comparing this to Josh’s situation, the following facts apply. From his local machine (the TSCLIENT in the example below), he:
- COULD ping and RDP to the LAN IP addresses 184.108.40.206 and 220.127.116.11
- COULD NOT ping or RDP to the NLB addresses 18.104.22.168 and 22.214.171.124
- COULD NOT ping or RDP to the cluster IP address 126.96.36.199

However, from each node in the NLB cluster he COULD ping and RDP to both the unique NLB addresses and the cluster IP address. At first I went through the usual suspects with him, for example:
- Was the Windows Firewall configured correctly to allow ping and RDP on the NLB connections?
- Was the RDP listener (in the Terminal Services configuration) set to listen on the correct interface address?
- Were the NLB port rules for the cluster IP address configured to allow RDP and ICMP?
- As he was using a single subnet and each server had two NICs, had he given the NLB NIC the same default gateway as the public NIC?
The answers to all of the above indicated that everything had been configured correctly, so I started to look for other causes. My next suggestion (perhaps foolishly, looking back on it) was to install the Terminal Services Session Broker. This would ensure that he had followed the best-practice guides for deploying Windows 2008 Terminal Services with NLB, and at that point I suspected that the problem could be related to the absence of the TS Session Broker. If you are interested in the recommended deployments of TS NLB on Windows 2008, you should review the following links:
Unfortunately, even after installing the TS Session Broker, still nothing.
At this point I thought that I needed to actually see what was happening within Josh’s environment, so I asked Josh if he would be willing to let me use CrossLoop for a remote session into his network, to which he agreed. After some investigation I established the following:
- Josh’s network consisted of a single subnet, where all of the interfaces for the Terminal Servers and the clients were patched into the same switch; there were no VLANs in play.
- From my assessment, Josh had configured the Terminal Servers correctly, including the NICs and the NLB clustering.
- No firewalls were interfering with traffic.
- The only way to establish an RDP connection to either of the Terminal Servers was to use the public interface address.
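If you want to reproduce the client-side reachability matrix described earlier without clicking through RDP each time, a plain TCP connect to the RDP port is a quick stand-in. The following Python sketch was not part of the original troubleshooting; it assumes the RDP listener is on its default TCP port 3389, the function names are hypothetical, and it only tests TCP/RDP, not ICMP ping.

```python
import socket

RDP_PORT = 3389  # assumption: RDP listener left on its default TCP port


def tcp_reachable(host: str, port: int = RDP_PORT, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port completes within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and unreachable networks all land here.
        return False


def probe(targets: dict[str, str], port: int = RDP_PORT, timeout: float = 2.0) -> dict[str, bool]:
    """Map each labelled address (e.g. 'cluster', 'node1-lan') to TCP reachability."""
    return {label: tcp_reachable(ip, port, timeout) for label, ip in targets.items()}
```

For example, `probe({"cluster": "192.0.2.100", "node1-lan": "192.0.2.11"})` (placeholder documentation addresses, substitute your own) returns `True` or `False` per label. In Josh’s situation you would expect the LAN addresses to come back `True` from a client and the NLB and cluster addresses `False`.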
This was, of course, very frustrating: everything seemed to be configured correctly, yet you could not RDP to or ping the cluster interface from outside the cluster nodes. At this point I turned to trusty Google to see if it could yield any useful advice.
First, I came across this Microsoft Knowledge Base article, http://support.microsoft.com/kb/898867, which was admittedly slightly outside the scope of the problem we were having, but I thought I would give it a go; nothing changed. I then came across this article, http://support.microsoft.com/kb/816910, which seemed more like it. However, the network cards in Josh’s servers were genuine server adapters, and recent models to boot, so I discounted it.
After 20 minutes of floundering on Google, I decided to rest my head on the keyboard, sob a little and contemplate defeat. Then came a moment of “clarity”: I decided to run the “arp -a” command on the preferred NLB node in the cluster. I dropped to a Windows 2008 command prompt and typed “arp -a”, which revealed an interesting clue: the ARP output reported the “virtual” interface of the cluster, but no other meaningful data.
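If you want to script the same ARP check, here is a minimal sketch that parses saved `arp -a` output and compares the entry for the cluster IP against the MAC address NLB derives from it: 02-BF followed by the four cluster IP octets in hex in unicast mode, or 03-BF followed by the same octets in multicast mode. Which mode Josh’s cluster used isn’t recorded here, so `mode` is a parameter, and the function names are hypothetical.

```python
import ipaddress
import re

# One data row of Windows "arp -a" output: IP address, dash-separated MAC, type.
_ARP_ROW = re.compile(
    r"^\s*(\d{1,3}(?:\.\d{1,3}){3})\s+([0-9a-fA-F]{2}(?:-[0-9a-fA-F]{2}){5})\s+(\w+)\s*$"
)


def parse_arp(output: str) -> dict[str, str]:
    """Return {ip: lowercase mac} for every data row in text printed by 'arp -a'."""
    table = {}
    for line in output.splitlines():
        m = _ARP_ROW.match(line)
        if m:  # interface headers and column titles don't match and are skipped
            table[m.group(1)] = m.group(2).lower()
    return table


def expected_nlb_mac(cluster_ip: str, mode: str = "unicast") -> str:
    """MAC that NLB derives from the cluster IP (KeyError for an unknown mode)."""
    prefix = {"unicast": "02-bf", "multicast": "03-bf"}[mode]
    octets = ipaddress.IPv4Address(cluster_ip).packed
    return prefix + "-" + "-".join(f"{b:02x}" for b in octets)


def check_cluster_entry(output: str, cluster_ip: str, mode: str = "unicast") -> str:
    """Return 'missing', 'ok', or 'unexpected <mac>' for the cluster IP's ARP entry."""
    mac = parse_arp(output).get(cluster_ip)
    if mac is None:
        return "missing"
    return "ok" if mac == expected_nlb_mac(cluster_ip, mode) else f"unexpected {mac}"
```

For a placeholder cluster IP of 192.0.2.100, `expected_nlb_mac("192.0.2.100")` gives `02-bf-c0-00-02-64`. A “missing” or “unexpected” result on a client that has just tried to ping the cluster address points you at the MAC layer rather than at firewalls or RDP settings.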
I thought the ARP output was odd. I pondered it for a while as I ran through the facts of the configuration, and it was at this point that I decided to look at the configuration settings of the NLB network card. Josh was using a Broadcom Gigabit server adapter in his Dell servers; these are, to all intents and purposes, the same devices as those used in HP servers (just with slightly different drivers and, obviously, different branding). So I opened the NIC properties and clicked the “Configure” button (see below):
Clicking the “Configure” button brought up the following dialog box, where I clicked the “Advanced” tab. This revealed a set of options similar to those below (note: if you have a genuine Broadcom-branded card, the options may differ from those shown).
I reviewed the options that were available in the “Property” list. The option I was most interested in was “Locally Administered Address”. Given what I had seen from running the ARP command and the configuration of Josh’s network, I decided to set the value for “Locally Administered Address” to “Not Present” for the NLB NICs in both nodes of the cluster, as at this point I suspected a MAC address issue.
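For reference, a “locally administered” MAC address is one with the second-least-significant bit of the first octet (the U/L bit) set, as opposed to a vendor-assigned, burned-in (universally administered) address. This small Python sketch, with a hypothetical function name, decodes that bit and the multicast (I/G) bit from an address in either dash or colon notation.

```python
def mac_flags(mac: str) -> dict[str, bool]:
    """Decode the two flag bits in the first octet of a MAC address."""
    first_octet = int(mac.replace(":", "-").split("-")[0], 16)
    return {
        "multicast": bool(first_octet & 0x01),             # I/G bit: group address
        "locally_administered": bool(first_octet & 0x02),  # U/L bit: not vendor-assigned
    }
```

Worth noting: the 02-BF-… cluster MAC that NLB uses in unicast mode has the U/L bit set, so `mac_flags("02-bf-c0-00-02-64")` reports it as locally administered, while an ordinary burned-in address such as `00-11-22-33-44-55` is not.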
I then tried to ping the cluster IP address from Josh’s PC and… finally, a reply! I then attempted an RDP connection, which also worked. Yee haaa! Josh was now able to proceed with configuring his NLB Windows 2008 Terminal Servers. I learned a number of things from this problem, but the main point that sticks out is that network card configuration will be one of the first points of scrutiny in future, especially if all other settings appear to be as they should. Both Josh and I hope that this will help someone else along the way.