HP Racking and Server air flow 101 vs the Outsourcer who (obviously) should know it all…

Written by Andy Grogan
Tuesday, 28 July 2009 17:59

I know that this is way off topic, but it is something that I want to get off my chest: it’s a rant. I cannot apologise for it – I had to speak out. I will not mention any names, but it is a rant about poor service and bad contracts, with a small sprinkling of server and racking air flow information.

You will mainly know me for my work with Exchange and Microsoft products; however, my main role (and indeed the reason I have not posted much on the blog recently) is head of operations (or Networks & Infrastructure Support Manager) for a large organisation in London. Like perhaps some others, this role means that although I like to work primarily with Exchange and AD, I am actually responsible for an entire Data Operations and Data Centre team.

My organisation has just commenced a full refurbishment of its Data Centre facilities, which includes new Air Conditioning, Fire Suppression, UPS replacement and a new Rotary Backup generator. Additionally, we are changing all of the flooring and ceiling infrastructure.

The above obviously represents a significant task, but to make matters harder for us we have to complete all of this work with all the existing equipment in the room (we cannot decant it to another location). This amounts to around 50 fully populated 42U racks and 10 fully populated 47U racks, plus an IBM SAN, IBM 3584 and XIOMAG. Because of this we are completing the works by performing a number of staged moves of all the server racking over a period of 8 weekends.

Now as you can imagine, this requires a significant amount of planning and logistical work (it can be really challenging moving a single rack in a weekend, let alone moving 12 fully populated racks 6 metres from where they previously stood), but thanks to some excellent work by my server and networking team we managed to get through phase 1 without issues – well, at least for our own racking (which amounted to 10 of the 12 racks that we shifted).
The remaining two racks that are relevant to this post are looked after by a 3rd party company (who for legal reasons I will not name here – all I will say is that they are a large IT “outsourcing” provider here in the UK, although they have not managed to get a foothold in my I.T department for reasons that you will work out later on!). We host their racking in our Data Centre for historic reasons rather than contractual ones, and indeed I suspect that these days they wish for their kit to stay housed where it is (in our DC) because they have lost a number of high profile contracts and cannot afford (or do not wish to take on) the cost of housing these racks. From their point of view, why bother? There is the convenience of having someone else pay for the electricity and provide a Tier 3 standard room, whilst effectively claiming the equivalent of “Electronic” squatters’ rights and getting to moan when things don’t go their way.

Now getting this company on board with our moves has been a little difficult. At first (over a month prior to the phase 1 moves) we asked this 3rd party to move their equipment offsite to a location elsewhere in the U.K. (as they have done with some of their other systems). This seemed the best idea: the services that these two hosted racks provide are key to revenue in my organisation, and given how much disruption would be going on within the room there was a good argument, on the balance of risk, for eliminating them from the equation.

This of course was met with a number of exceptionally weak technical excuses from the supplier as to why these racks could not be moved elsewhere (bandwidth being quoted as the major reason – hmmmm, can’t see that one on a 10MB feed that operates at peak with 2.3 max utilisation). So, to cut a very long story short, we then told them that they would have to move their kit within the room just like us (we basically had to say: either you move it, or we will).
This of course is where the fun really started (please bear in mind that they only have two racks in the room) – in order to complete the moves (which they then started to claim they had little notice of, after two weeks of arguing about hosting them offsite) the following requests came through:
The upshot of the above was a week wasted chasing them for the information and requirements for their racks (power and patching), to which we got the following (edited slightly to protect the not-so-innocent):

“30 power sockets and 29 connections”

That was it – honestly – the sum total of all the above. Indeed, the “review” they had on our site was a three-line e-mail from their “project manager”, who gave the above as their requirements. I had a good mind at the time to leave it at that and let them crash and burn, but given the impact on my organisation I wrote back and stated:

“HP Racks (which these two were) have either 16AMP or 32AMP PDUs. You might therefore have a requirement for 30 power sockets, but these will already exist in your rack. What I need is how many PDUs you have in your respective racks which terminate as either 16AMP or 32AMP commando sockets – you will therefore have a requirement for something like x2, x3 or x4 sockets depending on the rating and number of PDUs”
HP 32A Modular PDU – sockets in the rack are connected to this (has a capacity of 4 blocks with 8 sockets per block)
32A Commando Socket – attached to the above PDU
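To illustrate the distinction I was drawing between sockets inside the rack and commando sockets on the wall, here is a minimal Python sketch. It uses the 4 blocks of 8 sockets quoted above for the 32A modular PDU; the function name, the `total_load_amps` and `redundant` parameters, and the idea that a 16A PDU carries the same number of sockets are all my own assumptions for illustration, and no derating margin is applied.

```python
import math

# Figures quoted for the HP 32A modular PDU: 4 blocks of 8 sockets each.
BLOCKS_PER_PDU = 4
SOCKETS_PER_BLOCK = 8


def commando_feeds_needed(sockets_required, total_load_amps=0.0,
                          pdu_rating_amps=32, redundant=False):
    """Estimate how many PDUs (one commando socket each) a rack needs.

    total_load_amps and redundant are hypothetical inputs for illustration;
    the socket count per PDU is assumed to be the same for 16A and 32A units.
    """
    if sockets_required < 0 or total_load_amps < 0:
        raise ValueError("inputs must not be negative")
    sockets_per_pdu = BLOCKS_PER_PDU * SOCKETS_PER_BLOCK
    # Enough PDUs to provide the in-rack sockets...
    by_sockets = math.ceil(sockets_required / sockets_per_pdu)
    # ...and enough to carry the electrical load at the given rating.
    by_load = math.ceil(total_load_amps / pdu_rating_amps)
    feeds = max(by_sockets, by_load)
    # An A/B redundant layout doubles the number of feeds.
    return feeds * 2 if redundant else feeds


if __name__ == "__main__":
    print(commando_feeds_needed(30))                  # sockets only
    print(commando_feeds_needed(30, redundant=True))  # A/B feeds
```

The point is that “30 power sockets” on its own says almost nothing about what has to be provided in the room: the answer depends on the PDU rating, the load and whether the feeds are doubled up.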
Additionally I also said:

“You have the 29 connections – but what are they currently – 100M/FD – 1000M/FD etc etc”

This obviously stirred something in the said supplier, as two days later I was sent an e-mail which pretty much admitted that they had no clue how their racks were put together from a power point of view. It did contain some poor recommendations (to all intents a wish list), which asked for 10 connections more than they already had and gave no reason for wanting them.

I wrote back and explained that this was a little better (ahem), but that they would need to firm up the requirement for their own sake: their racks contained kit that was over 8 years old and had not been touched in around that period of time, so we would have to guess the power requirements. As they had also originally stated that they would be onsite on the day after our moves, this could cause them major issues if my team was not around.

Over the next week or so we saw people from the company coming to site and surreptitiously analysing their racks, and on the day before the moves we finally got a project plan from them (quite unexpected really, and it was pants – however the major change was that they would now be onsite on the SAME day as my team – hmmmmmm – I wonder what could have changed their minds???).

Come the Day – Come the Power

So cometh the day: my team rolls in at 06:30 and begins the task of moving our cabinets to their new locations to allow the floor to be laid. By 10:00 we had all 10 of our racks in place and were well into running new patch panels, fibre and power. Then at 11:30 in rolls the move team (all 4 of them) from the said company. They needed a 20 minute conversation with my Senior networking engineer (as they had not worked out their Networking requirements properly) before they could begin.
I think that by 14:30 they had their racks in place (not connected, just moved to the new locations – it would appear they needed a lunch break between 13:00 and 14:00). It took them from 11:50 to 13:00 to strip down the redundant wiring so that they could find their PDUs and disconnect from the mains, allowing the racks to be moved. During their move they also managed to kill their KVM and asked to borrow one of ours, needed to borrow both my Networking and Electrical Services people, and had the cheek to tell me not to take back the KVM that I had “loaned” them until they had notified me that I could – cheeky buggers. They managed to get their racks powered back on, and at around 16:00 their move team buggered off – only for one of their test users to walk into my department 20 minutes later and ask where they were, as their systems were not working.
Anyway, to cut another long, painful story (in relation to them) short, we got through the day and our moves were very successful, so we finished up at around 20:30.

Let’s now fast forward to 08:30 on Sunday. I am bleary eyed, in the bathroom at home getting ready to take my little boy to Legoland, when my work mobile rings. It is my on-call engineer, who is onsite with one of the representatives from the said company and who explains to me that there is a concern that the company’s racks are now running “too hot” and that this presents a risk to their service. My chap explains that he has been forced (to placate the company’s representative) to open the front and rear doors of their racks and place a household desktop fan at the back of them. I explained down the phone that:
I asked: “what is the test that has been applied here which leads to the conclusion that their racks are overheating?” The answer (wait for it) was that the company’s engineer had placed his hand on the back of each of his servers and they were hot to the touch! It was at this point that I might have sworn a little bit, and then explained that, due to the way in which servers work, the back end of the server will always be hotter than the front. Cold air is sucked in through the front venting, blown through the server and expelled out of the back (in certain HP models, via the PSUs) – see below for an example:
When placed in a rack environment the following scenario applies to most Data Centres (when I say ** Most ** I mean DCs which have a raised floor, make use of CRAC, use Hot and Cold aisles and have a return hot air path):
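For a rough sense of why the exhaust side is always warmer than the intake, here is a back-of-envelope sketch in Python. It uses the standard sensible-heat relation (temperature rise = power / (air density × specific heat × volumetric airflow)); the 400 W power draw and 50 CFM airflow are purely illustrative assumptions, not measurements from any of the racks in this story.

```python
# Back-of-envelope estimate of the front-to-back air temperature rise
# across a server. All input figures are illustrative assumptions.

AIR_DENSITY = 1.2               # kg/m^3, approximate for air at room temperature
AIR_SPECIFIC_HEAT = 1005        # J/(kg*K), approximate for dry air
CFM_TO_M3_PER_S = 0.000471947   # one cubic foot per minute in m^3/s


def exhaust_rise_kelvin(power_watts, airflow_cfm):
    """Temperature rise of the air passing through a server (sensible heat)."""
    if airflow_cfm <= 0:
        raise ValueError("airflow must be positive")
    flow_m3_s = airflow_cfm * CFM_TO_M3_PER_S
    return power_watts / (AIR_DENSITY * AIR_SPECIFIC_HEAT * flow_m3_s)


if __name__ == "__main__":
    rise = exhaust_rise_kelvin(power_watts=400, airflow_cfm=50)
    print(f"Exhaust is roughly {rise:.0f} degrees C warmer than the intake")
```

With those assumed figures the rise comes out in the region of 14 degrees, so a rear panel that feels hot to the hand is exactly what you would expect from a healthy server. Halving the airflow doubles the rise, which is why disturbing the front-to-back path matters more than how the back of the box feels.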
So by opening both doors and adding in a Desktop fan they were actually taking a retrograde step. I asked to speak to the representative of the company myself on the phone and explained that testing the temperature of the servers merely by using their hand was akin to Luke Skywalker using the “force”, and that in order to get an accurate measurement they would need to use an alternative means. It was at this point (and I am not joking – I promise) that the chap at the other end of the phone asked me if there was a “Thermometer” that he could use. Slightly aghast, I asked if he was feeling unwell, to which (obviously missing the sarcasm) he replied that he wanted to take the temperature inside his racking.

Again I had to explain to him that this was not the best way forward. I told him to log on to one of his servers and open the HP Management Tools Homepage, from where he would be able to tell whether each of his servers was getting hot (it took three attempts to get across to him what this tool was). I also explained that by default HP servers are configured to reboot every 2 minutes when they are in thermal Critical status, and I asked him: are your servers randomly rebooting? The answer was “No”. It would seem at this point that I had got the point over to the representative, so we exchanged pleasantries and bid each other farewell for the day.

On Monday I get forwarded an e-mail by my Head of IT in which the “Account Manager” for the subject company had stated that “One of their servers had nearly blown up” and that there were “Major Heat issues” which amounted to a huge Health and Safety issue. Of course this e-mail had gone to everyone and anyone who is important.
I was, to say the least, incandescent with rage (perhaps I was the only thing at risk of “blowing up”). Apart from the fact that the message was scaremongering at best, it had been written by someone who has the same understanding of Servers, Server Rooms and technology as the can of beer that I am drinking now. It was so bad that the server engineer from the company, who had been onsite and been wrong, actually sent a personal apology to us because of its content.

Now you could all be forgiven for wondering why I have put this article up here – well, a few reasons really:
Now I would like to apologise to my readers who work for Outsourcing / Managed Services companies – I am sure that your firms are very good and indeed that my experience is a limited one – however I have felt compelled to write about my recent exploits with ours. This is just one example of many where their service has been well below par, yet, as is in their nature, they claim to be beacons of expertise and brilliance. Well they are not. Not in the slightest. It’s a shame, because there are some really nice and good people within the firm who I deal with day in and day out – but get to the higher levels and it all goes to heck in a hand basket.

If you are entering into any such arrangement with a supplier I recommend that the following is put in place:
Rant Over!
Last Updated on Tuesday, 28 July 2009 18:06







