Exchange 2007 CCR – Going Pro (as in Production) – Part 2 – deriving production CCR Hardware Specifications for 8000 users…

In the first part of this article – which I wrote many moons ago (located here: Exchange 2007 CCR – Going Pro (as in Production) – Part 1 – some food for thought (or perhaps not)…) I rambled about a number of things, all of which stemmed from the question of whether you should consider CCR in a production environment, and the considerations which you should take into account before setting out.

Given that the purpose of this series is to run through CCR in production I will now assume that you would like to implement CCR in a live environment – and therefore I would like to share further with you one of my experiences of such an implementation.

As mentioned, the primary focus of the first part was to get people thinking about whether they should use CCR as their mailbox server platform of choice, and it explored the base hardware and indeed data centre environmental issues which might influence your decision.

I finished the first part of the article with the following summary statement, which gave the following overview of part 2 (which you are reading now):

I hope that you have enjoyed this part as much as I have enjoyed writing it. I know that it is a bit of a ramble – but I hope that it gets people thinking about their CCR deployments and what they want to achieve; perhaps some of you will now not go with CCR – perhaps some of you will now change the way in which you look at it. Remember that CCR does not have to be used in isolation – you can combine it with SCR for example to further improve recovery times and reduce other costs.

In the next part I would like to expand further on effective cost cutting and go through some recommendations for recovery – I will then move on to builds and scripts which might help you along the way.

Well, considering that it has been over 19 weeks since I posted that article – I have decided to slightly change the subject matter of this, the second part, to encompass some of the changes that have happened since I last worked on the topic:

  1. I have undertaken a new, reasonably large Exchange 2007 CCR roll out, which I have remained very close to – i.e. I did not just work through the project lifecycle of specification, design, implementation, and migration – I have also been managing the installation for a while as the main administrator.
  2. The project above took place even though the sexiness of Exchange 2010 was just around the corner (the RTM was imminent) – this of course included the rather horny hunk of love that is Database Mobility, which includes the DAG – and that gave me the serious dilemma of CCR vs DAG (more on this later).
  3. I have worked on a number of new scripts that I would like to share with you (in a later part), which can be used / modified for migrations – and more to the point for helping train new staff in mailbox management procedures without having to put them through Exchange 2007 administration training courses immediately.
  4. I would like to focus a little on the process of deriving a sound hardware specification for your CCR installation.

Point number 1 above might seem a little bit of an odd statement to make, considering that I work with and manage a multi-cluster Exchange CCR environment for my main company – but it stems from a migration project that I did for a friend.

It was, and should have remained, the usual consultancy gig – but due to some changes in the structure of my friend's company I was asked to look after the environment until they could place a new messaging consultant.

This of course has presented a few challenges:

  • Looking after someone else’s Exchange environment is different from:
    • Migrating it
    • Managing your own

As you can imagine from your own experiences – I am used to administering my OWN Exchange servers (where it is my own personal neck on the block) – this means I work to my own organisation's internal politics and procedures (many of which I set up and defined). When looking after an installation for someone else, they are the boss – so if they ask for a change which I know is “bonkers” I have had to capitulate and do as asked (unless of course the request is completely mad – then I would just hand back the whole installation and let them get on with it – as is sometimes required).

There is a totally different ownership model as you move from the basic freelance consultant who mainly engages in “Design / Implement”, to the local company which employs you on a permanent basis to “Manage / Develop”, to the freelance consultant who ends up in the situation of “Design / Implement / Manage” – generally speaking consultants do not run final installations for any great period of time (if at all) – there comes a point where they hand over and move on to the next contract.

However I ended up in the situation of “Design / Implement / Manage” as my friend's organisation was in a deep state of transition (for a number of reasons) in terms of its IT team; they had been left with 4 permanent positions in place with a further 9 vacancies needing appointments. Of course, of the 4 existing staff, none had been trained to migrate or run an Exchange installation – so it has been my job (after the implementation and migration) to write a number of scripts and introduce processes which will allow them to manage the installation in terms of day to day tasks – but until that point it was also down to me to manage the installation.

As a result I learned a number of things, some of which I had taken for granted in my own environment – whereas I could always have written part of this article explaining the processes that I went through to arrive at the final Exchange 2007 CCR design, and supplied you with a few scripts to help with the migration, the icing on the cake – the management scripts – might never have happened.

Therefore I would like to present to you the processes, considerations, and indeed some technical bits (in a later part) that I went through to build an 8000-seat capable CCR environment.

Evolution of the species gave me cause to ponder – “to DAG or to CCR?”

Let’s start with the relevance of Exchange 2007’s CCR vs Exchange 2010 DAG.

I am sure that you will all be aware that in Exchange 2007 there were (are) a number of HA and Resilience features; however the two core cluster based technologies are SCC and CCR.
SCC (Single Copy Clusters) was a variation on the Shared Storage EVS model used in Exchange 2000 / 2003, whereas the CCR (Cluster Continuous Replication) model was very much the departure and innovation in HA for Exchange 2007, as it brought the clustering focus back to a DAS model rather than a shared storage model. Essentially you had two copies of the database – Primary and Secondary – which were kept in sync via log shipping and replay between the two nodes in the CCR cluster.

Essentially, having a cluster technology in Exchange 2007 that was not restricted to the Shared Quorum / Shared Storage model, and which also kept two copies of the database between nodes, was an excellent idea. Combine that with the geographically dispersed cluster node principle that was added with Exchange 2007 SP1 and Windows Server 2008, and IMHO CCR became THE Exchange clustering option of choice.

However in Exchange 2010 Microsoft and the Exchange team went one step further – they introduced Database Mobility, which includes the concept of the DAG (Database Availability Group).

Now a DAG can have an alternative meaning to people who have seen the film SNATCH, but in the context of Exchange 2010 it is the next generation of Exchange's HA features. In Exchange 2010 databases are liberated from the confined context of the server and become an Organisational resource, where a single database can have copies on many Mailbox servers within your organisation.
Gone are the principles of SCC, SCR, LCR and, to an extent, CCR – HA now exists under the banner of Database Mobility and the DAG.

The DAG for the most part is a huge expansion of CCR (it still uses log shipping and replay to target databases), but it has a number of improvements over CCR – for example, unlike CCR it is designed with remote servers in disparate data centres in mind (CCR was not designed to do this, and indeed only gained the ability to transcend subnets in Exchange 2007 SP1). There have also been a number of architectural changes which make the DAG very different from CCR.

However this is not an article specifically about Exchange 2010 or DAG – it is still true to the concept of CCR under Exchange 2007.

In the context of my friend's migration, and indeed the premise of the original article, I was faced with the following conundrum:

  • When I was commissioned to perform the migration, I knew that Exchange 2010 was perhaps only 2 months away from RTM
  • The new hardware that my friend's company had would (in its vanilla state) allow me to configure a DAG

However (and indeed getting back to the point of this series), after discussions with my friend's company they decided that despite the close proximity of Exchange 2010 they had mapped out their messaging philosophy around Exchange 2007, with CCR as the HA weapon of choice for their Mailbox servers. There were many reasons for this: licensing (not just for Exchange – a number of the other applications that they make use of are only certified against Exchange 2007), the fact that (from their point of view) they were not prepared to work with a “bleeding edge” technology such as Exchange 2010 (don't say it, I know – that was their opinion, not mine), and they had booked all of their training for Exchange 2007 (which was fair enough).

Therefore – Exchange 2007 CCR it was.

Elements and Specification of the Installation:

The key requirements for the installation were (this is of course summarised):

  • In-scope roles were: MBX, HT, and CAS (including ActiveSync)
  • The whole system must support an initial 6000 users and scale to 8000
  • Maintain N+1 resilience throughout each role within the context of a single Data Centre (therefore if we had a single CAS, we would need another, load balanced, to maintain resilience) [N = the number of functional devices which must be online in all failure scenarios / +1 = the number required to maintain N]
  • Migrate all existing 6000 users to the new infrastructure
  • x 2 CCR Clusters (a total of 4 individual nodes) located within the same Data Centre – the nodes of each cluster are split between two racks, located on “A” and “B” power feeds from the local distribution boards
  • As per above for CAS (x 2 servers) and HT (x 2 servers)
  • A number of scripts which perform the migration, and also provide simplified support for common post-migration tasks (e.g. Move Mailbox, Disable User, and so on) until the team was fully staffed and trained

Mailbox Server Specification:

As mentioned, given that this series of articles is based around CCR clusters in production, I thought that it might be an idea to go through the specification that I came up with (in terms of server hardware) for the above user requirement – the other roles such as HT and CAS are beyond scope – but if you would like to get an idea of how I think around the HT role for the above requirement, I did an article here: Suggested Hub Transport Hardware Config for Exchange 2007 Installations of 5000 users, which still rings true today, and I also plan to do an article on CAS requirements along the same lines in the next few weeks.

From a hardware perspective, I think that frequent readers of my site / blog know that I am an HP freak – but as I always say before I go into hardware recommendations – please don't think that I am trying to convert you all to HP. The specifications that I suggest can be translated to just about any server vendor (especially Dell or IBM), so if HP is NOT your weapon of choice – don't worry – I prefer to write about them as I have a better grounding in the product range – so I can bore the pants off people more! 🙂

For this particular project, the specific server that I used from the HP ProLiant range is the HP ML370 G6 (I have also used the DL370 in the past for other projects) – but converted into its 4U rack form factor – the following is a representational picture of what the ML looks like.

[Image: HP ProLiant ML370 G6]

I used this particular model as it offers a huge amount of flexibility and scalability – it can scale to 192 GB of RAM, 8 processor cores and 24 DAS drives; you can add up to 9 expansion cards (including FC adapters); it comes with 4 NICs plus 1 for iLO (Integrated Lights-Out); and it can take x 2 power supplies. Given the following Exchange CCR considerations, it seemed to fit the purpose of the project very well:

CCR Mailbox Server components 101 – Memory and processors – simplified calculations:

Memory:

Usually memory calculations for your Mailbox server (or per CCR node) are actually very straightforward, as Microsoft gives you an indicator of the uppermost limit for RAM regardless of your scenario – 32 GB – which is a good baseline. To quote Microsoft:

32 GB is not a physical limitation, but rather it is currently the most cost-efficient maximum memory configuration. Depending upon the number of memory slots in a server, the most cost-efficient maximum memory configuration might be less than 32 GB (for example, 16 GB). This needs to be considered when choosing server hardware.

To go further, there are several reasons why MS does not recommend more than 32 GB of memory for Mailbox servers. Aside from the cost (which I would argue is becoming less of an issue these days – more on that later), MS states that there is an impact on “non-transactional disk I/O, and cold state operations”:

  • What is Non-transactional disk I/O?

    • A Mailbox server makes use of additional RAM by caching data, which reduces transactional disk I/O.
      However, Mailbox servers (including CCR cluster nodes) also generate a number of non-transactional disk I/O (NTDIO) elements:

      • Database online maintenance (defragmentation)
      • Database offline maintenance (offline defragmentation or repair)
      • Fatality Prevention / Recovery Operations (Backup, Restore, and RSG)
      • Mailbox management

All of the above create significant disk I/O whilst your server is operating. When adding more RAM to a server you want to reduce the disk I/O requirements, which reduces the cost of your disks. However, due to the non-transactional I/O elements, your storage provisioning cost might not be reduced much further by adding more than 32 GB of memory (i.e. the money that you are saving on disk is removed by the amount you are spending on RAM).

  • What are Cold state operations?

    • A Cold state operation is not the first thing that you do in the morning when checking your Exchange servers before a coffee.
      Essentially it is the state of your Mailbox server after a server restart or an Information Store restart. The read/write cache is small in size as a result, so read I/O operations are far higher. As the server begins to process messages the cache grows, which reduces disk I/O. With more RAM in the Mailbox server, it takes longer for the cache to reach its full potential.
      If your disk subsystem has been sized on the assumption of more than 32 GB of RAM, client connections might experience poor performance after a server restart because of the factors above – which is really significant for clusters, as a node failover constitutes a cold state operation.

OK – all of the above given, you folks are probably more interested in the memory configuration that I decided on for the customer to support 6000 to 8000 users.

Well, I operated on the principle that although 8000 users would be the upper maximum – given the proliferation of e-mail and converged messaging services (UM) and the overall objectives of my friend's business over the next couple of years – I plumped for x 4 CCR cluster nodes (forming x 2 CCR clusters), initially with 3 storage groups per CCR cluster.

I had also decided that it would be safe to assume (from the profile analysis that I had done on the existing Exchange 2003 installation) that the user profile would be “Heavy” – the following table is taken from http://technet.microsoft.com/en-us/library/aa998874(EXCHG.80).aspx and is used to define what a “Heavy” usage user is:

User type (usage profile) – send/receive per day (approximately 50-kilobyte (KB) message size):

  • Light: 5 sent / 20 received
  • Average: 10 sent / 40 received
  • Heavy: 20 sent / 80 received
  • Very heavy: 30 sent / 120 received

I then translated the usage profile into physical memory requirements as per the following table:

User type – Mailbox server memory recommendation:

  • Light: 2 GB plus 2 MB per mailbox
  • Average: 2 GB plus 3.5 MB per mailbox
  • Heavy: 2 GB plus 5 MB per mailbox
  • Very heavy: 2 GB plus 5 MB per mailbox

And then I combined that information with the following physical memory metric data based upon the number of storage groups, from the next table (taken from http://technet.microsoft.com/en-us/library/bb738124(EXCHG.80).aspx):

NOTE: You will see that in the table above you are given an initial physical memory figure (e.g. 2 GB plus 2 MB per mailbox) – you can ignore this and base the initial physical memory requirement BEFORE mailboxes on the following table, which deals with the number of Storage Groups:

Storage group count – Exchange 2007 Service Pack 1 (and above) minimum required physical memory:

  • 1-4: 2 GB
  • 5-8: 4 GB
  • 9-12: 5 GB
  • 13-16: 6 GB
  • 17-20: 7 GB
  • 21-24: 8 GB
  • 25-28: 9 GB
  • 29-32: 10 GB
  • 33-36: 11 GB
  • 37-40: 12 GB
  • 41-44: 13 GB
  • 45-48: 14 GB
  • 49-50: 15 GB

Therefore at this point I had the following calculation:

2 GB [minimum physical memory for a server with 3 SGs] + (8000 [users] x 5 [MB]) ≈ 42 GB of RAM (for all 8000 users on a single server)

However, those 8000 users are split across two CCR clusters, so each cluster only has to carry 4000 mailboxes – and each cluster still needs its own 2 GB base allowance for the storage groups. Applying the same formula per cluster gives:

2 GB [minimum physical memory for a server with 3 SGs] + (4000 [users per cluster] x 5 [MB]) ≈ 21.5 GB of RAM, which I rounded up to 22 GB.

This means that each of my CCR clusters can support up to 4000 users – which gives me the initial specification of x 2 CCR clusters with 4 nodes in total.
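If you want to re-run this arithmetic for other user counts, profiles or storage group layouts, here is a minimal sketch of the calculation (Python, purely illustrative – the per-mailbox figures and the storage group table are the Microsoft values quoted above; the function names are my own):

    # Illustrative Exchange 2007 mailbox server memory sizing, based on the
    # Microsoft guidance quoted above. Function names are my own invention.

    # MB of RAM per mailbox by usage profile (from the TechNet table above)
    MB_PER_MAILBOX = {"light": 2, "average": 3.5, "heavy": 5, "very heavy": 5}

    def minimum_memory_gb(storage_groups):
        """Minimum physical memory (GB) for the storage group count (SP1 table)."""
        if storage_groups <= 4:
            return 2
        if storage_groups <= 8:
            return 4
        # 9-12 SGs need 5 GB, then one extra GB per additional band of 4 SGs
        return 5 + (min(storage_groups, 50) - 9) // 4

    def mailbox_server_ram_gb(users, profile, storage_groups):
        """Base memory for the SG count plus the per-mailbox memory."""
        return minimum_memory_gb(storage_groups) + users * MB_PER_MAILBOX[profile] / 1024

    # 8000 heavy users split across 2 CCR clusters, 3 storage groups per cluster
    users_per_cluster = 8000 // 2
    print(mailbox_server_ram_gb(users_per_cluster, "heavy", 3))  # ~21.5 -> round up to 22 GB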

Processors:

Of course the key consideration when choosing a processor for any production Exchange 2007 role is that it MUST support x64 (either Intel's EM64T or AMD's AMD64) – however these days it is almost impossible to buy a server which does not meet this requirement (well, it is possible I guess, but why you would want to buy a server without that support I do not know!).

Remember that Exchange 2007 does not support 32-bit processors in production – and will not run on IA-64 (Itanium) processors.

Generally speaking, processor considerations for CCR clusters (or more to the point, Mailbox servers) are even easier than memory calculations, as once again there is a finite amount of power that you can throw at a server without diminishing your price / performance return.

Microsoft are keen to push multi-core processors – and indeed since their release they have become pretty much the de facto standard in most servers – the question is how many physical processors with how many cores do you place within your Exchange server – and, more to the point, what was the conclusion that I came to for my specification?

The following table (taken from: http://technet.microsoft.com/en-us/library/aa998874(EXCHG.80).aspx) gives a very basic overview of how you can determine the optimal processor configuration – but as you might expect, it is pretty generic:

Exchange 2007 server role – minimum / recommended / maximum processor cores:

  • Edge Transport: 1 / 2 / 6
  • Hub Transport: 1 / 4 / 12
  • Client Access: 1 / 4 / 6
  • Unified Messaging: 1 / 4 / 6
  • Mailbox: 1 / 4 / 12
  • Multiple server roles (combinations of Hub Transport, Client Access, Unified Messaging, and Mailbox server roles): 1 / 4 / 6

As you can see from the above table, the recommended processor configuration is around x 4 processor cores, which gives you the following options (these are not exhaustive):

  • A four socket server with x 4 single core processors (not recommended as this will limit scalability)
  • A two socket server with x 2 dual core processors
  • A single socket populated with a single quad core processor

My personal recommendation is a server with two (or four) sockets which support quad core processors – as this will give you a good trade off between cost / scalability and performance.

I have also found that most server vendors these days will give you a more cost effective trade off in favour of purchasing quad core processors over dual cores (I managed to get all of the CCR nodes for this project for 10% less than the cost of the Dual Core models).

Microsoft's guidelines state that a four-core server will potentially support “several thousands of mailboxes” – but I have found that if you have the budget you should go for a pair of quad core processors within your server (which is what I opted for in the case of this deployment); this ensured that each node (and therefore each CCR cluster) had enough room to grow.
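As a quick aside, the core-count guidance above is easy to capture in a few lines if you want to sanity-check a proposed configuration against it (again a minimal, purely illustrative sketch in Python, using the TechNet figures from the table above):

    # Processor core guidance for Exchange 2007 roles (minimum, recommended,
    # maximum), taken from the TechNet table above. The helper is illustrative only.
    CORE_GUIDANCE = {
        "Edge Transport":    (1, 2, 6),
        "Hub Transport":     (1, 4, 12),
        "Client Access":     (1, 4, 6),
        "Unified Messaging": (1, 4, 6),
        "Mailbox":           (1, 4, 12),
        "Multiple roles":    (1, 4, 6),
    }

    def check_cores(role, sockets, cores_per_socket):
        minimum, recommended, maximum = CORE_GUIDANCE[role]
        total = sockets * cores_per_socket
        if total < minimum:
            return f"{total} cores is below the minimum of {minimum}"
        if total > maximum:
            return f"{total} cores exceeds the maximum of {maximum}"
        if total < recommended:
            return f"{total} cores is supported but below the recommended {recommended}"
        return f"{total} cores meets or exceeds the recommended {recommended}"

    # Two quad-core sockets per CCR node (the configuration I opted for)
    print(check_cores("Mailbox", sockets=2, cores_per_socket=4))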

CCR Mailbox Server components 101 – Storage – simplified calculations:

Now that I had settled on what I would like my memory and processor configuration to look like, it was time for perhaps the most critical aspect of the server design.

Getting the Disk configuration of a mailbox server correct is a key part of a successful Exchange 2007 implementation.

Over the years working with Exchange I have found that the main gripe that users have with e-mail (aside from it not being available) is “my Outlook client keeps retrieving data from the Exchange Server”.

Now, whereas it is completely correct to say “that is not always down to the underlying disk design”, I have seen a number of Exchange installations where the disk technology, RAID level and database layout have contributed to a number of users' clients just “sitting”, waiting for an I/O time slot.

Disk Type / Speed and Size:

There are a number of options that you can pursue in terms of the disk technologies that can be used with Exchange 2007. Speaking openly, and from my own personal perspective, I would say that there has been a swing from SAN based storage (in Exchange 2003) back to DAS being the recommendation in Exchange 2007. However, you still have the option of SAN storage in 2007 – in fact for large installations with many storage groups it will potentially be a requirement (unless of course you are using “mount points”) – and you can also consider iSCSI.

Local (DAS) or SAN?

As mentioned above – one of the first quandaries that you might come across in terms of deploying CCR – do you make use of DAS or SAN based storage?

My own personal perspective is that with CCR you should avoid SAN storage (where you can).

You might be thinking why?

Well, the main reason is that unless you are planning to make use of a physical SAN per node (unlikely), or use iSCSI by interconnecting two SANs located on separate sites or in separate zones of the data centre (possible, but again not a common scenario), you create an inherent single point of failure.

If you allow both CCR nodes within a single CCR cluster to use the same SAN (via zoning, for example), the SAN itself becomes the risk – consider the following diagram (which appeared in part 1 of this series):

[Diagram: SAN as a single point of failure]

The above provides a pictorial example of the situation you could find yourself in – if you replicate to the SAME disk array in the same enclosure, then losing the SAN / enclosure means losing the CCR cluster.

I am not saying, of course, that you cannot use SAN architecture with CCR – you can. To elaborate: there might be some higher end scenarios where there is more than one SAN in place and where it might be more cost effective (for your organisation) to make use of a SAN – but I have found that many places choose to distribute their risk and performance for CCR across DAS.

OK, now that I had settled on the type of storage that I was looking to use for my friend's production scenario (DAS), it was time to delve deeper into the type of DAS disk technology to use.

SATA:

SATA, in my humble opinion, is still a little slow for Enterprise class solutions. Don't get me wrong, I know that there are the SATA “E” class drives which are designed to run 24 x 7 and at speeds of 10K RPM, but I am not certain that there is much of a financial saving between buying SATA E and just opting for SAS.

I will say that in Exchange 2010 making use of SATA E is a completely viable idea – and given the design principles of Database Mobility they are indeed exceptionally cost effective drives – but as my friend's company had opted for Exchange 2007, SAS was the weapon of choice that I settled on, as it was still reasonably cost effective and also provided me with a starter capacity of 300 GB drives which operate at 15K RPM – whereas in the HP range the SATA drives are aimed at the more entry level / non-mission-critical systems rather than high performance ones.

SAS:

SAS (Serial Attached SCSI) is designed for Enterprise class performance and reliability. Access times and platter speeds are pretty good (although when using the smaller form factor drives you are limited to smaller disk sizes) – HP provide SAS in 2.5” and 3.5” form factors, and in my scenario (with the ML370 G6) I opted for the 2.5” drives, which are available in the following capacities:

10K RPM = 146 GB / 300 GB

15K RPM = 72 GB / 146 GB

Now that I had arrived at my disk technology, platter and controller type, I needed to consider the size of the disks and the RAID levels that I would be using – however, to do this I (and you, in your own designs) will need to consider the Storage Group, Database and Transaction Log layouts which match the number of users and usage profiles for the customer base.

In Part 3:

In part 3 I intend to cover the following aspects of my customers design:

  • Disk Sizing / Layout
  • RAID Controller Design
  • RAID Levels
  • Database Placement
  • Transaction Log Placement
  • NIC Configuration
  • Overall Solution Design (as a summary)
  • Migration