Or, Why your next data center should be a rental unit…
A recent project provided an opportunity to revisit my thoughts on using colocation facilities in preference to new enterprise data center construction or upgrades. Although I have written on the topic several times (here, a data center report here and another report here), the following paper condenses the background and genesis of the problem, the business scenario and requirements, basic architecture and system design, colocation benefits and challenges, and my recommendations to enterprise IT executives into a succinct package.
Forget plastics and manufacturing; today’s business school graduates are preparing for the digitized enterprise, in which the creation, movement, storage, aggregation and analysis of data and information is far more important than materiel and widgets. In the digital economy, profitability and competitive advantage derive from the intelligent and clever use of data, where the information consumers are often peripatetic customers and employees using mobile apps that tap an assortment of remote databases. But the explosion of data, applications, users and gadgets (with more than two billion mobile devices expected to ship this year alone) produces a consequent boom in the need for IT infrastructure and the facilities to house it all, and IT is struggling to keep up.
Cloud-scale data centers are fat, hot and crowded. The question for most organizations is whether owning, operating and managing these increasingly complex, specialized and expensive facilities is the best use of their capital and highly coveted IT labor, or whether, like other capital-intensive assets (think office buildings, shopping centers, semiconductor fabs or even airports), it makes more sense to rent than to own. In most cases, renting wins: your next data center should be a colocation facility.
Business Scenario and Typical Requirements
Moore’s Law aside, the nexus of pervasive data collection, associated big data analytics, ubiquitous and constant mobile device usage and the concomitant “applification” of business has left IT infrastructure struggling to keep pace with accelerating resource demands. Throughout the client-server and early Internet eras, IT hardware capability notoriously exceeded the demands of most software. Virtualization provided a temporary band-aid, but with server consolidation ratios commonly well into double digits, that bounty of untapped capacity has been tapped. With cloud-first applications architected for broadly distributed and redundant infrastructure, the name of the game for hardware designers is shrink down and scale out.
But what seems like a perfect engineering solution to the problem of scarcity, a Matrix-like proliferation of compact yet high-performance servers, runs smack into the physics of parasitic transistor losses and the thermodynamics of heat transfer. Translated to the real world of data center equipment, this means that while a rack full of servers can now accommodate hundreds, if not thousands, of virtual workloads, those powerful compute engines can consume tens of kilowatts: a power density most enterprise facilities can’t supply, producing waste heat they can’t dissipate.
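To see where those densities come from, consider a back-of-envelope rack-power estimate. The rack size, servers-per-U and per-server draw below are illustrative assumptions, not figures from any particular vendor:

```python
def rack_power_kw(rack_units=42, servers_per_ru=2, watts_per_server=400):
    """Rough IT load of a fully populated rack, in kilowatts.

    Assumptions (illustrative only): a standard 42U rack, hyperscale
    sleds packing two servers per rack unit, and an average draw of
    400 W per server under load.
    """
    return rack_units * servers_per_ru * watts_per_server / 1000

print(rack_power_kw())            # 33.6 kW for a dense hyperscale rack
print(rack_power_kw(42, 1, 350))  # 14.7 kW even at one server per U
```

Even with modest assumptions, the result lands far above the few kilowatts per rack that older raised-floor rooms were typically designed to cool.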
Coping with cloud-scale densities and resource usage requires a new generation of information factory that bears about as much resemblance to the traditional captive enterprise data center as a Walmart Supercenter does to the corner convenience store. The need for scale, efficiency and redundancy means that, much like what happened when production-line manufacturing displaced small shops of craftsmen, size matters. The result is warehouse-sized facilities with six or seven figures of usable square footage that cost upwards of a billion dollars, a figure well beyond the budgets of most IT organizations.
Most IT organizations have recently coped with a major data center project. According to a 2013 Uptime Institute survey, 70% of those operating data centers have built or renovated a site in the past five years at an average cost of between $5–10 million per MW of capacity. A 2013 Digital Realty survey showed similar results: 66% of respondents had built or acquired a new data center in the past two years, with an average capacity of 2.6 MW.
The circumstances behind data center upgrade and expansion decisions typically include new or upgraded business applications with significant incremental resource requirements, plus incessant growth in application usage and data storage. Whatever the specifics, the result is the need for more servers, disk arrays and network switches. While today’s hardware is vastly more capable than that of even a few years ago, adding capacity isn’t as simple as merely swapping out systems in an aging private facility designed for the era of tower servers and internal client-server software, not hyperscale systems, distributed cloud applications and remote users.
Many organizations turn to cloud services, whether raw infrastructure (IaaS) or fully packaged applications (SaaS), to meet incremental IT needs. However, for all but the most risk-loving startups, public, multitenant services aren’t the best fit for every situation. Although security is often cited as an excuse, with an abundance of FUD sown about public cloud security, it is almost surely better than that of most internal data centers. The fact is that most organizations find it easier, or just more reassuring, to have full control over application, data and network administration, usage, security policy, governance and auditing. This means hybrid public/private infrastructure is becoming the norm for most organizations. Nevertheless, private infrastructure needn’t require private facilities.
Whether it’s a virtual private cloud (VPC) at an IaaS provider or a suite of fully owned and operated equipment in a colo facility, it’s possible to meet enterprise IT requirements for control, availability, security, resilience, manageability and cost without actually building and running the data center. If it’s good enough for the CIA, which signed a 10-year, $600 million deal with Amazon to build and operate a private cloud facility, it’s good enough for most enterprises, where it’s just profitability, not national security, on the line.
The CIA’s requirements are indicative of, if not more extreme than, those faced by enterprise IT, and serve as a useful model. It needs the ability to rapidly meet demands for added capacity by spinning up new virtual servers and storage for applications as diverse as Hadoop clusters and Web farms or ECM repositories and test and development sandboxes. It also wants to maximize resource utilization by rapidly shifting workloads between systems as needed, yet still have the application and network isolation, high availability, resiliency, operational control and visibility typical of a mission-critical enterprise data center.
Add one final stipulation: since colo customers won’t have convenient physical access to servers, switches and cables, they must also be able to remotely manage, monitor and audit systems, performing both routine maintenance and fire-drill troubleshooting without laying hands on the hardware.
Technology is often a double-edged sword. While high-density, power-hungry servers and switches create the need for new, efficient data centers, they’re also the key to meeting new resource demands. Likewise, while mobile users and ubiquitous connectivity are the source of significant incremental resource consumption, the same ever-present networking provides affordable, pervasive gigabit-scale networks and Internet resilience. Together with software advances like sophisticated data center infrastructure management (DCIM) systems, these technologies mean a facility’s location, at least at the continental scale where latency differences are minuscule, is irrelevant. Much like central power generation and distribution long ago displaced local steam turbines, it’s no longer necessary for individuals or enterprises to own and operate the IT infrastructure central to their work and business.
For this discussion, we assume that organizations are well on the way to running most, if not all applications virtualized, have access to ample, as in gigabit-scale, bandwidth to Internet exchange points and operate relatively small (by today’s standards; i.e. well under 100,000 square feet and 5 MW) facilities.
Typical enterprise data centers are quite inefficient. The aforementioned Uptime Institute survey found that among organizations actually measuring data center PUE (34% don’t), the average self-reported value is 1.65, which almost certainly understates the true figure, since 4% of respondents claimed a technically impossible PUE of less than one. The Digital Realty results, with an average PUE of 2.9, are probably more realistic. This compares to figures around 1.1 for large cloud operators like Google and Facebook. Since PUE measures data center inefficiency (higher numbers mean more wasted energy), this means that for every MW of IT load, the typical enterprise data center uses about 50% to 100% more energy than the best cloud or colo facility. Using an average commercial rate of 10 cents per kWh, that translates to at least $44,000 to well over $100,000 per month in wasted electricity per MW of IT workload.
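As a sketch of that arithmetic (the PUE values and the 10-cent rate come from the surveys cited above; the 730-hour average month and the exact enterprise PUE of 1.7 are illustrative assumptions):

```python
HOURS_PER_MONTH = 730  # average hours in a month (8,760 / 12)

def monthly_waste_usd(pue, best_pue=1.1, it_load_mw=1.0, usd_per_kwh=0.10):
    """Monthly cost of facility overhead in excess of a best-in-class
    PUE, priced at the average commercial electricity rate."""
    overhead_kw = (pue - best_pue) * it_load_mw * 1000
    return overhead_kw * HOURS_PER_MONTH * usd_per_kwh

print(round(monthly_waste_usd(1.7)))  # ~44,000: near the self-reported enterprise average
print(round(monthly_waste_usd(2.9)))  # ~131,000: at the Digital Realty average
```

These two cases bracket the “$44,000 to well over $100,000 per month” range quoted above.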
Moving enterprise IT equipment to a colo facility amounts to little more than a change of (physical) address, one that is transparent to most users in organizations large enough to operate from multiple locations, where existing private data centers are unlikely to be in the same building as employees or customers anyway. The key is WAN links with adequate capacity, reliability and scalability, whether MPLS, VPNs over the public Internet or a cloud-based WAN service, that allow secure, fast, low-latency connections between application users (office workers, customers or IT administrators) and the underlying hardware.
The following is a simplified network topology:
Indeed, networking is often the major impediment to a colo design, as existing circuits between major sites (namely existing data centers and central offices) and the Internet may be at capacity, and dedicated links to a colo site are both expensive and, since they create a disincentive to change providers, strategically unwise. Here too, technology has come to the rescue in the form of WAN optimization/virtualization appliances and cloud network services like Aryaka’s portfolio. In either case, the design entails replacing expensive dedicated WAN circuits, such as point-to-point OC-3/12/48 or Frame Relay links, or multipoint MPLS, with cheaper broadband or Metro Ethernet VPN overlays to the public Internet. Network connections outbound from the colo facility are vastly more capable and cheaper than from an enterprise data center, since providers have direct connections to multiple carriers and to cloud services like AWS and Google over multi-Gbps links. Indeed, SUPERNAP’s C.U.B.E feature currently peers with over 50 cloud providers.
Network routing details are a complexity beyond the scope of this paper; however, the appendix shows two typical topologies from SUPERNAP, which operates some of the largest colo facilities in the world, illustrating how remote data centers can be stitched into an enterprise IT fabric.
Inside the Data Center
Colo services typically sell space by the rack, with large customers able to provision multi-rack cages physically separated from surrounding customers’ racks. Each customer’s infrastructure design is entirely customizable and typically resembles the standard equipment pods, i.e. one or more racks of servers, storage systems and top-of-rack (ToR) switches, already deployed internally. The difference is that, with space such a precious commodity, it behooves colo buyers to maximize equipment density using 1U or 2U servers, converged blade systems or hyperscale (more than one server per rack unit) products.
Since colo designs are by definition greenfield, it is also wise to run applications virtualized to allow easy workload scaling and movement between physical systems. Thus, the intra-pod network is typically a flat fabric between ToR switches that maximizes east-west throughput.
The application storage design isn’t affected by a colo deployment; however, backup and archive systems and policies may need to change when a colo cluster targets an on-premises archive system, due to WAN capacity constraints or cost. Given the high-speed cross connects between colo facilities and major cloud providers, it can make more sense to use a cloud service like AWS Glacier, Zetta or an Asigra-based service.
Summary of Colo Benefits and Challenges
Operating enterprise IT from a colo facility has many benefits but also comes with some challenges. The following table summarizes both:
| Benefit | Challenges / Risks |
| --- | --- |
| Faster deployment versus building/upgrading facilities | OpEx may appear higher if not fully accounting for TCO, including capital depreciation |
| Avoids 6- to 7-figure CapEx for data center construction, expansion or rebuild | Requires proficiency at remote infrastructure (server/storage/network) and application management |
| Supports high-density systems exceeding 15–20 kW per rack | Generally requires visits to the remote site for initial build-out and physical expansion |
| Exploits economies of scale: more efficient facilities, cheaper high-speed network connections | Since space is priced by the rack unit, not suited for low-density legacy systems |
| Unsurpassed connectivity to cloud services, remote users/customers and the Internet writ large | Requires added high-speed, reliable, secure WAN connectivity |
| Physical and network security often superior to enterprise data centers | No control over some physical and network security policies or employee hiring; must carefully vet providers |
| Typically superior facility and network availability | Systems and application management depend on reliable WAN connectivity, which could be an issue in times of crisis (e.g. natural disasters, DDoS attacks) |
| Typically higher reliability and availability, as many colo facilities are Tier III certified, with some offering Tier IV service | May require redesigning backup and archive systems and processes to account for large data sets remote from legacy backup systems |
| Frees up IT personnel and budgets (depending on exact TCO savings) from many operational tasks to address higher-value business initiatives | May result in a skills mismatch necessitating retraining or replacement |
| 24/7 facility and network gateway monitoring and management | Little control over scheduled maintenance windows (if any); also adds another layer to the support and accountability chain during incident response |
Recommendations
IT organizations large and small must carefully consider whether owning, operating, maintaining and upgrading data centers offers any fundamental business advantage and is the best use of precious IT capital. In most cases, we think the answer is no. Here are some recommendations for those updating or overhauling their infrastructure strategy.
- Make colocation and/or virtual private cloud the default choice for new data center capacity. Building a new private facility should be the last option, but even expanding or upgrading existing facilities should be subjected to close technical and financial scrutiny.
- Embrace and exploit the physical/virtual divide. When developing data center policies, decouple the application infrastructure, namely servers, storage, LANs and management software, from the physical environs. It’s less and less tenable for organizations to own and operate their own data center facilities, and those that persist in holding onto small facilities of a few thousand square feet are compromising performance and efficiency.
- Perform a detailed facilities assessment. Make an honest assessment of whether your existing facilities can handle the next generation of high-density equipment. Do a thorough and complete TCO analysis of everything required to operate what will eventually evolve into your private cloud: the HVAC, power, UPS, backup generation, WAN fiber, data carriers, physical security, everything. You may be surprised how expensive a DIY operation has become. To reiterate the aforementioned Uptime survey: among those having renovated or built a new site in the past five years, the average capacity was about 3 MW at a cost of between $5–10 million per MW, i.e. a $20+ million project.
- Develop a hybrid cloud strategy that includes one or more next-generation virtualization platforms, whether vSphere, Windows Cloud OS or Linux/OpenStack, that can be deployed in various ways: internally, at an external colo, as a managed service/VPC or on shared multitenant IaaS.
- Let your virtualization-cum-private-cloud strategy drive equipment standards. Aging vendor and equipment standards, developed for the era of client-server applications and static Web apps, are due for an upgrade. Virtualization, private cloud and big data distributed systems are opportune platforms for high-density, consolidated infrastructure and provide a good occasion to reassess vendor relationships, system standards and reference hardware architectures. These dense and often distributed systems are also tailor-made for colo environments.
- Start small with greenfield applications needing new hardware. The pay-as-you-go, usage- and space-based colo pricing model makes it a good choice for new cloud or big data applications that require incremental hardware investment that may grow over time. Of course, these same applications are well suited to IaaS/PaaS deployment, but if you’re not yet comfortable using the public cloud, consider deploying at a colo to gain experience managing infrastructure and applications in a remote environment without risking mission-critical business processes during the early stages of the learning curve.
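The survey math behind the facilities-assessment point above is simple; as a quick sketch (the per-MW costs and the 3 MW average capacity come from the Uptime survey quoted earlier):

```python
def build_cost_range_usd(capacity_mw, low_per_mw=5_000_000, high_per_mw=10_000_000):
    """Construction cost range at the Uptime survey's $5-10M-per-MW figures."""
    return capacity_mw * low_per_mw, capacity_mw * high_per_mw

low, high = build_cost_range_usd(3)  # the survey's average 3 MW build
print(low, high)  # 15000000 30000000: squarely a "$20+ million" project
```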
Appendix
Two different protocols, BGP (a routing protocol) and HSRP (a first-hop redundancy protocol), can be used to build redundant connections between legacy enterprise data centers and a colo facility. These are illustrated in the following diagrams (originals, plus more examples, at SUPERNAP):