Haswell Redesign of Intel Xeon E7 Made for Big Memory Workloads

By | May 13, 2015

Note: a version of this article appeared on TechTarget: The Intel E7 v3 processor entered the ring against IBM Power and other systems. But will the new BI features be enough to take over scale-up workloads?

Intel’s tick-tock CPU development strategy of alternating between process node shrinks and new microarchitectures has a long, consistent track record of product improvements. With the Broadwell release earlier this year, its consumer products have begun the ‘tick’ transition to a 14nm process. The data center Xeon product line always lags, however, and is just finishing its ‘tock’ migration to the Haswell architecture. The high-end E7 is the third and final Xeon series to get the Haswell treatment, with Intel announcing 12 new version 3 products that are now shipping.

Source: Intel

Building upon the same core and internal dual-ring interconnect as the E5v3 released last fall, the E7 adds several key features to support scale-up, high-memory, mission-critical workloads. The processor core itself is the main attraction: the Haswell engine processes instructions 10% faster than its Ivy Bridge predecessor, with E7 models sporting up to 18 cores sharing 45 MB of last-level cache. But the E7v3 also adds features designed for big workloads that improve power management and I/O throughput, along with new transaction-related and crypto-acceleration features (TSX and AES-NI), better memory performance (DDR4) and greater system resiliency (Run Sure, MCA/machine check).

The E7 is designed for what used to be considered mainframe workloads such as OLTP, big data business intelligence and scientific simulations: applications that crunch a lot of data, require high I/O throughput, aren’t easily segmented across separate machines and support critical business functions. Unlike the E5 series, which is made for 2-socket, scale-out, cloud-native workloads, the E7 provides concentrated compute firepower for a sweet spot of 4- and 8-socket systems (scaling up to 32 sockets) with up to 6 and 12 terabytes of memory spread across 96 and 192 memory sockets, respectively.
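The memory figures above imply a particular DIMM size and slot count per socket. A minimal Python sketch of that arithmetic follows; it assumes binary terabytes (1 TB = 1024 GB) and that every slot holds an identical DIMM, neither of which the article states.

```python
# Figures quoted above for 4- and 8-socket E7 v3 systems.
CONFIGS = {
    4: {"memory_tb": 6, "dimm_slots": 96},
    8: {"memory_tb": 12, "dimm_slots": 192},
}

GB_PER_TB = 1024  # assumption: binary terabytes


def per_slot_gb(memory_tb, dimm_slots):
    """Capacity each slot must hold to reach the quoted maximum memory."""
    return memory_tb * GB_PER_TB / dimm_slots


for sockets, cfg in CONFIGS.items():
    slots_per_socket = cfg["dimm_slots"] // sockets
    dimm_gb = per_slot_gb(cfg["memory_tb"], cfg["dimm_slots"])
    print(f"{sockets}S: {slots_per_socket} slots per socket, {dimm_gb:.0f} GB per DIMM")
```

Under those assumptions both configurations work out to 24 slots per socket populated with 64 GB DIMMs, so the maximum capacities depend on the largest memory modules rather than on the mainstream 16GB parts.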

Big, consolidated systems running equally big data applications require both bulletproof reliability and maximum transaction throughput. The new E7 delivers on the first count via a set of reliability, availability and serviceability (RAS) features, including memory mirroring and sparing (like RAID for RAM), recovery from parity errors with DDR4 memory, and circuitry that allows firmware to intercept and handle both corrected and uncorrected error events. The E7v3 enhances transaction throughput via additions and updates to Intel’s Transactional Synchronization Extensions (TSX), which speed multithreaded database applications using a technique known as hardware lock elision. A base set of TSX functions was included with the E5v3 product last fall, but later disabled due to unspecified bugs. The Haswell E7 fixes and improves upon the E5 TSX feature set and provides up to 6 times greater OLTP throughput, for example on SAP HANA, by letting coarse-grained locking code achieve fine-grained locking performance.

Big Systems Ripe For Upgrades

Intel believes the E7v3 can exploit a significant upgrade opportunity in 4- to 8-socket systems, which are typically refreshed about every five years, putting the Xeon 7400 series in the crosshairs. Intel also sees the Haswell E7 appealing to organizations consolidating virtualized x86 infrastructure onto fewer servers and as a replacement for aging POWER systems.

[Figure: Intel Xeon shipment volumes]

The generational improvements in Xeon performance are dramatic. For example, Intel estimates that the OLTP performance provided by 10 racks of circa-2010 7400-series Xeons can be provided by a single rack of v3 systems. For virtualized workloads, Intel’s benchmarks on VMware ESXi show the Haswell E7 yielding up to a 2.7x performance improvement over a first-generation E7-4800 series. Unfortunately, in an era of high-core-count CPUs, software licenses can sometimes cost much more than the underlying hardware. In these situations, one of the new segment-optimized processor SKUs, which trade off core count for CPU frequency and/or power budget, can provide more bang for the buck. For example, for OLTP applications with high per-core fees, Intel estimates that substituting its 4-core, high-frequency E7-8893 part for the mainstream 18-core 8890 product can deliver roughly the same performance at a 20% savings.

The E7v3 argument against POWER8 systems centers on ROI, namely price/performance and long-term TCO. According to as-yet-unpublished SPECint_rate benchmarks from Intel, a high-end E7v3 provides roughly equivalent performance to a POWER8 system costing 10 times the price, based on an analysis including initial CapEx, facilities expenses (power, cooling) and software plus support licenses. Intel’s claim of a SPECint_rate_base2006 score of nearly 5,000 for an 8-socket system is plausible, since published scores for Cisco and Dell 4-socket E7-8890v3 systems are 2770 and 2740, respectively. Whether by coincidence or as a preemptive counter to the E7v3 release, IBM just announced two POWER8 configurations optimized for SAP HANA, one with 24 cores and 1TB of memory, the other with 40 cores and 2TB. IBM didn’t announce benchmarks for either configuration, but based on the specs these should compete well with a loaded E7 8000 series system.
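The plausibility check on the 8-socket claim can be made explicit. The sketch below compares Intel's figure against ideal doubling of the published 4-socket scores; it treats "nearly 5,000" as exactly 5000, which is an assumption made only for illustration.

```python
# Published 4-socket SPECint_rate_base2006 scores cited above.
FOUR_SOCKET_SCORES = {"Cisco": 2770, "Dell": 2740}

# Assumption: "nearly 5,000" is rounded to 5000 for this estimate.
CLAIMED_EIGHT_SOCKET = 5000


def scaling_efficiency(score_4s, score_8s):
    """Fraction of ideal (2x) scaling achieved going from 4 to 8 sockets."""
    return score_8s / (2 * score_4s)


for vendor, score in FOUR_SOCKET_SCORES.items():
    eff = scaling_efficiency(score, CLAIMED_EIGHT_SOCKET)
    print(f"{vendor}: ideal 8S score = {2 * score}, implied efficiency = {eff:.1%}")
```

The claim implies roughly 90 to 91 percent of linear scaling. That leaves some room for the overhead of doubling the socket count, which is why the figure reads as credible rather than optimistic.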

[Figure: Intel Xeon OLTP performance per rack]

The performance advantages of in-memory databases for analytics workloads operating on large data sets are significant, but costly. 16GB DDR4 server DIMMs run about $200, meaning a terabyte system has roughly $13,000 of RAM. In contrast, a 960GB enterprise SSD can be had for under $700. This 17:1 price difference per gigabyte is the driving force behind innovative new flash storage designs and interfaces. IBM has used the high-speed, low-latency CAPI interface to its POWER processors for a memory adapter that makes flash look and perform like internal RAM. At the OpenPOWER Summit, Redis Labs showed comparative results from a large NoSQL app in which a system with 90% CAPI flash provided virtually identical performance (200K IOPS, sub-millisecond latency) to a 100% in-memory database at a cost savings of over 70%. At EMC World, the company demonstrated the DSSD rack-scale PCIe flash product executing Hadoop queries for a typical analytic app. On synthetic performance benchmarks the flash array nearly matched native RAM speeds.
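The price comparison above can be reproduced in a few lines. This sketch uses only the prices quoted in the paragraph, takes $700 as the SSD price (the text says "under $700", so the real ratio may be slightly higher) and assumes 1 TB = 1024 GB.

```python
DIMM_GB, DIMM_PRICE = 16, 200   # 16GB DDR4 server DIMM, about $200
SSD_GB, SSD_PRICE = 960, 700    # 960GB enterprise SSD, upper bound of "under $700"


def dimm_cost_for(capacity_gb):
    """Cost of enough 16GB DIMMs to reach the requested capacity."""
    dimms = -(-capacity_gb // DIMM_GB)  # ceiling division
    return dimms * DIMM_PRICE


ram_per_gb = DIMM_PRICE / DIMM_GB
ssd_per_gb = SSD_PRICE / SSD_GB
ratio = ram_per_gb / ssd_per_gb

print(f"1 TB of RAM: ${dimm_cost_for(1024):,}")
print(f"RAM ${ram_per_gb:.2f}/GB vs. SSD ${ssd_per_gb:.2f}/GB, ratio {ratio:.1f}:1")
```

A terabyte needs 64 such DIMMs, or $12,800, which rounds to the $13,000 cited, and the per-gigabyte ratio comes out at about 17:1.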

DSSD System Source: https://community.emc.com/thread/213405

Creative new flash memory system designs and processor interfaces like DSSD and CAPI, along with others sure to follow, may dampen demand in the E7’s target market, namely very high memory, scale-up systems. However, the processor’s RAS and other features optimized for mission-critical workloads should still ensure a healthy future for the pinnacle of x86 processor engineering.


 

E7v3 Product SKUs

Intel® Xeon® processor SKU   Cores   Frequency (GHz)   Cache (MB)   Price (USD)
E7-8890 v3                   18      2.5               45           $7175
E7-8880 v3                   18      2.3               45           $5896
E7-8880L v3                  18      2.0               45           $6062
E7-8870 v3                   18      2.1               45           $4672
E7-8893 v3                   4       3.2               45           $6841
E7-8891 v3                   10      2.8               45           $6841
E7-8867 v3                   16      2.5               45           $4672
E7-8860 v3                   16      2.2               40           $4060
E7-4850 v3                   14      2.2               35           $3004
E7-4830 v3                   12      2.1               30           $2169
E7-4820 v3                   10      1.9               25           $1502
E7-4809 v3                   8       2.0               20           $1224
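
One way to read the table in light of the per-core licensing discussion is to rank the SKUs by list price per core. The sketch below does that with the figures copied from the table; it covers hardware list price only and makes no assumption about software license fees, which the article does not quantify.

```python
# (SKU, cores, frequency in GHz, list price in USD), copied from the table above.
SKUS = [
    ("E7-8890 v3", 18, 2.5, 7175),
    ("E7-8880 v3", 18, 2.3, 5896),
    ("E7-8880L v3", 18, 2.0, 6062),
    ("E7-8870 v3", 18, 2.1, 4672),
    ("E7-8893 v3", 4, 3.2, 6841),
    ("E7-8891 v3", 10, 2.8, 6841),
    ("E7-8867 v3", 16, 2.5, 4672),
    ("E7-8860 v3", 16, 2.2, 4060),
    ("E7-4850 v3", 14, 2.2, 3004),
    ("E7-4830 v3", 12, 2.1, 2169),
    ("E7-4820 v3", 10, 1.9, 1502),
    ("E7-4809 v3", 8, 2.0, 1224),
]


def price_per_core(sku):
    """List price divided by core count for one table row."""
    _name, cores, _ghz, price = sku
    return price / cores


# Most expensive per core first.
ranked = sorted(SKUS, key=price_per_core, reverse=True)
for sku in ranked:
    print(f"{sku[0]:<12} {sku[1]:>2} cores  ${price_per_core(sku):,.0f} per core")
```

The 4-core E7-8893 v3 tops the ranking by a wide margin, which fits its positioning: buyers pay a premium for a few fast cores when license fees are charged per core.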