The Intel Xeon E5 v4 Review: Testing Broadwell-EP With Demanding Server Workloads
by Johan De Gelas on March 31, 2016 12:30 PM EST- Posted in
- CPUs
- Intel
- Xeon
- Enterprise
- Enterprise CPUs
- Broadwell
We have been spoiled. Since the introduction of the Xeon "Nehalem" 5500 (Xeon 5500, March 2009), Intel has been increasing the core counts of their Xeon CPUs by nearly 50% almost every 18 months. We went from four to six (Xeon 5600) on June 2010. Sandy Bridge (Xeon E5-2600, March 2012) increased the core count to 8. That is only 33% more cores, but each core was substantially faster than the previous generation. Ivy Bridge EP (Xeon E5-2600 v2, launched September 2013) increased the core count from 8 to 12, the Haswell-EP (Xeon E5-2600 v3, sept 2014) surprised with an 18-core flagship SKU.
However it could not go on forever. Sooner or later Intel would need to slow down a bit on adding cores, for both power and space reasons, and today Intel has finally pumped the brakes a bit.
Launching today is the latest generation of Intel's Xeon E5 processors, the Xeon E5 v4 series.Fifteen months after Intel's Broadwell architecture and 14nm process first reached consumers, Broadwell has finally reached the multi-socket server space with Broadwell-EP. Like past EP cores, Broadwell-EP is the bigger, badder sibling of the consumer Broadwell parts, offering more cores, more memory bandwidth, more cache, and more server-focused features. And thanks to the jump from their 22nm process to their current-generation 14nm process, Intel gets to reap the benefits of a smaller, denser process.
Getting back to our discussion of core counts then, even with the jump to 14nm, Intel has played it more conservatively with their core counts. Compared to the Xeon E5 v3 (Haswell-EP), Xeon E5 v4 (Broadwell-EP) makes a smaller jump, going from 18 cores to 24 cores, for an increase of 33%. Yet even then, for the new Xeon E5 v4 "only" 22 cores are activated, so we won't get to see everything Broadwell-EP is capable of right away.
Meanwhile the highest (turbo) clockspeed is still 3.6 GHz, base clocks are reduced with one or two steps and the core improvements are very modest (+5%). Consequently, performance wise, this is probably the least spectacular product refresh we have seen in many years.
But there are still enough paper specs that make the Broadwell version of the Xeon E5 attractive. It finds a home in the same LGA 2011-3 socket. Few people will in-place upgrade from Xeon E5 v3s to Xeon E5 v4s, but using the same platform means less costs for the server vendors, and more software maturity (drivers etc.) for the buyers.
They look very different but fit in the same socket: Xeon E5 v4 on top, Xeon E5 v3 at the bottom
Broadwell also has several features that make it a more attractive processor for virtualized servers. Finer granular control over how applications share the uncore (caches and memory bandwidth) to avoid scenarios where low priority applications slow down high priority ones. Meanwhile quite a few improvements have been made to make the I/O intensive applications run smoother on top of a virtualized layer. Most businesses run their applications virtualized and virtualization is still the key ingredient of the fast growing cloud services (Amazon, Digital Ocean, Azure...), and more and more telecom operators are starting to virtualized their services, so these new features will definitely be put to good use. And of course, Intel made quite a few subtle - but worth talking about - tweaks to keep the HPC (mostly "simulation" and "scientific calculation software) crowd happy.
But don't make the mistake to think that only virtualization and HPC are the only candidates for the new up-to-22-cores Xeons. The newest generation of data analytics frameworks have made enormous performance steps forward by widening the network and storage bandwidth bottlenecks. One example is Apache Spark, which can crunch through terabytes of data much more efficiently than its grandparent Hadoop by making better use of RAM. To get results out of a massive hump of text data, for example, you can use some of most advanced statistical and machine learning algorithms. Mix machine learning with data mining and you get an application that is incredibly CPU-hungry but does not need the latest and fastest NVMe-based SSDs to keep the CPU busy.
Yes, we are proud to present our new benchmark based upon Apache Spark in this review. Combining analytics software with machine learning to get deeper insights is one of the most exciting trends in the enterprise world. And it is also one of the reason why even a 22-core Broadwell is still not fast enough.
112 Comments
View All Comments
iwod - Thursday, March 31, 2016 - link
Maximum memory still 768GB?What happen to the 5.1Ghz Xeon E5?
Ian Cutress - Thursday, March 31, 2016 - link
I never saw anyone with a confirmed source for that, making me think it's a faked rumor. I'll happily be proved wrong, but nothing like a 5.1 GHz part was announced today.Brutalizer - Saturday, April 2, 2016 - link
It would have been interesting to bench to the best cpu today, the SPARC M7. For instance:-SAP: two M7 cpu scores 169.000 saps vs 109.000 saps for two of this Broadwell-EP cpus
-Hadoop, sort 10TB data: one SPARC M7 server with four cpus, finishes the sort in 4,260 seconds. Whereas a cluster of 32 PCs equipped with dual E5-2680v2 finishes in 1,054 seconds, i.e. 64 Intel Xeon cpus vs four SPARC M7 cpus.
-TPC-C: one SPARC M7 server with one cpu gets 5,000,000 tpm, whereas one server with two E5-2699v3 cpus gets 3.600.000 tpm
-Memory bandwidth, Stream triad: one SPARC M7 reaches 145 GB/sec, whereas two of these Broadwell-EP cpus reaches 119GB/sec
-etc. All these benchmarks can be found here, and another 25ish benchmarks where SPARC M7 is 2-3x faster than E5-2699v3 or POWER8 (all the way up to 11x faster):
https://blogs.oracle.com/BestPerf/entry/20151025_s...
Brutalizer - Saturday, April 2, 2016 - link
BTW, all these SPARC M7 benchmarks are almost unaffected if encryption is turned on, maybe 2-5% slower. Whereas if you turn on encryption for x86 and POWER8, expect performance to halve or even less. Just check the benchmarks on the link above, and you will see that SPARC M7 benchmarks are almost unaffected encrypted or not.JohanAnandtech - Saturday, April 2, 2016 - link
"if you turn on encryption for x86 and POWER8, expect performance to halve or even less". And this is based upon what measurement? from my measurements, both x86 and POWER8 loose like 1-3% when AES encryption is on. RSA might be a bit worse (2-10%), but asymetric encryption is mostly used to open connections.Brutalizer - Wednesday, April 6, 2016 - link
If we talk about how encryption affects performance, lets look at this benchmark below. Never mind the x86 is slower than the SPARC M7, let us instead look at how encryption affects the cpus. What performance hit has encryption?https://blogs.oracle.com/BestPerf/entry/20160315_t...
-For x86 we see that two E5-2699v3 cpus utilization goes from 40% without crypto, up to 80% with crypto. This leaves the x86 server with very little headroom to do anything else than executing one query. At the same time, the x86 server took 25-30% longer time to process the query. This shows that encryption has a huge impact on x86. You can not do useful work with two x86 cpus, except executing a query. If you need to do additional work, get four x86 xeons instead.
-If we look at how SPARC M7 gets affected by encryption, we see that cpu utilization went up from 30% up to 40%. So you have lot of headroom to do additional work while processing the query. At the same time, the SPARC cpu took 2% longer time to process the query.
It is not really interesting that this single SPARC M7 cpu is 30% faster than two E5-2699v3 in absolute numbers. No, we are looking at how much worse the performance gets affected when we turn on encryption. In case of x86, we see that the cpus gets twice the load, so they are almost fully loaded, only by turning on encryption. At the same time taking longer time to process the work. Ergo, you can not do any additional work with x86 with crypto. With SPARC, it ends up with 40% cpu utilization so you can do additional work on SPARC, and process time does not increase at all (2%). This proves that x86 encryption halves performance or worse.
For your own AES encryption benchmark, you should also see how much cpu utilization goes up. If it gets fully loaded, you can not do any useful work except handling encryption. So you need an additional cpu to do the actual work.
JohanAnandtech - Saturday, April 2, 2016 - link
Two M7 machines start at 90k, while a dual Xeon is around 20k. And most of those Oracle are very intellectually dishonest: complicated configurations to get the best out of the M7 machines, midrange older x86 configurations (10-core E5 v2, really???)Brutalizer - Wednesday, April 6, 2016 - link
The "dishonest" benchmarks from Oracle, are often (always?) using what is published. If for instance, IBM only has one published benchmark, then Oracle has no other choice than use it, right? Of course when there are faster IBM benchmarks out there, Oracle use that. Same with x86. In all these 25ish cases we see that SPARC M7 is 2-3x faster, all the way up to 11x faster. The benhcmarks vary very much, raw compute power, databases, deep learning, SAP, etc etcPhil_Oracle - Thursday, May 12, 2016 - link
I disagree Johan! You don't appear to know much about the new SPARC M7 systems and suggest you do a full evaluation before making such remarks. A SPARC T7-1 with 32-cores has a list price of about $39K outperforms a 2-socket 36-core E5-2699v3 anywhere from 38% (OLTP HammerDB) to over 8x faster (OLTP w/ in-memory analytics). A similarly configured *enterprise* class 2-socket 36-core E5-2699v3 from HPE or Cisco lists for $25K+, so in terms of price/performance, the SPARC T7-1 beats the 2-socket E5-2699v3. And if you take into account SW that’s licensed per core, the SPARC M7 is 60% to 2.6x faster/core, dramatically lowering licensing costs. With the new E5-2699v4, providing ~20% more cores at roughly the same price, gets closer, but with performance/core not changing much with E5 v4, SPARC M7 still has a huge lead. And the difference is while the E5 v3/v4 chips don't scale beyond 2-socket, you can get an SPARC M7 system up to 16-sockets with the almost identical price/performance of the 1-socket system.adamod - Friday, June 3, 2016 - link
BUT CAN IT PLAY CRYSIS?????