35 Comments
S3anister - Monday, March 10, 2014 - link
This is very interesting. I have also been waiting to see if any Xeon SKUs would come out that featured Crystal Well (even if only as an L4 cache). It would be much appreciated if AnandTech could get their hands on one of these for review.
iAPX - Friday, March 14, 2014 - link
Pretty interesting for a low-cost workstation: dual-core is not an issue for many tasks, and Iris Pro with certified drivers would be hot.
I use a 17-inch MacBook Pro with a Radeon HD 6750M that is SLOWER than Iris Pro for nearly any usage. It's still a workhorse for serious professional usage (including photography and OpenCL-enabled tools such as Photoshop CC or Pixelmator).
So a low-cost workstation, one that is not a performance workstation but a solidly designed and built computer, created to work 10 or 12 hours a day, for under $1000, is a serious product!
MrSpadge - Monday, March 10, 2014 - link
Certain workstation and server loads should benefit tremendously from the L4 cache. But why cripple that chip with a 47 W TDP?! Just give us an unlocked version!
DanNeely - Monday, March 10, 2014 - link
It's the same BGA as is used in mobile parts, just with the Xeon name slapped on now. My guess is that there aren't enough connection points designated for power delivery to support traditional desktop/server level TDPs.
jasonelmore - Monday, March 10, 2014 - link
It's got ECC memory support, which is no simple task of slapping a new sticker on the side of the box.
Kevin G - Monday, March 10, 2014 - link
Considering that the memory controller on all Haswell parts supports ECC, it is effectively slapping a new sticker on the box. All Intel does to enable/disable ECC is to blow some small fuses on the die. Oddly, Intel doesn't list which package this chip uses, but I'd be surprised if it was different from the BGA packages used by other mobile Haswells.
DanNeely - Monday, March 10, 2014 - link
I believe that's just a case of fusing it on/off on current processors, not creating a separate die for each RAM type. While I believe ECC does need a few additional data lines, if it comes out in a BGA package with the same number of balls, or just enough extra to enable ECC, it'd clearly just be a reuse of the existing mobile version and thus have all the caveats it does. Only if it comes out in a substantially different package would I expect there to be enough additional power delivery capacity to power substantially higher TDPs.
Intel does have 65W Crystalwell parts; I suspect that's probably the power cap, and that they're just testing the water with a single Xeon part. If sales are good they'll probably offer one at that level in the future (unless the higher power through the solder joints impacts system lifetime at high sustained loads); if not it'll vanish into the memory hole until the eDRAM cache moves on die in a future processor generation.
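On the "few additional data lines" point: standard ECC DIMMs are 72 bits wide rather than 64, which lines up with what a Hamming SEC-DED code needs for a 64-bit word. A minimal sketch of that arithmetic, assuming a plain Hamming SEC-DED scheme (whether Intel's controller uses exactly this code is an assumption, and the function name is made up for illustration):

```python
def secded_check_bits(data_bits: int) -> int:
    """Check bits needed by a Hamming SEC-DED code over `data_bits` data bits.

    Hypothetical helper for illustration; not taken from any Intel documentation.
    """
    r = 0
    # Single-error correction needs 2**r >= data_bits + r + 1
    # (enough syndromes to name every bit position plus "no error").
    while 2 ** r < data_bits + r + 1:
        r += 1
    # One extra overall-parity bit adds double-error detection.
    return r + 1


if __name__ == "__main__":
    for width in (8, 16, 32, 64):
        check = secded_check_bits(width)
        print(f"{width:>2} data bits: {check} check bits, {width + check} lines total")
```

For a 64-bit channel this gives 8 check bits, so 72 data lines per channel instead of 64, which is a small number of extra balls compared with what a much higher TDP would need for power delivery.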
Kevin G - Monday, March 10, 2014 - link
The 47W TDP is likely from the relatively high base clock speed of 750 MHz on the GPU. Note that the other Crystalwell parts have a 200 MHz base GPU speed. Things are reversed when talking about GPU turbo, as the E3-1284L v3 only hits 1 GHz while the other parts scale to 1.2 or 1.3 GHz.
MrSpadge - Tuesday, March 11, 2014 - link
No, the iGPU consumes next to nothing when not under active use. The clock & power gating is really good there; at idle it's far more efficient than any discrete GPU. The 47 W TDP is there to allow the Haswell cores to actually complete some work :)
k2_8191 - Monday, March 10, 2014 - link
Hmm, why don't they make a 150W TDP Crystalwell whose base clock is >3GHz rather than a lower-TDP one?
It would be the best-performing Xeon, very suitable for HPC workloads.
Maybe the eDRAM is easily affected by heat?
ShieTar - Monday, March 10, 2014 - link
How much HPC is happening on single-socket workstations, really?
Torrijos - Monday, March 10, 2014 - link
This might be very useful when we see the new Mac Pro use case...
You could render basic graphics with the integrated GPU while using the 2 big GPUs solely for OpenCL calculation, whereas right now one GPU is always bogged down by OpenGL screen management.
Adding-Color - Monday, March 10, 2014 - link
Freeing the big GPU from screen rendering work will be perfect.
Hope this will be the case for most workstations in the not so distant future.
k2_8191 - Monday, March 10, 2014 - link
Oh, I thought Haswell variants of multi-socket Xeons had already been released, but I was wrong.
Yes, single-socket Xeons are not so suitable for HPC (I'm not sure whether lots of such Xeons densely integrated in racks would be OK though).
I hope they'll release a Crystalwell version of the E5-2687W v2...
Kevin G - Monday, March 10, 2014 - link
Haswell-E is rumored to also have eDRAM and a lot of it. With quad channel DDR4 support, Haswell-E is going to be a bandwidth monster.
TiGr1982 - Monday, March 10, 2014 - link
I think it won't happen. Haswell-E has NO GPU by definition, being just a renamed GPU-less Xeon E5 v3, so NO eDRAM as well, I suppose.
DanNeely - Monday, March 10, 2014 - link
Crystalwell is connected over the same CPU IO hookup that is used for QPI in multi-processor links. In theory this means they could turn a 2P Xeon into a 1P + Crystalwell chip, or a 4P Xeon into a 2P model (replacing the ringbus connections with a single point-to-point link). I don't know if 8P models have more than 2 QPI ports; if they do, they could be used to make 4P + Crystalwell parts instead.
Kevin G - Monday, March 10, 2014 - link
Where are you getting that they're using QPI for the eDRAM link? I was under the impression that it was something unique and proprietary.
DanNeely - Monday, March 10, 2014 - link
I think I was mistaken. I went back to double-check an article from last month, but it was talking about Crystalwell and the on-package southbridge in Haswell SoC models sharing a set of IO pins.
http://www.anandtech.com/show/7744/intel-reveals-n...
TiGr1982 - Monday, March 10, 2014 - link
I mean, it's a purely x86 CPU, so no eDRAM is required for it in Intel's opinion.
IMHO, for a pure GPU-less x86 CPU, eDRAM may only happen for the many-core Knights Landing later on:
http://www.realworldtech.com/knights-landing-detai...
but not for conventional multi-core Haswell-E.
Kevin G - Monday, March 10, 2014 - link
For Knights Landing, Intel has stripped out the GPU functionality. Rather, it is a play towards extreme bandwidth considering the ultra high core count involved. Similarly, Haswell-E is likely going to reach 18 cores on the EX part. With the EX line still using FB-DIMM-like buffers, the addition of eDRAM would help with any latency penalty due to the buffers. (An alternative would be to put the eDRAM into the memory buffers like IBM has done with the POWER8.)
MrSpadge - Tuesday, March 11, 2014 - link
You don't need a GPU to benefit from Crystalwell. A GPU shows the highest gains from it, but still remains in "rather irrelevant" territory, unlike a possible 0 - 30 % performance boost per clock for e.g. an i7 4770K.
tipoo - Monday, March 10, 2014 - link
It will be very interesting to see what that huge L4 cache can do for the more professional/heavy use side of things like servers.
extide - Monday, March 10, 2014 - link
I wonder if this might end up in a Dell Precision/HP EliteBook laptop... That could be pretty sweet!
twotwotwo - Monday, March 10, 2014 - link
Fascinating, but hard to guess the use case for low TDP + graphics in a server chip, even with the shiny L4 boost. Servers running virtual desktops? Xserve 2014 edition? :-)
Kevin G - Monday, March 10, 2014 - link
There are a handful of HPC scenarios where this would be advantageous. The eDRAM, TSX and coherent memory space with the CPU + GPU should be popular in development circles even though the ultimate performance of a node is rather lacking. Put a chip like this into a SeaMicro or HP's Project Moonshot system and performance can easily scale up by rapidly increasing node count. Overall compute density should be similar to the popular dual LGA 2011 Xeon + PCI-e GPU setup today.
Outside of HPC, there are several clustered applications like Hadoop and Cassandra that benefit from having many small, fast individual nodes.
Kevin G - Monday, March 10, 2014 - link
At first I had gotten excited but with this being a BGA part, it was all lost. Still hoping for a socket 1150 based part. Otherwise, this part is just another laptop part but with ECC enabled to carry the Xeon name.
dolphinboy150 - Monday, March 10, 2014 - link
According to Intel ARK, the Crystalwell BGA parts have more pins:
http://ark.intel.com/products/76086
Sockets supported: FCBGA1364
What all those extra pins are for isn't clear, though.
DanNeely - Monday, March 10, 2014 - link
Some is probably power related. The BGA1164 parts are limited to a max 28W TDP. They're also limited to dual core parts, but other than impacting power needs I don't think that matters for pinout.
TiGr1982 - Monday, March 10, 2014 - link
Not gonna happen either, because this L4 version requires 1364 contacts.
It is even officially named "Crystal Well" in the ARK database, not even "Haswell". This is a different product from LGA1150 by definition, so they won't refit it to LGA1150. Otherwise, this would have happened already. Stop waiting - be realistic. Welcome to Earth.
dstockwell23 - Monday, March 10, 2014 - link
Once eDRAM moves on die, is there much difference between adding L4 or just cranking up the amount of L3? Bandwidth/latency/power gating? SRAM size?
DanNeely - Monday, March 10, 2014 - link
DRAM is slower but denser than SRAM, so the size/performance tradeoffs will be different.
fteoath64 - Wednesday, March 12, 2014 - link
What a waste of transistors for the GPU in a server-based chip! Why don't they put two lots of the eDRAM module in there and really have that huge half a gig L4 cache? It shows how Intel needed the eDRAM just to keep up with the competition in terms of performance.
CalaverasGrande - Friday, March 14, 2014 - link
This smells like it's made with Cupertino in mind.
I can easily imagine a refreshed Mac Mini with this CPU/GPU. Hopefully with design cues from the new Mac Pro.
GreenReaper - Friday, October 20, 2017 - link
Indeed, these ended up in the HP Moonshot/ProLiant m710/m710p/m710x server cartridges, specifically designed for video processing:
https://www.hpe.com/uk/en/product-catalog/servers/...