55 Comments
quiksilvr - Monday, December 23, 2019 - link
For some reason I really like the codename of this CPU...
carcakes - Tuesday, December 24, 2019 - link
ARM - turbo boost clocks!
MrEcho - Monday, December 23, 2019 - link
One of these days I should give the A1s on AWS a test to see what the performance is like. They are much cheaper per core, but what is the performance trade-off for my workloads?
And to test and verify your stack, you have to do it on AWS, not a local computer.
hetzbh - Monday, December 23, 2019 - link
Someone needs to learn how to set the resolution on Ubuntu ;)
kaidenshi - Tuesday, December 24, 2019 - link
What, you don't compute at 1024x768 stretched? Filthy casual! ;-)
teamet - Monday, December 23, 2019 - link
Clicked this to learn more about next-gen ampere from nvidia. The read was a little disappointing in that context
Ian Cutress - Monday, December 23, 2019 - link
You'd think I'd have put NVIDIA in the title for SEO if that was the case.
That being said, Ampere is a lead company for NVIDIA's recent CUDA-on-Arm push.
So you might be able to buy Ampere on Ampere.
cpuaddicted - Monday, December 23, 2019 - link
Such a nightmare for the marketing guys at Ampere Computing to go against the next gen console from Nvidia.
Yojimbo - Monday, December 23, 2019 - link
Before they chose the name Ampere, they called themselves Project Denver Holdings. Maybe they are obsessed with NVIDIA.
yannigr2 - Tuesday, December 24, 2019 - link
Or, maybe, free publicity.
Vince789 - Monday, December 23, 2019 - link
The 80-Core in the title is a clear sign it's not a GPU article
mode_13h - Tuesday, December 24, 2019 - link
The GV100, featured on the Titan V and Tesla V100, has 80 streaming multiprocessors. Those are basically equivalent to CPU cores.
Vince789 - Tuesday, December 24, 2019 - link
Yep, but Nvidia never refers to them as "GPU cores" lol
micklevin - Monday, December 23, 2019 - link
LOL, what's a VMware screen doing on an ARM server?
name99 - Tuesday, December 24, 2019 - link
You do realize that VMware runs on ARM, right?
https://blogs.vmware.com/vsphere/2019/10/esxi-on-a...
This is why it’s so hard to take the anti-ARM carping seriously, when the people engaged in it don’t even know the most basic facts about what they are criticizing...
Harekm - Tuesday, December 24, 2019 - link
Why would you buy this over a 64 core Epyc?
yeeeeman - Tuesday, December 24, 2019 - link
Better price maybe? Performance is probably lower and power is in the same ballpark.
Wilco1 - Tuesday, December 24, 2019 - link
I would expect it to have higher performance than Rome given Neoverse N1 has higher IPC and 25% extra cores. It should run at a higher frequency than Graviton2 since it uses far more than 25% extra power.
milli - Tuesday, December 24, 2019 - link
Higher IPC than Rome? I'll believe it when I see it.
Wilco1 - Tuesday, December 24, 2019 - link
Arm having higher IPC is non-controversial - it's well known that recent Arm microarchitectures are as wide or wider than x86. Of course they clock lower than the fastest desktop chips, however clocks are similar for server parts, hence the IPC advantage matters. You should be able to try it out for yourself soon of course.
milli - Tuesday, December 24, 2019 - link
As I said, I'll believe it when I see it. At the moment, it's all talk and no proof.
N1 will be closely related to A76, which definitely has lower IPC than Zen 2. Even A77 has lower IPC. So I don't get where you get your 'facts'.
Wilco1 - Wednesday, December 25, 2019 - link
Cortex-A77 most definitely has higher IPC than Skylake and Zen 2, on SPECINT as well as SPECFP: https://images.anandtech.com/doci/15207/SPEC2006_S...
That's despite being a small phone SoC. Servers have much larger caches and memory system so Neoverse N1 performs much better: https://www.anandtech.com/show/13959/arm-announces...
milli - Wednesday, December 25, 2019 - link
Don't tell me that you normalize that SPEC score? If CPU architectures have taught you anything, you would know that performance doesn't scale linearly with clock frequency.
Not only does it not scale linearly, an architecture designed to run sub-3GHz will often perform better when normalized than an architecture designed to run over 4GHz (11-stage vs 19-stage). ARM cores just can't run at such high clocks.
Wilco1 - Thursday, December 26, 2019 - link
You do understand what IPC means, right? It normalises on clock frequency since that is its definition! If you understand microarchitecture you would know why IPC doesn't vary much with frequency.
We're talking about servers here - high core count parts have base frequencies near 2GHz. Putting a 5GHz core into a server is a bad idea, it's much better to design a server core for 3GHz operation.
The question is not whether QuickSilver beats Rome, but whether it beats next-gen Milan too.
rahvin - Friday, December 27, 2019 - link
It's not going to beat Rome. Like every past ARM server iteration it's going to be a profound disappointment. Maybe good as an nginx dispatcher but worthless for anything involving computation. That's my prediction based on the history of ARM server cores.
Wilco1 - Saturday, December 28, 2019 - link
It surely is going to beat Rome. As for serious computation, you don't even know that there are now several Arm servers in the TOP500 list (as well as #1 in the GREEN500 list)?
You're clearly completely ignorant about Arm servers...
deltaFx2 - Saturday, December 28, 2019 - link
@Wilco1: "It normalises on clock frequency since that is its definition" True but incomplete. Memory latency does not scale with frequency. 70ns is 70ns @1GHz or 5GHz. In the case of Zen's architecture, the fabric also does not get faster with core frequency. So, running Zen with faster/overclocked memory is also incorrect. If you really want to do an honest IPC comparison, you should lock the core frequency AND memory latency/timings and then do your comparison. SPEC score/GHz is a ratio of workload runtime per unit frequency, not IPC.
Wilco1 - Sunday, December 29, 2019 - link
We're running CPUs at 5GHz while DRAM barely runs at 10MHz (100ns). So how is that even possible? By making the CPU completely independent of this slow memory by adding huge caches, 3-4 levels of caches, automatic prefetching, out of order execution, and supporting ~50 outstanding cache misses.
As a result performance almost perfectly scales with frequency on most applications, particularly SPEC which is very compute intensive (and which fits in caches when run single-threaded). So IPC doesn't vary with frequency.
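A minimal sketch of the scaling argument in this exchange, not taken from either commenter. It assumes a simple additive model in which cycles per instruction are a compute term plus a DRAM stall term; the miss rate (`mpki`), latency (`mem_latency_ns`) and overlap factor (`mlp`) are hypothetical inputs, as are the function names. Because a fixed latency in nanoseconds costs more cycles at a higher clock, IPC in this model stays flat only when misses are negligible.

```python
def effective_ipc(freq_ghz, core_ipc, mpki, mem_latency_ns, mlp=1.0):
    """IPC once DRAM stalls are added to a core's compute-only IPC.

    Toy additive model with hypothetical inputs, not a measurement:
    freq_ghz        core clock in GHz
    core_ipc        IPC when every access hits in cache
    mpki            last-level cache misses per 1000 instructions
    mem_latency_ns  DRAM latency, fixed in wall-clock time
    mlp             average number of misses overlapped in flight
    """
    compute_cpi = 1.0 / core_ipc
    # ns * GHz = cycles: the same 70 ns is 175 cycles at 2.5 GHz, 350 at 5 GHz.
    miss_penalty_cycles = mem_latency_ns * freq_ghz
    stall_cpi = (mpki / 1000.0) * miss_penalty_cycles / mlp
    return 1.0 / (compute_cpi + stall_cpi)


def relative_performance(freq_ghz, **kwargs):
    """Instructions per nanosecond: IPC multiplied by the clock in GHz."""
    return effective_ipc(freq_ghz, **kwargs) * freq_ghz


if __name__ == "__main__":
    # Hypothetical workloads: one cache-resident, one that misses often.
    for label, mpki in (("cache-resident", 0.0), ("miss-heavy", 10.0)):
        for freq in (2.5, 5.0):
            ipc = effective_ipc(freq, core_ipc=2.0, mpki=mpki,
                                mem_latency_ns=70.0, mlp=4.0)
            print(f"{label:>14} @ {freq} GHz: IPC = {ipc:.2f}")
```

With zero misses the model reproduces perfect frequency scaling. With a nonzero miss rate the IPC falls as the clock rises, which is why score/GHz and IPC are only interchangeable when both chips are compared at the same clock and memory timings.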
deltaFx2 - Sunday, December 29, 2019 - link
@Wilco1: "By making the CPU completely independent of this slow memory " Except it is not independent, is it? Prefetching is not magic. It doesn't get every pattern: pointer chases cannot be prefetched, nor can data-dependent accesses like binary tree traversals (also a form of pointer chase) or other forms of irregular data access. Also, the prefetch distance @5GHz has to be much higher than at 2.5GHz to cover for the extra latency in terms of cycles. LLC miss rates even on SPEC are low double digit MPKI. Not all applications have MLP to exploit in the instruction windows that are currently implemented. So no, caches work, but are not perfect. If they were, you would be right, but they are not and will never be. If we had perfect scaling, speed demons would be the obvious implementation choice and we would have 10GHz CPUs, maybe even higher. Memory latency was and is the stumbling block. I suggest trying it out before making such statements because you clearly are mistaken.
Ian Cutress - Tuesday, December 24, 2019 - link
Epyc is x86. This is Arm.
mode_13h - Tuesday, December 24, 2019 - link
Power-efficiency.
Granted, you don't see the electric bill, nor the cooling requirements, since you're probably using it in the cloud. However, they will translate that into lower hourly pricing.
deltaFx2 - Saturday, December 28, 2019 - link
See Max TDP of 200+W. What power efficiency?
Wilco1 - Sunday, December 29, 2019 - link
Power efficiency is performance/W, so just considering TDP says nothing about power efficiency.
Max TDP may seem high, but we're talking about 80 serious cores here, so performance is high too.
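To make the performance/W point concrete, here is a small sketch with made-up numbers. The part names, scores and wattages are hypothetical and describe no real CPU, and TDP is used only as a stand-in for measured power, which it is not.

```python
def perf_per_watt(score, power_w):
    """Efficiency as throughput per watt; the unit follows whatever score is used."""
    if power_w <= 0:
        raise ValueError("power_w must be positive")
    return score / power_w


if __name__ == "__main__":
    # Hypothetical parts: neither row describes a real CPU.
    parts = {
        "part_a (higher TDP)": {"score": 400.0, "power_w": 200.0},
        "part_b (lower TDP)": {"score": 250.0, "power_w": 150.0},
    }
    for name, p in parts.items():
        print(f"{name}: {perf_per_watt(**p):.2f} score/W")
```

In this invented case the part with the higher TDP is also the more efficient one, so a TDP figure alone cannot settle the question either way.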
mode_13h - Wednesday, January 1, 2020 - link
Yes, thanks.
We'll have to wait and see what the performance actually looks like.
Also, I wonder if the 200 W figure isn't at clocks above the peak efficiency point. I wonder how this would compare with EPYC, if you ran both at their most efficient clock speeds.
mode_13h - Tuesday, December 24, 2019 - link
Another reason could be the lack of SMT, for the paranoid. Maybe Epyc wouldn't be competitive without SMT.
mode_13h - Tuesday, December 24, 2019 - link
Assuming they would disable it, on Epyc, of course.
The_Assimilator - Tuesday, December 24, 2019 - link
80 cores of trash.
olde94 - Thursday, December 26, 2019 - link
You don't know that. And with 128 PCIe lanes it could make sense for a full GPU build that is hardly CPU-dependent, where you could have a full 16 lanes per GPU. If it's cheap this could make a lot of sense and, as others have said, the IPC on new Arm CPUs is quite impressive, even compared to the EPYC CPUs.
ksec - Tuesday, December 24, 2019 - link
"We’ve seen the likes of AppliedMicro/Ampere, Calxeda, Broadcom/Cavium/Marvell, Qualcomm, Huawei, Fujitsu, Annapurna/Amazon, "
It was nice of Ian to put all the names in one place. When discussing Amazon's new instances, I keep finding that the ARM design vendors have changed hands so often that we can't keep up with the names. That sentence clears things up a lot.
Now that Amazon has kickstarted the whole thing to a different level, Qualcomm is out (maybe out a little too early). We are left with Ampere, Calxeda, Marvell, Huawei, Fujitsu, and the new Nuvia.
It is likely Microsoft and Google are now looking to purchase one for themselves, so Ampere and Calxeda might be targets, as neither of them has any real recurring revenue backing them to sustain the race.
Which leaves Marvell to gain from all the smaller players who want an ARM server CPU.
mode_13h - Tuesday, December 24, 2019 - link
We still don't know (and I somewhat doubt) that Nuvia is ARM-based.
I imagine Google's CPU ambitions are loftier than this. Microsoft's, too. Probably.
ksec - Tuesday, December 24, 2019 - link
It was confirmed to be an ARM offering, but it wasn't clear whether it was ARM v9 or not. See the update at the end of the article:
https://techcrunch.com/2019/11/15/three-of-apple-a...
mode_13h - Wednesday, January 1, 2020 - link
Thanks.
Freeb!rd - Wednesday, December 25, 2019 - link
And they both are probably "kicking themselves" for not trying to buy out AMD at $10/share when it was a $5 stock...
Ian Cutress - Wednesday, December 25, 2019 - link
I should add Phytium. Completely forgot about them. Sounds like they'll be in Tianhe-3.
Wilco1 - Thursday, December 26, 2019 - link
There is also another Arm server startup: https://bamboosystems.io/
mode_13h - Wednesday, January 1, 2020 - link
Heh, I was sure it was going to be a Chinese player, but it seems the principals are all white guys.
mode_13h - Wednesday, January 1, 2020 - link
Wouldn't Socionext count, as well? Not custom cores, but they seem to be aiming for the server market and have some pretty big backing.
SarahKerrigan - Thursday, December 26, 2019 - link
Calxeda has been defunct for years. Their last chip was A15-based.
rahvin - Friday, December 27, 2019 - link
I'm pretty sure Marvell killed their chip a while ago, along with Broadcom selling their chip off to someone else. Qualcomm's chip was killed by an activist investor and most of the other players either shut down their initiative or sold their design to someone else.
At this point just about the only chance ARM server cores really have is if Amazon or Google fully embraces them and starts designing their own server chips. Otherwise it's going to be the same shit show over and over again, as sub-par noncompetitive parts are announced and mostly abandoned after they get samples back.
As someone else said, I'll believe in a competent ARM server chip when I actually see one.
cpuaddicted - Friday, December 27, 2019 - link
Wrong about Marvell. They have plans to release their chip next year so they will be competing with Ampere/Graviton2. Will be interesting to watch how they fare.
Wilco1 - Saturday, December 28, 2019 - link
Wrong about everything. Marvell is at their 3rd generation Arm servers with upcoming ThunderX3. The Broadcom Vulcan core became ThunderX2. Amazon are now at their 2nd generation of Arm server chips. This 80-core server is Ampere's 2nd generation, or 3rd generation if you count the XGene-1. HiSilicon is at their 3rd generation too with their custom Kunpeng 920. Fujitsu is at their first generation with A64FX.
Given your display of intellect here, you don't seem competent enough to decide what a competent chip is.
rahvin - Monday, December 30, 2019 - link
I make no claim that I keep track of the comings and goings in this market segment. There have been over a dozen companies that entered, created a product and then either abandoned the product or sold it to another company. Several of the companies have abandoned a product then bought a competitor with a new product.
The Thunder product line is the only open market purchasable ARM server product. The original was abandoned before it was even produced, as far as I can tell, when the ThunderX2 was purchased from a competitor. X2 was a computationally weak thread engine that only has one or two good possible uses and its sales are likely pathetically small. Now that they've purchased yet another abandoned product they've got yet another completely new iteration called X3 that will likely also be a computationally weak product with lots of thread capability.
You like to keep pointing out all these other "products" that you can't actually buy and that have never been reviewed or benchmarked. I get it, you're a fan-boi who can't help it (along with the personal insults), but the ARM server market will remain an essentially non-existent product until demonstrated otherwise. It doesn't matter how many vaporware products you can point at if you can't actually buy any of them. As of right now the Thunder product line is the only ARM product available for commercial purchase and AFAIK the only one that's been benchmarked.
And Thunder isn't something to cheer about, it's 2 or 3 separately designed products with no continuity. You can't sell in the server market without consistency and evolutionary upgrades. With each version of Thunder being a completely different product (designed by completely different groups with completely different features) it simply won't get traction in the general server market because there is no predictability. And companies that don't understand this fundamental of the server market fail at it.
IMO there has been no market viable ARM server product produced yet. Maybe there will be some day, but very few people are going to believe it until it happens. And that product won't see success until they can iterate and improve it over multiple generations with consistency in design.
mode_13h - Tuesday, December 24, 2019 - link
> Ampere did state that it is very secure in its funding.
https://www.anandtech.com/show/14212/ampere-comput...
mpbello - Thursday, December 26, 2019 - link
"their design is optimized for power, performance, latency, and throughput, as well as offering more cores and more of other things as well"
Optimized for everything = not optimized at all
andrewaggb - Thursday, December 26, 2019 - link
lol, I kinda thought that as well. It's a nice checklist, but pretty much all current Arm and x86 CPUs can claim that.