43 Comments
befair - Friday, November 28, 2008 - link
ok .. getting tired of this! Intel-loving Anandtech employs very unfair & unreasonable tactics to show AMD processors in a bad light every single time. And most readers have no clue about the jargon Anandtech uses every time.
1 - HPL needs to be compiled with appropriate flags to optimize code for the processor. Anandtech always uses code that is optimized for Intel processors to measure performance on AMD processors. As much as AMD and Intel are binary compatible, when measuring performance even a college grad who studies HPC knows the code has to be recompiled with the appropriate flags.
2 - Clever words: sometimes even 4 GFLOPS is described as a significant performance difference.
3 - "The Math Kernel Libraries are so well optimized that the effect of memory speed is minimized." - So ... MKL use is justified because Intel processors need optimized libraries for good performance. However, they don't want to use ACML for AMD processors. Instead they want to use MKL, optimized for Intel, on AMD processors. What's more ... Intel's code optimizes only for Intel processors and disables everything for every other processor. They have corrected it now but who knows!! Read here: http://techreport.com/discussions.x/8547
I am not saying anything bad about either processor, but an independent site that claims to be fair and objective in bringing facts to the readers is anything but fair and just!!! What a load!
DonPMitchell - Friday, December 7, 2007 - link
I think a lot of us are intrigued by AMD's memory architecture, its ability to support NUMA, etc. A lot of benchmarks test how fast a small application runs with a high cache-hit rate, and that's not necessarily interesting to everyone.
The MySQL test is the right direction, but I'd rather see numbers for a more sophisticated application that utilizes multiple cores -- Oracle or MS SQL Server, for example. These are products designed to run on big iron like Unisys multi-proc servers, so what happens when they are running on these more economical Harpertown or Barcelona systems?
kalyanakrishna - Thursday, November 29, 2007 - link
http://scalability.org/?p=453
kalyanakrishna - Thursday, November 29, 2007 - link
a much better review than the original one. But I still see some cleverly put sentences, wish it were otherwise.
Viditor - Thursday, November 29, 2007 - link
Nice review Johan!
On the steppings note you made, it's not the B2 stepping that is supposed to perform better, it's the BA stepping...
The BA stepping was the improved form for B1s, and the B3 stepping is the improved form of the B2. BA and B2 came out at the same time in Sept (though BA was the one launched, B1 was what was reviewed), B2 for Phenom and performance clockspeeds, BA for standard and low power chips.
Do you happen to have a BA chip to test (those are the production chips)?
BitByBit - Wednesday, November 28, 2007 - link
Despite K10's rather extensive architectural improvements, it looks like its core performance isn't too different from K8's. In fact, the gains we've seen so far could easily be attributable to the improved memory controller and increased cache bandwidth. It seems that introducing load reordering, a dedicated stack, improved branch prediction, 32B instruction fetch, and improved prefetching has had little impact, certainly far less than expected. The question is, why?
JohanAnandtech - Wednesday, November 28, 2007 - link
Well, we are still seeing 5-10% better integer performance on applications that are running in the L2, so it is more than just a K8 with a better IMC. But you are right, I expected more too.
However, the MySQL benchmark deserves more attention. In this case the Barcelona core is considerably faster than the previous generation (+25%). This might be a case where the 32-byte fetch and load reordering are helping big time. But unfortunately our CodeAnalyst failed to give all the numbers we needed.
BaronMatrix - Wednesday, November 28, 2007 - link
At any rate, it was the most in-depth review I've seen, especially with the code analysis. I, too, thought it would be higher, but remember that Barcelona is NOT HT3 and doesn't have the advantage of "ganging/unganging." There was an interesting article recently that showed perf CAN be improved by unganging (maybe it was ganging, can't find it) the HT3 links.
I really hate that OEMs decided to stand up to the big, bad AMD and DEMAND that Barcelona NOT have HT3 with ALL OF ITS BENEFITS.
I mean people complain that Barcelona uses more power, but HT3 would cut that somewhat. At least in idle mode, and even in cases where IMC is used more than the CPU or vice versa.
I also may as well use this to CONDEMN all of these "analysts" who insist on crapping on the underdog that keeps prices reasonable and technology advancing.
INSERT SEVERAL EXPLETIVES. REPEATEDLY. FOR A FEW DAYS. A WEEK. FOR A YEAR.
INSERT MORE EXPLETIVES.
donaldrumsfeld - Wednesday, November 28, 2007 - link
Conjecture regarding why AMD went quad core on the same die... and this has nothing to do with performance. I think one place where Intel is way ahead of AMD is package technology. Remember they were doing a type of multichip module with the P6. Having 2 dice instead of a single die allows them to have an overall lower defect rate, higher yield, and higher GHz. This is vs. AMD's lower GHz but (it was hoped) greater data efficiency using an L3 die and lower latency of on-die communications amongst cores vs. Intel's solution of die-to-die communication.
Can anyone confirm/deny this?
thanks
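For anyone who wants to put rough numbers on the yield half of that conjecture, here is a minimal sketch using the textbook Poisson yield model, Y = exp(-area * defect_density). The die area and defect density below are made-up illustrative values, not figures for any real AMD or Intel part, and the model ignores packaging yield, speed binning and test cost, so treat it as a back-of-the-envelope comparison only.

```python
import math


def poisson_yield(area_cm2, defects_per_cm2):
    """Fraction of dice with zero defects under the simple Poisson yield model."""
    return math.exp(-area_cm2 * defects_per_cm2)


def silicon_per_good_package(total_area_cm2, dice_per_package, defects_per_cm2):
    """Average silicon area (cm^2) consumed to obtain one fully working package.

    Assumes good dice can be picked from anywhere on the wafer and paired up,
    which is what a multi-chip module allows.
    """
    die_area = total_area_cm2 / dice_per_package
    die_yield = poisson_yield(die_area, defects_per_cm2)
    return dice_per_package * die_area / die_yield


# Hypothetical numbers: 2.8 cm^2 of silicon per quad-core, 0.5 defects per cm^2.
TOTAL_AREA = 2.8
DEFECT_DENSITY = 0.5

monolithic = silicon_per_good_package(TOTAL_AREA, 1, DEFECT_DENSITY)
two_die = silicon_per_good_package(TOTAL_AREA, 2, DEFECT_DENSITY)

print(f"monolithic quad-core: {monolithic:.2f} cm^2 of silicon per good part")
print(f"two dual-core dice:   {two_die:.2f} cm^2 of silicon per good part")
```

Under these assumptions the two-die package always needs less silicon per working part, because a defect only kills half as much area. Whether that outweighs the on-die communication advantage is exactly the trade-off the post is asking about.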
tshen83 - Tuesday, November 27, 2007 - link
Seriously, can you buy the 2360SE? Newegg doesn't even stock the 1.7GHz 2344HEs.
The same situation exists on the Phenom line of CPUs. I don't see the value of reviewing Phenom 9700, 9900s when AMD cannot deliver them. I have trouble locating Phenom 9500s.
alantay - Tuesday, November 27, 2007 - link
The MySQL scalability problem is not so much in MySQL as in the Linux kernel and Glibc used.
To have it scale correctly to 8 CPUs you need kernel 2.6.22.x (alternatively you could try a 2.6.24-RC, which should be a bit faster, but not 2.6.23.x) and Glibc 2.6 or higher.
A default Ubuntu 7.10 for example should scale well with MySQL (OpenSUSE 10.3 *might* work, but they have backported the 2.6.23 scheduler which has a scalability problem).
Thanks for the article!
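The version rule in the post above can be sketched as a small check. This is only an illustration of the rule as stated there (kernel 2.6.22.x or a 2.6.24-RC, not 2.6.23.x, and Glibc 2.6 or higher); the function names are made up, and treating every kernel newer than 2.6.23 as fine is an assumption, since the post only mentions 2.6.24-RC.

```python
import re


def parse_version(text):
    """Return the leading dotted numeric part of a version string as a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def kernel_ok(release):
    """True for 2.6.22.x and for kernels after 2.6.23; False for the 2.6.23 series."""
    version = parse_version(release)[:3]
    if version == (2, 6, 23):
        return False  # the post singles out 2.6.23.x as having the scalability problem
    return version >= (2, 6, 22)  # assumption: anything newer than 2.6.23 is fine


def glibc_ok(version_text):
    """True for Glibc 2.6 or higher (tuple comparison, so 2.10 counts as newer than 2.6)."""
    return parse_version(version_text) >= (2, 6)


def should_scale(kernel_release, glibc_version):
    """Combine both conditions from the post."""
    return kernel_ok(kernel_release) and glibc_ok(glibc_version)


# Illustrative release strings, not taken from any particular machine.
for kernel, glibc in [("2.6.22.9", "2.6.1"), ("2.6.23.1", "2.6.1"), ("2.6.24-rc3", "2.5")]:
    print(kernel, glibc, should_scale(kernel, glibc))
```

On a live Linux box the two inputs could come from `platform.release()` and `platform.libc_ver()[1]` in the Python standard library. A distribution that backports the 2.6.23 scheduler into an older kernel (the OpenSUSE 10.3 case mentioned above) would fool a pure version check like this one.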
JohanAnandtech - Tuesday, November 27, 2007 - link
Excellent feedback.
It is a bit frustrating that once again you need some ultra new kernel and libraries to get good scalability. That is unrealistic for people who use SLES and who rely on their support contract to get updates.
MGSsancho - Wednesday, November 28, 2007 - link
How about OpenSolaris? I don't know how different it is from Solaris 10, but it should be able to scale to dozens of cores nicely. I was about to ask about Oracle and DB2 benchmarks but you answered that in your article; expensive, and the OEMs usually publish that info.
Anyways, awesome article.
Roy2001 - Tuesday, November 27, 2007 - link
I cannot find a SINGLE one, nowhere.
drebo - Tuesday, November 27, 2007 - link
Newegg has the Phenom 9500 in stock. At least, they did yesterday. I've also got a vendor I use that has them in stock.
JarredWalton - Tuesday, November 27, 2007 - link
But Phenom isn't Opteron 23xx. Different socket, different market, and it has L3. (Does Phenom X4 have an L3 cache? Maybe I should go check....)
drebo - Wednesday, November 28, 2007 - link
Yes, Phenom 9500 has an L3. But if you look at his question (in the subject line), he is asking about Barcelona as a whole and Phenom specifically. The answer is yes, they are available.
Slaimus - Tuesday, November 27, 2007 - link
They may be gobbled up by Cray for that Budapest supercomputer.
Regs - Tuesday, November 27, 2007 - link
I would not expect any from vendors and wholesalers until early next year.
Matter of fact I wouldn't want one until then anyhow. I would at least wait until B3 stepping.
TA152H - Tuesday, November 27, 2007 - link
Johan,
From my understanding, x87 is now obsolete and not even supported in x86-64. Can you verify this? I know I had read it, but your article states that Intel improved it, so I'm not as sure. I had assumed one of AMD's handicaps was the disproportionate, and nearly useless, x87 processing power their processors carried, but now I am not as sure. Is x87 supported in x86-64, and if not, why would Intel increase their x87 capabilities when it's clearly a deprecated technology?
JohanAnandtech - Tuesday, November 27, 2007 - link
The x87 instructions can be used in legacy mode and long mode. But it is true that scalar SSE instructions are preferred by AMD and Intel.
x87 performance is still important, as many 32-bit programs still rely on it (look at 3ds Max 32-bit).
If Intel's newest Core architecture had not improved the x87 FP, it would probably have looked silly, as so many 32-bit programs still use it intensively. Secondly, as you can see, things like the Radix-16 circuitry are used by both the SIMD and the x87 units.
Gholam - Tuesday, November 27, 2007 - link
Do you have any plans to benchmark Opteron vs Xeon in an ESX Server environment?
DeepThought86 - Tuesday, November 27, 2007 - link
This is exactly what I was thinking of too. I want to change my mode of working to run several separate VMs, one for programming, one for Office, etc., and really want to know how Phenom compares to Q6600 for those uses. Well, this article looks at the server versions of those chips, but for VMware the performance might be more comparable than, say, SuperPi 1M benchmarks!
DeepThought86 - Tuesday, November 27, 2007 - link
I forgot to add: since Phenom would presumably have the same nested page table support as Barcelona, how much performance improvement would this yield? I'd love to know.
sht - Tuesday, November 27, 2007 - link
I was about to ask the same question after reading the concluding:
"You may feel for example that using four instances in our SPECjbb test favors AMD too much, but there is no denying that using more virtual machines on fewer physical servers is what is happening in the real world."
Since the CPUs have features that should accelerate virtualization, it would really be interesting to see how they compete there. My only addition to your request would be to add KVM as host as well (and XEN and what not as well if you care, though I really think only KVM is of interest).
JohanAnandtech - Tuesday, November 27, 2007 - link
Indeed, we are working on that. The software that we described here (http://www.anandtech.com/IT/showdoc.aspx?i=2997&am...) is being adapted to testing virtualized applications. We are also looking into the parameters that can really influence the results of a benchmark on a virtualized server.
AssBall - Tuesday, November 27, 2007 - link
Thanks, Johan.
This has been one of the clearer and better proofread articles I have read here lately. It was interesting, unbiased, and insightful. I am excited to see what you get into for your next project.
Hans Maulwurf - Wednesday, November 28, 2007 - link
Agreed, I have not seen an article as good as this one for years at Anandtech. And not for some time on other review sites as well.
Thank you.
JohanAnandtech - Tuesday, November 27, 2007 - link
Thanks people. These kinds of articles take ridiculous amounts of time and I really appreciate that you let me know that you liked the article. It keeps us going. (and I mean that!)
magreen - Tuesday, November 27, 2007 - link
Excellent article, thorough and with amazing depth and expertise. Keep up the great work AT!
Bluestealth - Tuesday, November 27, 2007 - link
I agree, it was a very well done article. I can't wait to see how Intel's processors perform on Hyper... errr... Common System Interface (next year?). I believe that I will be buying AMD until that happens though for any servers.
Regs - Tuesday, November 27, 2007 - link
Yeah, every time I see "Johan De Gelas" I have to read it.
I like the added info on the Barc's L3 cache and the intro-factoid about the new architecture.
I agree that the Barc arrived a year late and joined the party a little too shy. Integer performance will likely have to be addressed in Bulldozer in 2-3 years. Which is 2-3 years too long. I would be really surprised if they can manage anything other than a die shrink for Shanghai with maybe more L3 cache and some tweaks for cache latency and SSE.
Just seems like AMD took a nose dive in development for their processors in the past 3-4 years. After the K8 I would think they would be able to come up with something more innovative. "Revolutionary" should have never entered their heads and they should actually look down upon themselves for using such a word after 4 years.
jones377 - Tuesday, November 27, 2007 - link
Any chance you could use the same tools to profile desktop applications as well in the future?
DigitalFreak - Tuesday, November 27, 2007 - link
Three months or so since "launch", and you still can't get a server with AMD quad-core chips from any of the big 3 vendors (HP, Dell, IBM). AMD really screwed the pooch on this one.
jojo4u - Tuesday, November 27, 2007 - link
Yuck, ugly GIF on the first page. Please use PNG because 256 colors are not enough for screenshots ;)
deathwombat - Saturday, December 1, 2007 - link
In addition to being less ugly, PNG's higher compression would also make the file smaller (using less bandwidth), which I assume is what they were going for.
jkostans - Tuesday, November 27, 2007 - link
Didn't even notice.
aeternitas - Thursday, December 13, 2007 - link
Then why are you here? Details are what technology is about!
I for one have a pet peeve with tech sites that use the wrong formats in their stories. Slightly damages credibility. Not to say this is a big deal in this case, though .gif is pretty much dead, unless you use an old browser on old tech, but then why would you be reading this story?
Look on the bright side, at least this isn't a codec vs. codec story, where the author uses JPGs for such color-limited screenshots.
SonicIce - Tuesday, November 27, 2007 - link
I think the color depth was decreased a lot more than 8 bit. That image only has 33 unique colors in it. Something went wrong with the dithering maybe? 256 is usually more than enough.
Justin Case - Friday, November 30, 2007 - link
Who cares? The only part that suffers is the gradient at the top, all the relevant information is there, and this file is about half the size of what a PNG would be.
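The "33 unique colors" point a few posts up is easy to check on any screenshot once the pixels are in memory. A minimal sketch, assuming the image has already been decoded into a list of (R, G, B) tuples by whatever image library is at hand; the sample pixel data below is made up, not taken from the article's GIF.

```python
def unique_colors(pixels):
    """Count the distinct (R, G, B) values in an iterable of pixel tuples."""
    return len(set(pixels))


def bits_needed(color_count):
    """Smallest palette index width, in bits, that can address color_count colors."""
    if color_count < 1:
        raise ValueError("color_count must be at least 1")
    return max(1, (color_count - 1).bit_length())


# Made-up sample: a tiny "image" with three distinct colors.
sample_pixels = [(0, 0, 0), (255, 255, 255), (0, 0, 0), (200, 30, 30)]

print(unique_colors(sample_pixels), "unique colors")
print(bits_needed(33), "bits would index a 33-color palette")
print(bits_needed(256), "bits index a full 256-color GIF palette")
```

A GIF palette tops out at 256 entries, i.e. 8 bits per pixel, so an image that really has only 33 distinct colors is using a small fraction of what the format allows, which fits the guess that the color reduction or dithering step was set too aggressively.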