Name: Athlon64 3000+: 64-bit at Half the Price
Item: Athlon64 3000+: 64-bit at Half the Price
Author: Wesley Fink

Athlon64 3000+: 64-bit at Half the Price

by Wesley Fink on 12/22/2003 8:15 AM EST

Posted in
CPUs

Post Your Comment
Please log in or sign up to comment.

Comments Locked

75 Comments

Back to Article

Reflex - Saturday, December 27, 2003 - link
um, this conversation has taken a turn into the absurd...machine man? WTF is wrong with you..?

Um, heh, I think if I stay in this debate any longer I may have to turn to drinking in order to understand what is being said...
Pumpkinierre - Friday, December 26, 2003 - link
The passages I have quoted are well known for those that lived in that era and stand fully by themselves in relevance to the overall article. The rest of the article is about methods of required information prediction etc. BUT as the excerpts state this is NOT RELEVANT to data/commands that have a low probability of being reused in cache and one of the examples given of apps that have this occurrence are games. This is clear- not complicated. The only addition is the fact that since the cacheless celeron the difference between memory and cpu speed has increased substantially making caches more useful but this in no way negates the above argument.
They also refer to writing back to memory once data has been changed by the cpu- one method is the cache hangs on to the data until it is deemed no longer useful where upon it must write the data back to RAM. This is what I was referring to as cache purging that a previous post hit on me saying all you needed to do was overwrite.

With the BIOS on my i875 (ABIT IC7-G) I cant switch any of the caches off (not that its relevant because P4 L1 cache is 8K- definitely too small). I know on HX mobos I could do it and cant remember on the VIA?K6 boards but there was a utility tweakbios which hasnt been updated to later boards(last time i looked) that gave you full access to all functions. Also the powerleap utility allowed you to switch one of the caches off.

Just as an aside I've come across a site that does the testing I suggest: tweaktown .com. Testing the HIS 128Mb 9800pro and 9600XT against a 256Mb 9800XT(AT post 3 days ago-
http://www.tweaktown.com/document.php?dType=articl...

"To retrieve the most accurate frames per second (FPS) in our benchmarking suite, we fired up a game then played until we reached a scene containing average amounts of action. We then recorded the FPS during a one minute time period and took the mean FPS. This gave us a fairly adequate ranking of what “Excalibur” of performance these graphic cards are at."

They are a little bit loose on the statistics and dont define their settings too well but they are in the right direction. They give anectdotal information and include 3dMark03 results but they themselves state in a note that they dont give them much credence. They use an FX51/SKV8 testbed which is hardly your budget gamers system but at least excludes cpu limitations.The responses in different games and settings of the different cards are exactly as I said it would be: different in every situation with the 9600XT close to the 9800pro in many gaming tests (even DX9-Halo but slaughtered in 3DMk03) and the 9800pro beating the 9800XT by 5% in 'call of duty' at 1280x. Even though the tests were manual and subjective, general trends were still obvious while not losing the system/game synergy anomalies that CAN ONLY BE DISCOVERED BY ACTUAL GAME PLAY. I praise them for their understanding of the complexity of system/software testing and hence the meaninglessness of demo testing.

Back to P4. Your argument on P4 not being server oriented and having super high latency is false. The Xeon(P4 version) may have been released later but the P4 was the experimental testbed for it. Why else would intel release a cpu that was roundly criticised by the media as being less powerful than the 1.0 P3. I know: they were having problems getting the P3 higher than 1Gig na na na-Bsht! The athlon was only at 1.2 and you lost your Tbird within 4secs if your fan failed. P3 went on to 1.4G same as Tbirds within the next year once problems were sorted. So the weak P4 was put out to test the new technologies(P4,i820/840,RAMBUS) with their eye firmly on the server market. With the K8 AMD are trying to run this battle the other way round servers first then desktop- all I can say is good luck!
On latency , RAMBUS was meant to deliver on this front(no not just bandwidth) but failed. It was meant to be better at lower latency than SDRAM or DDR DRAM and many articles stated this at the time. WHY else would intel do this when RAMBUS was 5 TIMES the cost of SDRAM at the time?! But it was found (against DDRAM) no to be so for reasons explained in your quoted ARS article (target byte transfer etc.). This is why Intel dropped it. RAMBUS has been at the same price as DDR for over two years now, bandwidth is good and the i850 RAMBUS mobo can still match it with i865/875 so RAMBUS should be coming into its own. Intel have killed it because of latency considerations. The P4/mobo was always destined to be a low latency high bandwidth system. It is now under the i875 mobo with PAT. This is the lowest latency system outside of the K8. The i875 is actually a small server board handling ECC reg. memory and i think able to be set up as a dually (Asus mobo?). The P3 was loved by business and server personnel for its coolness and reliability not latency or bandwidth where it was quite poor. The distrust of the P4 is only as a result of the extremely conservative nature of these people and that is what AMD is facing up to with the opteron. The present Xeon has really only started to be trusted in the last year with reliable DDR mobos.

The serverfarms you mention are a misnoma they are workstation farms. A server does what it name says it serves out files/programs to intelligent terminals basically optimising storage. Any other combination can be defined as a mainframe or a workstation. Both of these require powerful fpu cpu. A mainframe requires good bandwidth and latency if several terminals attached while a workstation only requires bandwidth if media streaming and low latency if operator driven testing of 3d virtual worlds is being carried out. A computer that renders is a workstation. A database computer is a mainframe. The K8 is ideal as a gaming chip and low to middle(MP'ed) workstation with the exclusion of media streaming apps (high bandwidth). It is not meant for servers as defined.

Personally the term server especially applied to the K8 makes me puke- only the machine men in AMD who have long since sold their brass monkeys could have thought that one up as the fate of a brilliant gaming processor. That's why we need big Arnie to go in there and sort it out but he's tied up with california. In the meantime, Reflex you know I speak the truth, we need more to come on side and demand what we've been waiting 4yrs for: the new 300 celeron 'not quite cacheless wonder'. I detect in your life story a hint of a failed machine man (probably why you frequent AT). If this is so, you have passed the first test of turning your back on all the disconnected abstract mumbo jumbo that plagued the last century and I urge you to continue on having faith in what you sense is real not what is dished up to you as being right.

Right now with the 512K A64 the break has occurred. Already the price has gone from 217 to $240 in the latest AT price roundup. We must make sure the price stays down and demand is met. Bickering and argument over old quarrels is pointless. We all know the K8's destiny- for the masses. So rally to the call of Jefferson and Arnie:

"No, no we're no going to take these high priced bloated cache K8s anymore".

or for Voltaire- la marseillaise:

""Marchons, marchons enfants de la GAMING COMMUNITY
notre K8 de gloire est arrive"

Happy New Year!
Reflex - Friday, December 26, 2003 - link
You can do this test now, actually. Most BIOS implementations allow you to disable both L1 and L2 cache. I have actually done this test. Performance drops through the floor, including in gaming. As I stated before, you do not have to actually cache things to make them faster, cache is primarily used for pointers to locations in memory, which seriously reduces latencies.

You are basically chopping tiny little sections out of the article that you think support your claims without simply reading the whole thing and seeing what it tells you. I am not going to cut and paste replies, either read the whole thing and understand why cache is so important, or continue to be ignorant on the topic. I highly reccomend anyone reading this to go read the article for themselves, the excerpts listed above are very out of context.

Secondly, if the only purpose of a server is to serve up web pages, then you are correct that a strong FPU is not needed. However, companies like Pixar use larger serverfarms(renderfarms as they like to call them) with tons of dual CPU systems. Since those servers are used for rendering images/video, a strong FPU is very very important. Several companies have switched to K7/K8 based servers for their superior FPU, including Pixar and I believe ILM.

Furthermore, the Xeon *is* a enhanced P4, not the other way around. I am not sure how to put it to you, but I was personally involved in the development process. I am a former Microsoft engineer who worked on the Windows 2000 and XP kernel, I do know what I am talking about. I had my hands on P4's long before they hit the market, as well as K8's. I can pretty much tell you the development cycle of any CPU made since 1999 and what order they were developed. The P3 Xeon continued as the primary server CPU from Intel for a year past the release of the P4 simply because it took that long for Intel to finish enhancing the chipset and cache algorithms of the P4, as well as validating multi-CPU support. The P4 was a purely consumer CPU, its server uses were an afterthought, if that had not been the case, it would have been a low latency rather than a super high latency design, the high latency design has crippled their competitiveness in a lot of situations, namely database servers which rely on super low latencies. In many cases, even to this day corporations prefer P3 Xeons for both their lower power/lower heat as well as the fact that it takes 2Ghz or more from a P4 to compete with a P3 for a very optimized DB and the heat/power requirements just don't make it worth it to changeover to a P4 above 2Ghz...

Anyone wishing to test the BS being spewed above in their favorite games, go into your BIOS and disable your cache. Use FRAPS or some other counter to measure your minimum, maximum and average framerate. Then turn your cache on and repeat. There is no need for debate, anyone can run this test if they wish...
Pumpkinierre - Friday, December 26, 2003 - link
"When an app fills up the cache with data that doesn't really need to be cached because it won't be used again and as a result winds up bumping out of the cache data that will be reused, that app is said to "pollute the cache." Media apps, games, and the like are big cache polluters, which is why they weren't too affected by the original Celeron's lack of cache. Because they were streaming data through the CPU at a very fast rate, they didn't actually even care that their data wasn't being cached. Since this data wasn't going to be needed again anytime soon, the fact that it wasn't in a readily accessible cache didn't really matter."
Thats from your stated article #71 Reflux, which is saying what I have been saying. I dont totally agree with the the media app and media streaming part of the explanation but in essence it describes what I and others have observed in regard to gaming. Try this from the same article on the cacheless celeron:
"Along with its overclockability, there was one peculiar feature of the "cacheless wonder," as the Celeron was then called, that blew everyone's mind: it performed almost as well on Quake benchmarks as the cache-endowed PII."
I think someone hammered me earlier about the P2 slaughtering the celeron 300.

If you got an A64 out there (I've yet to see one even in a shop) disable your L2 cache maybe in BIOS or using utility (powerleap?) and run any FS or FPS/Doom type game as long as the exe file and configuration/data files fits wholly within DRAM run this memory at its fastest (latency) settings-preferably high FSB if clock unlocked (lucky these days, the apparatchiks are also going to lock the FSB). You most likely will notice little difference in game play as long as the memory pipeline hasnt been crippled by the L2 disconnection which I doubt will happen (although there might be an overhead from the L1 checking on a non existent L2 when the CPU requests memory content).
On the server front even Intel is still trying to break into the market, garnering a section of the low end market with Xeons. These chips are X-86 based architecture from 25 years ago, effectively out of date and meant for desktops (64Kb blocks and 640K maximum memory etc.). The P4 is a rebadged Xeon (not the other way around as some purport)but intel followed this route from its non server days-8088 and 80886. Servers require high memory bandwidth and low system latency - to respond quickly to multiple requests and stream the data/programs. They dont require big fpu. You're not modelling the worlds weather and a server is not a mainframe. The change in direction of intel with the P4 was as a result of its intended server pedigree- poor fpu, quad pumped data bus (not needed for desktop) and expected low latency with RAMBUS. Its only held its own by the phenomenal ramp in speed which is nothing to do with the 20 execution unit pipelines. With the Itanium Intel have broken away from the X-86 mold under the pretext of a new start with 64bit. From all reports the first one was very slow in all modes. The second one I dont know much about but if it has to simulate X-86 it wont be lighning fast. With this processor they hope to attack the middle range server market going against IBM, SUN etc. who use K8 and xeons (and possibly itaniums) in their low end systems.
K8 is also X-86 (with a 64bit extension set)and hence not designed as a server chip. Further it has only medium bandwidth even double pumped opterons. It does have very low latency which also helps a bit with the bandwidth deficiency. So at best they are only going to be able to enter the low end server market. I know: 8way systems, 3HT links, separate memory for each processor blah blah blah but all this is going to take a long time to work out- even with duallies mobos have got the memry running off one cpu. AMD have'nt got time and the server market wants turnkey reliability- it doesnt like being the experimental tesbed.
The requirements of a good gaming chip are powerful FPU and low (system) latency. The 3d virtual world of games is made up from small data input files with a large exe file requiring heavy cpu number crunching to create that world (thats why I differ with the ARS Tech article about data streaming) so massive bandwidth is not required. But fast response is and as the ARS article points out with caches a cpu read or write request must be checked in all levels of cache first before going to main memory. This adds latency if the probability of the data being in the cache is low.
So a small L1 cached K8 fulfills perfectly these requirements and solves the prod. capacity problem which in turn should get the price in the sub 150 sweet zone. I'll buy one if it makes it there in the next 3 months. I really didnt want my P4- I wanted what I described but got sick of waiting and have been even sickerwith AMD meanderings since

The k8 12 execution unit pipelines have been optimised and tuned making it the powerfullest fpu for the money, at equal speeds it would eat a k7 for breakfast. The G5 is supposed to be okay but from what i remember of apple powerpcs they are not all that hot on the latency front, Further its not X86- Bill likes X86- his whole company is founded on it so the K8 is going in XBOX2 - I'll bet you $50 bucks on it.

However what AMD need is Arnie-both internally against the apparatchiks and externally- twin gatlings hammering out the cacheless society message.
PrinceGaz - Thursday, December 25, 2003 - link
Thats a most excellent article you linked to there Reflex and it helped fill in a few things I wasn't sure of, thanks for that. I hope Pumpkin... reads it thorughly and learns something from it.
Reflex - Thursday, December 25, 2003 - link
http://arstechnica.com/paedia/c/caching/caching-1....

A specific article on caches, how they work and what they do for a CPU.
Reflex - Thursday, December 25, 2003 - link
Pumpkin: Once again, you are wrong. Show me a link with this 'rumor'. IBM themselves stated that the Xbox2 was going to use a variant of thier Power5 architecture. It is NOT going to be an x86 chip in any way, shape or form, so the K8 is not going to be in the XB2.

And I believe you just got your low cost A64, thats what the 3000+ is. However as a 'gaming' CPU I think your missing the point: The K8 is a great architecture for a number of uses, in fact while it is *marginally* better than the P4 at gaming, it absolutely kills the P4/Xeon when it comes to servers. So the choice by AMD to target servers(high margins) as a priority, and consumer level(very low margin) as a secondary market was pretty much a no-brainer. Furthermore, due to the lower volumes of the server market it allowed them time to figure out how specifically to tweak certain aspects of the BIOS, drivers, etc to fully take advantage of the architecture when it hit mass market. They could play very conservatively with the Opteron, it was already considerably better than the competition without tweaks and that market is always willing to sacrifice a bit of performance for stability. This essentially gave them several months to tweak while they made money from the architecture. Not a bad plan(plus gave them more time to refine manufacturing).

Honestly the argument that its a 'gaming tuned CPU' is rediculous, its no more gaming tuned than the K7. If a strong FPU is the main argument for making it a gaming CPU that darned Alpha really shoulda been the master of Quake. And hey, lets not forget the Itanium which has one of the strongest FPU units ever devised. The K8 has the exact same FPU unit as the K7 did in fact.

http://arstechnica.com/cpu/index.html

Read the articles on that link for a long list of very explicit architecture overview of different CPU's, and comparisons between them. There is quite a bit there on the K7, K8 and Power5(PowerPC 970).

Come back after you have read these articles, it will make what you have to say far more relvant.
Pumpkinierre - Thursday, December 25, 2003 - link
And on the jingle:

"No,no we're not going to take these large caches anymore"

I can add failed poet to the CV!
Pumpkinierre - Thursday, December 25, 2003 - link
errata again should be:
"servers which require low latency and LARGE memory bandwidth"
Reflex - Thursday, December 25, 2003 - link
HammerFan: I take it back, this is not a useless debate. If you look at it from the point of convincing Pumpkin, then yeah it would be useless. But as the previous post demonstrates, a lot of useful information is coming out that will help the novice who is perhaps curious about the workings of different aspects of CPU's that are mentioned in different articles understand things better.

Anyone who wants to take this into further depth I highly reccomend reading the articles on Ars Technica relating to CPU architecture(http://www.arstechnica.com). They have a very very good overview of both the K7 and K8 core, and I believe there is an article on the P4 there as well.

Anyways, keep reading if you wanna know more about how things work I guess. ;)
Pumpkinierre - Thursday, December 25, 2003 - link
#65 what youre discussing is the branch predictor in the cpu. This is at the micro level and further down the track than the cache prediction algorithm. Because of the fast nature of cpus the branch predictor offers up a number of solutions to a decision yet to be made by the operator or program. Once the decision is made, the cpu throws away (flushes) the non relevant solutions and then repeats the process with the next step of the program. This way tne cpu uses its spare time more efficiently and involves black magic (no. of pipelines, execution lenghts, branch prediction methodology, buffers etc.). This is also where the K8 has been tuned for gaming (strong fpu) mentioned in my last post. What I've been talking about is caches. These have prediction algorithms- a small program if you will run by a processor as part of its housekeeping. Whether this is done by the cpu itself or by separate dedicated circuitry I dont know-but it is in the processor somewhere. these algorithms are black magic also. Caches on hard drives and DVD/CD players/burners have gone up and down (512k to 2Mb to8MB and now settled back to 2Mb) because predictability of required data at this level is nigh on impossible. Better burn software has made the need for large caches redundant. In the case of HDD many say 512K is all you need so the decision is more to do with cost/marketing (the bigger-the better and as I look between my legs I understand this philosophy).
So similar to the cpu predictor the cache predictor loads up the data /commands that are possible requirements in the future and waits for the cpu to make the final decision. Unlike hdds etc., at this level its got something to go on- the program code and in the case of a batch process- data/command input file. This file is set in stone and controls the exe program which may have many decision statements and subroutines. Even the stupidest cache algorithm would load information from this file first as soon as it encountered the relevant Open file statement in the main program. For the rest its a question at looking at the branch statements and what memory addresses each require and their associations- again to do with algorithm, compilation cache/memory architecture-black magic.
This all fine for batch jobs and that is what a demo is(go away and have a cup of coffee). But a game is not a batch job- you dont have a cup of coffee in the middle of an FS or FPS without hitting the pause button. So in this instance the cache has nothing to go on- loads up as much of the main program as it can and waits there for the operator to give it an instruction. Predictability-zero. So as with caches on HDD and Cd burners, for this low predictability application the cache size can come down. I suspect algorithms can or will look ahead in the code possibly in conjunction with the code compilation to better assess what the cpu will require but this will be of only small benefit to 3D gaming and a hindrance if the game hasnt followed the expected methodology in its conception. Caches benefit servers/workstations they are only present on desktops because these systems are expected to be jack of all trades.
In the case of the K8, it is a production/politics problem- so AMD have gone for a niche market but they've picked the wrong one because they think servers are high profitability. But this is erroneous as the server market requires extensive backup and upgrade paths which is based on reputation which in turn requires lots of initial capital outlay to build up goodwill. On top of that, the K8 wasnt designed for that(you dont need powerful fpus for servers which require low latency and memory bandwidth-K8 has the low latency thats it), it was designed for gaming pure and simple. So the solution is to get it out there targeted at gamers and by chopping of the cache they could double capacity. This 512k a64 is going to sell like hot cakes but its going to be hard to get hold of as the server apparatchiks in AMD cling to their model and refuse to divert resources. AMD are in deep turmoil evident in the lack of clarity on socket types, upgrade paths and roadmaps (what's this 32bit paris sckt754 A-XP processor- either stick with K7/SktA or leave the 64bit set in). With XBOX2 rumour has it that the K8 is going to be used with IBM producing it. The G5 is a dog cf to the K8 in gaming and Bill knows it.I'm all for it as it would establish the K8s true credentials but the problem might be that Bill becomes too interested.
So its up to us, the interested populace, to back up whoever it is in AMD that is taking on the machine men and state unequivocally that what we want is a budget gaming K8 cpu NOW! (use the Arnie gospel- the broom and the jingle:"No,no we're not going to take this anymore").
Merry Xmas
PrinceGaz - Thursday, December 25, 2003 - link
@Pumpkin... - Your argument against game benchmarks is fundamentally flawed; while it may sound plausible to people who know nothing about how a CPU works (which includes yourself it seems), you only need to read some of the CPU articles here on AT to spot the problem.

Basically what you're saying is game benchmarks are invalid because the processor has access to the benchmark/demo recording data and can use it to ensure all the data and instructions the processor will need is cached ready to be used, and that the only way to test real game-performance is for a human-player to interact with it as then theres no way for the processor to predict exactly what or when the player will do things. Right?

Wrong. The processor can only make predictions based on what it has done at that point in the code the last few times its reached it, more specifically the Branch Prediction Unit makes its decision about whether to assume a branch is followed or not by checking a 2-bit counter (0 to 3) which is incremented each time it is actually taken, and decremented if it isn't. By looking at that counter it can see whether the branch has been taken more often than not recently, and if thats the case it assumes the branch will be taken again. Thats the limit to its prediction.

Theres no magical examining of demo-recording files to see what it says is coming up next, all decisions are made on the basis of very recent past events (so theres no long-term memory to it either, if you were about to use that argument), therefore it makes no difference if the game-input is from a human-player or a file. If you don't believe me read this-

http://www.anandtech.com/cpu/showdoc.html?i=1815&a...

Your whole argument against using demos/game-recordings is therefore proven totally incorrect, and with it everything else you have said about how large cache processors performed differently in game demos to when a human interacts directly with the game. Basically, everything you have said on that subject is total utter rubbish. Head-Shot! :)
Reflex - Thursday, December 25, 2003 - link
Um, while most of that is junk science at best, let me point out something just cause I found it a bit funny: IBM's Power5 is going in the Xbox2, not the Athlon64. They announced that last month...
Pumpkinierre - Thursday, December 25, 2003 - link
Sorry forgot Merry Xmas
Pumpkinierre - Thursday, December 25, 2003 - link
Yeah well #60 no one does what I'm suggesting except for consumers who find against all 'expert' advice that a particular game runs quite well on their lowly processor maybe with the help of a video card upgrade. You can see it in this review with a half size cache A64 being within miniscule difference of a 3200+. This cache crippling may have extended to the 16way associative cutting it down to 8way, supposedly further damaging performance If all the blarney about cache improving system performance were true you'd expect 15-20% loss of performance- after all 512k is a lot of transistors and a fair bit of die space. I mean the guys that bought full blown A64-3200 at over us$400 must be spitting chips. But the fact is you cant make any statement or deduction from this review as there are too many variables (different mobos, processors demos benchmarks etc.) all requiring intensive analysis to draw any truth.
The fact is the K8 was built for gaming- why? powerful number cruncher (fpu-better than K7) and LOW LATENCY- that is what you need for gaming. The K7 was only experimental for the K8 not the other way round as many suggest (K8 is just a K7 with two extra branch instructions per pipeline nanana- BullSht). This processor is tuned for gaming and someone at the heart of AMD knows this. Unfortunately the apparatchiks have taken over, due to over 2 years of losses and so we've got server business models, inverted pyramids and a mute heart. The server market is conservative and full of apparatchiks who wont take a risk. So even though its profitable its a long haul with reputation build up etc. and really not the province of a company laden with debt as is AMD. So its up to internet hardware sites to chorus and point out this bad turn in direction in order to harmonize with those inside AMD who know K8's true destiny. Some of the politics can be seen with all these different sockets when the cpu barely been released (i've still yet to see one even in a shop). Unfortunately it seems that the hardware sites perhaps helped by intel who strive to achieve what the K8 has (ie low latency), seem determined to follow this trend with bloated FX51 and occasionally P4EE cpu dropped into the heavy cache biased tests to make us go ooh ha and go out the back to do what we never admit to, because we know we cant afford it.

The problem AMD have is capacity in production. They basically have one Fab and they are scared of over reaching and not supplying demand. Hence the single expensive A64 release,the tentative OEM release of the 3000+ (like the top range opteron in april) and the limited edition hyper expensive FX51. The present production is geared towards opteron production even the A64 is a rebadged opteron.

The solution: 2dies to the 200mm2 wafer not on 90 nm- too many problems, no money - but NOW! so on .13um. This means less than 100mm2 per die. A quick look at the present die shows that the computational and L1 exist on less than half the die. This is good enough in my view but if they could squeeze in and extra 128K as L1 or even L2 it may keep the cache apparatchiks/zombies at bay. To compensate dual bank memory controller with fastest memory DDR500, dual or quad phase memory- whatever but try to minimise latency (careful balance between bandwidth and latency-another story). This is the Newcastle that is required and should be being demanded by the internet sites preferably released before Prescott to show up this cpu obvious problems an shortcomings. With a sale price under US$150 and AMD would meet demand, have over 30% of the market and be in the red by end 2004 with debt on its way down like before Bush and Iraq. The advent of Win64 and Xbox2(I'm not the only one to have noticed the K8's true calling) will only further boost sales and credibility. As it is their model is one of production contraction(witness A64 3200+ sales) for supposedly high profit most probably resulting in slow death or takeover.
So AT and other sites revamp totally your testing procedures for the new year- no synthetics, no demos just real usage with operator anectdotes- too subjective?! - isnt that what quantum mechanics is telling us! and no cpu or gpu above us$500. You'd double your susbscription in a year and it wouldnt just be AMD with a capacity problem. That might turn AMD around as long as you kept barking at its heels for what the K8 was always meant to be: a cheap fast responsive overclockable gaming cpu.
Reflex - Thursday, December 25, 2003 - link
All I can say is that I used to get paid to do tests like this. Pumpkin is wrong, plain and simple. Show me one modern game that runs better on a Duron than an Athlon. Show me one modern game that runs better on a Celeron than a P4. Do this with equivilent clock speeds. I don't care how you do the demos. Bear in mind that there are *plenty* of user created demos you can run aside from what the game manufacturer gives you to start, so there is no conspiracy here.

All I can say is: Prove it. I know your wrong on a technical level, so the ball is in your court.

Hammerfan: No, it wasn't, but it was a somewhat fun excercise for a little while, till it got repetitive...
HammerFan - Wednesday, December 24, 2003 - link
was this arguement really worth all those lines of text?
Pumpkinierre - Wednesday, December 24, 2003 - link
When you run a program with the same input file you get a predictable follow through of code to the CPU a la von Neumann. Even SMP and HT tuned games will be the same with a predictable follow through of code. That is why you get repetitive results. Cache prediction algorithms love nothing better than step by step follow through. They can load up the next steps in the program in conjunction with the input file data or command and have it on hand for the cpu. The process I have described is a game demo and this process is almost the antithesis of what happens in actual operator driven gaming. Its true I'm a failed scientist (and gardener!) but if I produced a model of a process ie a demo and claimed it represented the process without correlating the results with the actual process ie what is felt by gamers which no site has ever done i'd truly take up tiddly winks as my primary occupation. The only use of demos is to compare the same computer system family eg A-XP/nf2/ATI9800p and then change one variable BUT WITHIN THE FAMILY eg XP2500+ barton to XP3000+ barton (both 333MHz). Even changes of cache size and FSB within the same series of processors can be deemed out of the family. Only a single variable can be changed at a time and then the response of the whole system observed. The result from this would define comparatively the power of the system where the demo is integral to that system BUT NOT THE ACTUAL PLAYING OF THE GAME from which the demo is derived. Most reviews do even better with a kaleidoscope of Intel & AMD cpus, mobos, DRAM and other factors all compared in the same chart with max. fps as being the winner when in fact the relevance to gameplay is nothing. No wonder the populace turn to George Bush and Arnie for inspiration. For 2d and multimedia applications this sort of testing (Winstone, photoshop, high end wkstation, 3dmax5) is fine as it represents ordered command sequence that operators use when running these apps eg rotation followed by render etc. in cad-again the antithesis of gaming where you might bank left too hard find yourself in a spin and kick the rudder back off the throttle while unloading the wings IMMEDIATELY to correct.

Secondly, outside of any technical argument, demos are produced by the companies to sell their games- see how it runs on my system. Its only natural they are likely be sequence selected and "optimised" for smoothness, good fps and visual attraction.

The above has caused terrible confusion with a meaningless neurotic MHz,cache size, equivalent rating, IQ vs fps war amongst the internet elite and worse berating celerons and Durons as useless (when many know in operation they play games very well) while poor selling expensive overbloated cache high end cpus more relevant to servers than gaming, are discussed by the internet sites.

The solution: As in science (except for the 20th century)- the awesomely simple - DO THE TEST ON THE GAME ITSELF by a competent gamer. Yes you wont get a repetitive result but no game is played the same way exactly even if following a set routine (just like surfing - no wave is the same man- add failed surfer to the list!). By running several passes of the same judiciously chosen game sequences - meaningful results could be derived and systems compared for that game. Groupings of similar responding games would then help the consumer better match a system to his preferred games and importantly budget. If AT did that they would have to add a few more servers (Xeons of course 2MB L3) to cope with subscription.

PS Sorry to those that want me to bury it! Merry Xmas
PrinceGaz - Wednesday, December 24, 2003 - link
Yeah, I think we've wone this argument against Pumpkin... At least until the next time theres a CPU article (Prescott?) where he'll no doubt say its large cache cripples gaming performance. He should be on a comedy-show :)
Reflex - Wednesday, December 24, 2003 - link
*laff* *hands #55 some pom pom's*

Now if you don't mind, I must return to my secret identity.... ;)
Phiro - Wednesday, December 24, 2003 - link
Reflex: Don't worry, the silent majority is with you on this one.

Go Reflex, go Reflex, go Reflex!
Wesley Fink - Wednesday, December 24, 2003 - link
#53 - We did not mean to imply that the Athlon64 was not selling, it is just not selling at the rate AMD would like right now. The article you refer to is SYSTEM sales, and the A64 is stated to be top 10 in Canada. In my opinion, the 3000+ will definitely kick that up a huge amount.

In checking every local white box dealer, not one had an Athlon64 actually in stock for sale. Their bread-and-butter are mainsteam PC's, and the $450 A64 was a "Special Order" item. Athlon XP, on the other hand, were featured in most ads from the same dealers. Now that the 3000+ is out, I see A64 featured at these same dealers.

Intel/Celeron/P4 has been the domain of the big manufacturers, like HP and Dell, that sell in the chain stores. Whether AMD wants to hear it or not, AMD has been a much larger part of the "white-box" market. If the "white-box" dealers weren't using A64, then AMD was losing many sales. The 3000+ moves into a new price niche and will, in my opinion, sell VERY well.
dvinnen - Wednesday, December 24, 2003 - link
I figure this is worth a post for Pumpkinierre:

http://www.theinquirer.net/?article=13332

Seems to be selling fine to me. It's on eof the best selling PCs at TigerDirect, the 3000+ will no dought help even sell better. Even Anandtech can be wronge once in a while. As for your cache argument, the reason peopole bought the cacheless celerons was because they where great overclockers and cheap, not because they where low latency. The rest of argumenthas already been torn to shreads so I won't bother.
Reflex - Wednesday, December 24, 2003 - link
I'm sorry guys, I don't see a point in this debate any longer. Its fairly obvious that Pumkin simply does not know what he is talking about, and certainly not what cache does for a CPU. Its main purpose is to hide latencies inherant in the asyncronous design of modern CPU's and memory, and the more of it the better it does that job. Furthermore, most of what is contained in cache is not instructions themselves, but rather pointers to exact locations in memory that specific data/instructions are located enabling much faster retrieval of that information. The more cache you have, the more of this type of information it can store, more than making up for any latencies caused by the extra step of searching cache...

I have played with all the CPU's mentioned in these articles. I had my hands on the Athlon64 over two years ago. It is leaps and bounds beyond other architectures in many respects, and one of those is its combination of large cache size and integrated memory controller. It will never be outperformed by a Duron, nor by a lower cache version of the same chip. Feel free to use whatever you think is best for your own rig, but advocating this to others is doing them a disservice. And while the Celeron 300 was certainly valued for its overclockability, it was *never* considered better in overall performance than the Pentium II 450 in *any* respect. The lack of cache crippled the chip, even in gaming, although it had less of an impact in that arena than in some other tasks. Your history is more than a bit revisionist.

Anyways guys, I'm through arguing with someone who obviously knows nothing about what he is speaking of. I will continue to respond to those of you with the ability to actually go read the reviews and who's arguments do not consist of you simply deciding that since AT's reviews don't match up with your personal opinion that AT and the rest of the world is wrong and that you are right. ;> I require a bit more scientific proof than your opinion, especially seeing as I do know how this stuff works having worked on it myself.
AnonymouseUser - Wednesday, December 24, 2003 - link
TrogdorJW said: "For the educated market (most of the people reading Anandtech, ArsTechnica, Tom's Hardware Guide, HardOCP, etc.), the PR ratings are pretty much meaningless."

Judging the "educated market" by the comments some members have made to this article and many others preceding it, they aren't as "educated" as one would expect. Take, for example, the following statement concerning cache: "128K is probably optimum for gaming." (for proof of this ignorance, see the following comparison: http://anandtech.com/cpu/showdoc.html?i=1927) -_^

Pumpkinierre barfed: "Stick with celerons and durons, you'll have fun and money to boot."

Exactly why are you arguing over the top end processors while still advocating the low end?
PrinceGaz - Wednesday, December 24, 2003 - link
@Pumpkinierre: What exactly are wrong with game benchmarks? It doesn't make any significant difference to how a game runs whether someone is sitting there pressing keys and moving the mouse, or if the game itself is playing back a recorded demo of the same. The actual game-code executed is the same in both cases, it just takes its input from a different source. The recording is just as unpredictable as far as the CPU is concerned as someone playing it there and then.

Less cache is never going to improve the performance of games, especially not the 128K of cache you seem to be promoting. Every single gaming benchmark gave higher performance with the A64 3200+ than the A64 3000+ (except those that were gfx-card bound where they were roughly identical) and the only difference between the two processors is the 3200+ has twice as much cache. More cache clearly resulted in more speed.

If the cache were halved again to 256K, the loss in performance would be even greater, and halving it once more to 128K would have a serious impact. Just compare the performance in the budget CPU article of the 1.6GHz Duron (128K+64K) to the 1.47GHz Athlon XP (128K+256K) and you'll see the Duron lost every game test (sometimes by over 20% difference) to the 8% lower-clocked AthlonXP, because 192K total cache isn't enough for it to run well. The smaller the cache, the more of an impact it has.

You keep mentioning about how less cache improves the minimum frame-rate, or the "smoothness", or that they have lower-inertia than processors with more cache. What a load of garbage! Minimum frame-rates caused by the CPU will be hit that much harder if the processor has to keep going to system-memory because the data it needs isn't cached. The last thing you want is a system with very little cache like you're advocating.

I like your strange suggestion that a system with less cache has less inertia, as if you can actually notice the delay caused by larger cache CPUs when playing. Actually the memory-controller makes more difference to the latency or inertia as the P4 3.2 in the test had a considerably greater memory-latency of over 200 nanoseconds, compared to under 100nS for both Athlon 64 chips. Personally I've never been bothered by delays of a few hundred nanoseconds while playing even the most intensive games, in fact theres no way *anyone* will actually notice a delay caused by whether or not the processor decides to access main-memory or cache. But it'll be faster in the long run if it usually finds what it needs in a larger cache.

A 512K L2 cache seems adequate to give good performance in games as there isn't a major improvement when it is increased to 1024K (but that does add considerably to the size and cost of the chip). On the other hand, 256K does reduce the performance noticeably (compare a Barton to a simiarly clocked T'Bred) and cutting 256K of cache doesn't make so much difference to the size. Therefore 512K seems a good balance and an ideal cache size for gamers. Certainly far better than 128K :p
Pumpkinierre - Wednesday, December 24, 2003 - link
Forgot to add my opinion of P4EE as requested by #47. Basically a rebadged Xeon to compete with a rebadged opteron (FX51). Both at absurd prices. At least the P4EE doesnt need reg. memory and ABIT o'clocked it to 4 Gig at COMDEX on an IC7-G (also my mobo)with standard cooling (4.5 with specialist stuff- fastest cpu in the universe!). Given that it is a new core (galatin) and stepping there might be a bit of poke in it but others (AT?) didnt find much headroom.
Yes the benchmarks show 8-15% better than the 3.2P4 but for gaming youd be better off with the latter for the same cache reasons as I've stated in the previous posts. With the exception of a fast 128-256K L1 cache, the P4 cache arrangement is the next best thing with a very small 8K cache (notice smaller than the 16K L1 on the original pentium and done for a reason-lower latency) working inclusively (L1 content always present in L2) with a 512K L2 cache. For gaming this is superior to the exclusive arrangement on AMD chips which gives a larger combined cache at the expense of latency. This goes a long way to explaining the smoothness of P4's over A-XP experienced by gamers who have tried both. K8 smoothness (due to its low latency mem.controller etc.) is also already legendary. The P4EE probably has an inclusive L2 in L3 cache but you're getting into serious server territory with a 2Mb L3 cache.
Intel understand latency (witness PAT technology) and they are also aware of the internet benchmarking community. So to combat the FX51 which was a stupid kneejerk release (reg. mem. sckt940 etc.) from AMD they cynically released the heavy cache P4EE which would show up good in the predictable gaming benchmarks but in truth be worse than a standard P4 3.2 in actual play. The high price is meant to catch the well heeled fools who think they are getting the best intel gaming CPU. Stick with celerons and durons, you'll have fun and money to boot.
Pumpkinierre - Wednesday, December 24, 2003 - link
Yes 47 KF in most cases overwriting is all that is required with the possible exception of the exclusive L1 that AMD use in both K7 and K8. If the prediction algorithm deems that the information in the L1 cache is more important than some other data in the L2 cache then it is more efficient to write back the information to L2 (ie purge) than have to recall the same information from slower DRAM, should the CPU require different data/commands to be available momentarily in the L1 cache.

With regards to memory latency etc..You have to remember I dont even agree with any of the benchmarks or testing methods in reference to gaming. The only testing that is valid is a repetitive user controlled execution of a particular gaming program sequence. Power of a system is defined by the highest average minimum frame rate and latency/smoothness by the smallest average difference between maximum and minimum frame rates. It is integral that the computer system/game latency be defined as a system may work well on one game but not another. As far as all the other benchmarks go you may as well run Business Winstone for all that it will tell you.

The memory controller works on a 16 or 32 bit data path from the DRAM same as the CPU data bus so the time to fully load a 256K cache will be half of a 512K and 1/4 of a 1MB cache. So cache size will have an effect on system latency especially if the cache has to be frequently refreshed.
In the case of data, the CPU only require 4 bytes but it must be the right 4 bytes of data. If each time the data addresses are changed there is no predictive link between the new ones and the previous ones then the cache is refreshed at a latency cost. The strike rate for 2d/text applications is high. For fast interactive games very low. So a compromise between cache size and response must be struck. Early celerons (no cache) were loved by gamers for their low inertia but panned by the benchmarkers. Even the 128K celerons flew and regarded by many as better than P2 450s with half speed 256 or 512K cache for games. 128K is probably optimum for gaming. Larger caches have been put on desktops to accomodate other apps Office graphics, CAD internet as desktops are jack of all trades. more importantly 128K of cache would get K8 prices down as die size would be halved and capacity doubled
Reflex - Wednesday, December 24, 2003 - link
I believe if you read the line that was quoted from me you will observe the fact that I said 'noticable' decrease in performance. On a purely theoretical level yes you can lose a clock cycle or two on a larger cache size. However this is *never* going to be noticable in a real world application, only in an app that does nothing but measure clock cycles for information in cache. In the real world this performance 'hit' is completely meaningless, and as time goes on and programs get larger, CPU's will begin to show more and more of a difference due to the cache sizes. Compare a Duron with 128/64kb L1/2 to an equivilent speed Athlon with 128/256(unlocking the multiplier and setting the bus speed and clock speed identical). There is a definite performance improvement going to the Athlon, specifically due to the larger cache size. Theoretically it has slightly higher latency but realistically it means nothing in the end result.

I stand by my statement, a larger cache size is *never* a bad thing, as long as it is running at the same clock rate and bus width as the smaller cache. Any theoretical losses in performance are more than made up by future gains as apps start to utilize more cache.

And if your going to tell me that a gamer *only* uses their computer for a single threaded game, then don't quote multimedia encoding benchmarks to me next time you want to talk about where the P4 shines. Last I checked most people use their PC's for a *lot* of tasks, gaming being one of them. And in games, the Athlon 64/FX is pretty much the cats meow, at any cache size regardless of your personal opinion of how it should be rated. Makes me wonder what your opinion of the P4EE is considering all the cache on that thing. ;)

BTW, to the people talking about 64bit code being twice as large, well, its not quite like that, however I do believe you are correct that larger caches will play a bigger role with 64bit code than with 32bit code, time will tell what is optimal...

#45: You hit the nail on the head. In an ideal world, consumers would do their research and learn for themselves what is best, however I personally can admit to the fact that in a lot of cases I simply do not have the time. Ratings systems are currently a necessary evil in the PC market, and until Intel is not market leader that it currently is, they will continue to be necessary. I'd love to see AMD's True Performance Initiative take flight, but unfortunatly that is highly unlikely as Intel has no motivation to do so as it stands now...
arejerjejjerjre - Wednesday, December 24, 2003 - link
In other words Intel doesn't cheat!
TrogdorJW - Wednesday, December 24, 2003 - link
And as for Pumpkinierre's comments about the PR ratings, I hate to do it, but I agree with him. They suck! Like, black-hole suckage! Only problem is, there really isn't a good way to fix them.

The fact of the matter is that in any given architecture, more MHz is pretty much always faster (although not necessarily by a large amount). So while the Pentium 4 1.4 GHz chips really weren't much better than the P3 1.0 GHz, once we get to 2.0 GHz and above, there's no real comparison. Also, the various changes to the chipsets and cache sizes made P4 a real winner in the long term. I'm still running a P3, and trust me, it really stinks!

For the educated market (most of the people reading Anandtech, ArsTechnica, Tom's Hardware Guide, HardOCP, etc.), the PR ratings are pretty much meaningless. At least, I *HOPE* they are. We all know that there's more to a system than a fast processor. Not so for the more typical customer... the PR ratings are either a good way of equating performance, or in some instances, they're a great way to make a sale.

If AMD actually got realistic benchmarks out to all the people, most still wouldn't have a clue what to make of them. And the funny thing is, if you go into many computer stores, the advantage is really in AMD's favor with their PR scheme. I have heard many salespeople say something like, "Oh, the 3200+ AMD system is just as fast as a Pentium 4 3.2 GHz, and it costs less," which is a bold-faced lie. But it has closed many a sale, I'll wager, and so it continues.

Intersting to note that while Intel certainly pushes clock speeds, they really aren't making any unsubstantiated claims. (Well, they are, but no more than anyone else...) And you don't see Intel trying to tell people that a Celeron 2.8 GHz is a better CPU than a 2.0A GHz Pentium 4 (which it isn't!), because they know that clock speeds are only part of the story. That's why we get great marketing speak like "FSB800", "Quad-pumped", "Dual-channel", "HyperThreading", and so on. And the fact is that pretty much every one of those items does indeed tie in to performance benefits. Intel just doesn't bother to try to explain to the simpletons of the world exactly how MUCH of an impact each technology has. ;)
TrogdorJW - Wednesday, December 24, 2003 - link
Larger caches DO cause some increase in latency. Basically, you have more circuitry and such to go through. However, the real penalty can be seen quite clearly: The 512 KB cache CPU beats the 1 MB cache CPU by an AMAZING two clock cycles. Or, since we're talking 90 cycle compared to 92 cycles, it wins by 2.2%!!!!!!!

Wow, that's so significant that I think I need even more exclamation points to prove the importance of that win: !!!!!!!!!!!!!

Okay, so we see an overall decrease in latency of 2%. Compared to the P4 where it only has latencies that are 58.7% lower. And we can see that where low latency is important, the A64 definitely gains an advantage over the P4. But overall, a miniscule difference in memory latencies isn't going to matter in most benchmarks. The remaining tests bear this fact out.

The question about benches in 64-bit is an interesting one, however. Code that's twice as large will need twice as much cache to maintain parity with a 32-bit core. I think the 1 MB cache version will outperform the 512 KB version by a larger margine in 64-bit, but larger than "almost no difference" may not be that large. :)
KF - Wednesday, December 24, 2003 - link
>wrong data can be purged and refreshed within the average computational cycle.

Wrong (unneeded?) data is not purged.It is overwritten. Therefore no time is used to purge wrong data.

I don't know where you got this from.It would be like saying the data on hard drive had to be erased before you could reuse a sector. Not true.

The only thing I know of that could make a larger cache slower is that the tag comparitor that does the comparison of addresses (tags) with an extra bit could have a longer propagation delay.

Sorry about the multiple posts. I don't know what I'm doing to cause this. IE just jumped away as I was typing and I saw multiple copies of the partially done message come up.
KF - Wednesday, December 24, 2003 - link
>If the cache is too small it starves the cpu, if too big it causes lag

A bigger cach does not cause any lag.

>wrong data can be purged and refreshed within the average computational cycle.
Pumpkinierre - Tuesday, December 23, 2003 - link
"There are no instances where a larger cache will noticably slow down a computer. There are many instances where it will noticably speed it up"

#37 Its difficult to understand your statement when this very article shows a 512K A64 beating a 1 Mb A64 on memory latency. If it was the mobos as you say, at the very least it shows that cache size has no effect on memory latency. Yes, if you run 5-10 apps at the same time you need a large cache. But gamers dont do this - they run one app.(the game) to maximise CPU power. Yes I agree with better prediction algorithms and compilation etc. but these are only of use if the task is predictive. Gaming is not predictive and some of these optimisations could in fact hinder this application. Cache is a middle man and as a consequence suffers a delay. The ideal system is the memory running at CPU speed with memory addresses buffered in the CPU(ie cached memory)no L1 or L2 cache. In the real world the large differencer betweeen cpu and memory speed make caches useful. For 3Dgaming the cache must minimise system latency. If the cache is too small it starves the cpu, if too big it causes lag. For gaming the ideal cache would be optimal in size and running faster than the cpu so that wrong data can be purged and refreshed within the average computational cycle. The k8 lends itself to this low latency development with its on die mem. controller and single bridge(nf150) mobo design but AMD have changed their direction- servers, multi-tasking, heavy caches, big dies, costly.

As for the rating system you couldnt have brought up a better example than the Cyrix/IBM ratings of long ago to show AMD what not to do. They were a total farce, again the rating not reflecting the truth, and customers including myself avoided them in droves.
RZaakir - Tuesday, December 23, 2003 - link
#38. It shuts down like the Athlon XP has been doing for a while now.
kmmatney - Tuesday, December 23, 2003 - link
So what happens if you pull off the HSF with the Athlon 64? Does the cpu fry up, or does it shut down with no harm? I recently had a Duron fry up when one of the HSF tabs snapped. off the cpu socket...
Reflex - Tuesday, December 23, 2003 - link
#32: The lack of a huge performance difference with larger cache sizes is actually indicative of the fact that most applications are optimized for CPU's with 256kb of L2 cache. As applications start to be compiled to take advantage of the ability to place more instructions in the cache, CPU's that have larger caches will start to pull ahead noticably. Furthermore, larger cache sizes aid in multi-tasking, most of these benchmarks are performed with a single application running at a time, but in reality I often have 5-10 apps running simultaniously. The larger the cache, the more responsive the system will be.

There are no instances where a larger cache will noticably slow down a computer. There are many instances where it will noticably speed it up. Your comment about the motherboards is rediculous: you can 'expect' whatever you like, but the simple fact is that they are made by different manufacturers and experience has shown that there is a large variety of BIOS tweaks that can be done for the AMD64 platform, performance has been going up almost weekly with new BIOS updates, and its likely that that trend will continue. You do not know the differences between the two boards tested, so your statement is pure speculation based on zero evidence and an obvious lack of knowledge of how cache is used by applications(for instance, the prediction tables/algorithms used is FAR more important than the cache size itself).

As for your statement about Mhz, that has also been proven a falsehood. Back in the days of the Pentium, AMD and Cyrix both tried to list their Mhz accuratly with the K5 and 6x86 series. They were both clobbered until they switched to the PR system instead. Unfortunatly they did not maintain pace using this system and discredited the PR system altogether(Cyrix was the worst about it). The average consumer who sees "Athlon64 2.0Ghz 640kb" on a system next to a "Pentium4 3.0Ghz HT" *is* going to assume that the P4 is the better system, even though it performs at a lower level in 90% of the benchmarks out there and costs more money. Labeling their systems this way would be suicide for AMD. I would highly suggest you learn a thing or two about marketing before making such rediculous statements...

#33: Where have you seen otherwise? Certainly not on Anandtech. Also, performance has improved considerably since the release of the A64 and FX due to BIOS updates and tweaks, so your probably seeing that reflected in these later articles appearing. However, even at the start the A64 was an extremely strong performer against the P4 line(and the FX in a league of its own).
HammerFan - Tuesday, December 23, 2003 - link
#25, it isn't possible to use an A64 on a socket 940 board, nor is it possible to use an A64FX on a socket 754 board. If it were, AT probably would've used the same board for all 3 CPUs
arejerjejjerjre - Tuesday, December 23, 2003 - link
By the way why did they use different motherboards for A64 and A64 FX? Wouldn't it be more accurate to compare those processor with the same motherboard?
I don't know about you guys but I think soltek isn't as good as asus but that may just be in my head...
arejerjejjerjre - Tuesday, December 23, 2003 - link
MrEman havent you noticed that Intel is slowly going to the direction of more performance per clock!
Every new release of their cpu has been made to perform better than the last one with the same clock speed like Willamette->NW->NW(133FSB)->P4(200FSB)C
arejerjejjerjre - Tuesday, December 23, 2003 - link
Why is it that A64 is so strong compared to P4 3.0 and 3.2 in gaming performance I recall seeing many articles say otherwise? One thing I remember clearly that P4 3.2 (not extreme edition) won quite perfectly any athlon 64 thrown at it! A64 Fx51 is another thing though...
Pumpkinierre - Tuesday, December 23, 2003 - link
Here you go #30 from the horse's (AT) mouth:
"While the Athlon64 is a better designed and better performing processor than the Athlon XP in almost every way, people have not been waiting in line to buy the processor. Certainly the cost of motherboards is not the reason, since there are many Socket 754 boards in the $100 and less price range. Performance compared to Intel is also not the reason, since the Athlon64 3200+ performs very well compared to Intel’s best. The issue seems to be price."
They might be shipping fine but not many have been buying .
As far as the mobo explanation for the smaller memory latency shown by the 3000+ goes, I would expect the soltek board (3000+) to be a slower board than the ABIT board which targets gamers and enthusiasts. Many people say that increased cache improves latency. If the cache is the same size as memory this may be so. But in fact as soon as the cache is below half of the memory size it starts becoming a hindrance to memory latency for applications where it has to be purged and refreshed at the whim of an operator. This article shows this ,with very little difference in performance between 512k and 1 Mb l2 models. There is a cache size between CPU instruction/data starvation and redundant cache purging and refreshing which optimises memory latency for aparticular application. In the case of gaming with the A64 I believe that to be 128 to 256K of L1 preferably, or larger L2 (256-512K) and small inclusive L1 (a la P4).

And the AMD rating system is crap. Just call it A64 2.0GHz 640K combined cache and it will sell just as well if not better.
Reflex - Tuesday, December 23, 2003 - link
The ratings argument was rediculous. Most consumers really only know Intel's chips, so rating against them is pretty much the only way to rate your processor family if your name is not Intel. Honestly, Intel is using a ratings system more or less by just going on Mhz, when you consider the fact that Mhz is almost meaningless in thier current designs its just as much an arbitrary number as anything else.

I really don't know anyone who has been confused by just giving them a number that compares to Intel. Its simple and keeps the issue from becoming too large. Considering the fact that most consumers will *not* bother to get to know the differences in architecture before they purchase their PC, there really isn't a huge choice. And the car analogy was rediculous, there is no auto manufacturer that owns 80%+ of the market, so there is no need for competitors to rate thier products against the market leader. If a auto manufacturer was at that level you can be certain that the competition would be comparing themselves to the leader, thats how it generally works...

And more cache does not equal a crippled gaming CPU. The cache on a FX series CPU is running at the same speed as on a A64, its just larger. This is a good thing, and honestly there are no scenerios I can come up with where it would harm performance in any way that could possibly be noticable to the end user. It was a good move. So far as I can tell the FX outperforms the A64 on literally everything, so I see nothing deceptive about it at all.
dvinnen - Tuesday, December 23, 2003 - link
Pumpkinierre: Please shut up. The 3200+ os shipping fine. They even if they haden't done a silent release, it has nothing to do with the server faimly. Releasing a slower chip isn't usally sometihng to shout from the rooftops. And the slight memory difference had nothing to do with the cache. It was 2 cycles faster most likly because they weere two different mobos. For your P4EE review, is it so hard to check the older anandtechs review of it?

I'm not going to get your absurd argument about the rating scheme
Reflex - Tuesday, December 23, 2003 - link
To the guys mentioning the P4EE: I only ask where the Alpha EV8, Sparc, Xeon, Itanium, and Opteron are. Most of those are in the same price range. The EE is not a chip in the same catagory, judging by price, as the Athlon64 3000+, or even the Opteron and FX. The only one listed on realtime pricing is the 3.2Ghz at ZipZoomFly and it runs $1032. Thats the same price you could build an entire Athlon64 system for! Its not really comparable and dosen't belong in this comparison any more than an Alpha or Xeon does. In a comparison of a new FX I would expect to see the EE, but really you only need one top end CPU to put budget processors in perspective, and the fastest top end CPU was already used(AthlonFX) so why add more and waste a ton more time benchmarking. We already can figure out where the EE would stand just by looking at earlier reviews and its relation to the FX, so no point in rehashing it here...

As for this CPU, I really don't care what the original core is. Its a top flight CPU for a very good price on a platform with a lengthy upgrade path. What more can you ask for? And, when 64bit hits mainstream, it can handle it, which is a nice bonus. If I could spare $300 right now I'd be there. ;>
Pumpkinierre - Monday, December 22, 2003 - link
As far as I am aware #26 Intel name their celeron and centrino processors according to their actual speed not some mythical "equivalent" with the P4. Even within the P4 intel makes no distinction between 100,133,200MHz FSB or 512K,256K L2 cache or hyperthreading even though these attributes add power to the CPU. Even Apple, forever bragging how much they are better than wintel machines, dont give an equivalent. Only AMD have gotten themselves in this contorted lie upon lie type rating mess. Its up to the consumer to find out and the sales personnel to point out the strenghts and weaknesses of each cpu. This is routine, when you buy a honda or mazda car you dont get a Ford or GM equivalent rating thrown at you. You buy it on its own merits. Try this:

A-XP: Very good at games and all 2D, cheap.
A64: Best at everything except video encoding, expensive.
P4: Best at media encoding and very good at all other tasks, middle to expensive.

If AMD put that out theyd get plenty more sales as the majority are interested in Office, gaming and internet applications-the strong points of the K7 and K8's. Instead consumers get this stupid rating system where clearly the A64 performance and cost are superior to the A-XP of equivalent rating. So one question would probably sink the sale as they'd recognise a con and turn to intel who name their products for what they are.
yak8998 - Monday, December 22, 2003 - link
#26 - I think Intel's system is much better, although it could use some improvement. Simply name, clockspeed and a letter, ex: P4 2.4C. It would be harder for AMD tho due to a few more factors (memory controller and such)
MrEMan - Monday, December 22, 2003 - link
#23/24, what would you suggest AMD use in place of their current part numbering scheme? Is Intel's clock speed designation any more accurate/less confusing when comparing P4s and Celerons? In my opinion, the PC industry needs something along the lines of the cpu performance equivalent to the FTC power spec for measuring audio amplifier outputs.

I believe AMD was/is trying to get the industry to back the True Performance Initiative in order to achieve more accurate comparisons when testing different processor architectures.

However, unless Intel dumps the P4 design, they have no business reason to change to something far more accurate then their current "GHz is everything" preference for the retail market.

Unfortunately, the uninformed buyers are the ones most hurt by the lack of an industry standard for measuring CPU and system performance.
PorBleemo - Monday, December 22, 2003 - link
On the main page it says it has 512MB of L2 cache! That's OK by me! :P
Pumpkinierre - Monday, December 22, 2003 - link
Oh yes and while I'm at it, AMD should drop that silly naming system. Its not only confusing with 2 different processors(K7,K8), 4 different caches(L2 64K,256K,512K,1Gb), 4 FSB(100,133,166,200MHz) and single bank/dual bank mem. controllers. It basically makes Intel the standard and allows them to call the shots as they did with the P43.2 vs the A-XP3200+. The masses enjoy a bath and also dont like BS to which most of them turn their back at the slightest whiff.
Pumpkinierre - Monday, December 22, 2003 - link
This is the cpu that should have been released to the masses (who do enjoy a shower #9) in september. The only disgrace is the lack of official release from AMD who dont want to disturb their server focussed business model in the eyes of the analysts and so decide to slip it out the back door. That's why I am on the side of people who think these cpu's are just A64 3200s that failed to make the grade. At least, mine and others rants about AMD only looking after the well heeled have'nt fallen on deaf ears. And I agree with #7 a P4EE should be included if you include the outrageously priced (and yes limited edition #11) FX51 in your reviews. After all by the time you include reg. memory and 940Mobo for the fx the price diferential cf. P4EE in that stratospheric category is not much.

While I'm discussing ranting, look at the only bench mark where the 3000+ beat the 3200+ (and all others) sciencemark2 (memory latency). This demonstrates what i've said about large caches getting in the way of system latency. This low latency translates into better response and smoothness in gaming (not demos which dont show this quality due to their predictable code path). The ideal woud be a fast L1 cache (probably 256K) and quad pumped fast memory maybe that dual phase memory that VIA are looking at. What AMD have created is the ideal gaming chip then crippled it with a large cache because they decided to go into the server market and then re crippled the desktop chip with a single bank memory controller so as to differentiate product without upsetting their production line . No wonder SETI doesnt get any reply out there. Still lack of A64 3200+ sales has reluctantly pushed them to release the 3000+ maybe the same will occur with the true Newcastle once they realise that the server path is going to be along slow haul.
tfranzese - Monday, December 22, 2003 - link
good review, good chips again from AMD
KristopherKubicki - Monday, December 22, 2003 - link
The only difference between NewCastle and Clawhammer was the onchip cooling technology (and the 1/2 cache size)...

My NewCastles are in the mail, Ill do some thermal testing on it for some upcoming enclosure reviews as well to see the difference.

Kristopher
sandorski - Monday, December 22, 2003 - link
sweeeeeeet
Shinei - Monday, December 22, 2003 - link
Okay, now I'm REALLY upset that I just bought a 2800+ Athlon XP... :(
Fantastic review though. :)
Jeff7181 - Monday, December 22, 2003 - link
We already know it does well in a 32-bit environment.
I'd like to see it tested against the A64 3200+ under a 64-bit Linux OS and software.
64-bit code naturally has more "bulk" to it than 32-bit code, so the extra cache of the 3200+ SHOULD cause a larger performance gap between the two processors when running 64-bit software... although I've yet to see this tested anywhere, it is one of the most important factors.
Oxonium - Monday, December 22, 2003 - link
Ok, I concede that Morgan was a slightly different core. But my point was that it is fairly common for AMD and Intel to give the same core different names depending on their cache size or how they are to be used.

I agree with dvinnen. I'm sure a smaller die would save some money in wafer costs but it also requires design time, tooling, and qualification. The die size issue will likely be addressed in ~6 months when AMD implements the 0.09 micron manufacturing process. It doesn't make a lot of sense to spend money on reducing core size when there will be a new core in a few months anyway that resolves the problem. Plus, as dvinnen said, using the Clawhammer core allows AMD to still make a profit on the Athlon64's that don't pass QA with their full cache.

As for memory controller improvement, that could be true. But this would be more like a new processor stepping rather than a new core.

Hopefully Anand will remove the heat spreader to show if this really is just a 3200+ with half the cache disabled or a new core.
johnsonx - Monday, December 22, 2003 - link
#12 - I think you're wrong on the Morgan being just a Palomino with most of the cache disabled. I'm pretty sure the Morgan was actually a different core that truly had only 64k L2.

As to the Thorton, you may be right on that one... I'm not sure.
dvinnen - Monday, December 22, 2003 - link
but of course, there is only one way to find out, pop the top on them. Some core pictures will tell the truth.
dvinnen - Monday, December 22, 2003 - link
I agree, I think it will actually cost AMD more to make a smaller core. Sure, it will save wafer costs, but they lose the value of bulk, have to make new masks, plus can't pull a Celeron and sell chips with some cache broken.
mkruer - Monday, December 22, 2003 - link
According to AMD Zone, this is not the Newcastle Core, but rather and AMD64 with half the L2 cache disabled.

BTW #12 are you sure I though the Newcastle Core was suppose to have some memory controller refinements. Either way it is still good to see the chips finally coming down to a more reasonable price, and still kicking Intel @$$
Oxonium - Monday, December 22, 2003 - link
I think Newcastle is just the codename for the Clawhammer corde with cache disabled, rather than a totally new core. AMD frequently does this. Morgan was the Palomino core with 3/4 the L2 cache disabled. Thorton was the Barton core with 1/2 the L2 cache disabled. Intel uses a similar naming system in distinguishing the Xeon cores from the Pentium 4/Celeron cores. This naming system can sometimes make it difficult to determine whether a core is totally new or variant of another one.
MrEMan - Monday, December 22, 2003 - link
#9, I think Intel has a bigger problem with "clockspeed is the only thing that matters" position in determining CPU performance. I have yet to see any explanation out of Intel as to why a 1.6GHz Duron is better in many cases than a 2.6 GHz Celeron. Same goes for the Pentium III like 1.6 GHz Centrino comparing to a 2.6 GHz P4.

My main fault with AMD over the years is that they never run ads comparing equivalent clock speed Athlons/Durons vs P4/Celerons, to let the public see who is BSing whom when it comes to clockspeed vs performance (would anyone brag that they need a 30%+ increase in GHz to compare with their competitor's product, especially when they cost almost twice as much?). As for their use of performance ratings, I feel the Intel marketing machine left AMD no choice but to do so.

#7 as for the P4EE who really cares about the P4EE with its limited availability (most going to Intel's Dell division), and its outrageous price... not me, that is for sure.

Lastly, concerning performance to currently available competitor's products, how come so many cut Intel slack back when the original 1.3-1.5 GHz P4s dogs got trounced by P3s of that time, let alone Athlons, and now give AMD grief because the P4 wins some of the benchmarks? Also those giving AMD grief about being late with the Athlon64, it seems that Intel is also having process problems with their new designs.

Seems like a double standard to me.
morcegovermelho - Monday, December 22, 2003 - link
#8 Thanks.
Great review.
johnsonx - Monday, December 22, 2003 - link
Is it really NewCastle? As in a new core that is actually different (like it physically has only 512k cache and thus is smaller and therefore cheaper to make)?

Or is it just the same ClawHammer core with 1/2 the cache disabled? (surely with such a big die, many 1Mb ClawHammer cores must come out with some bad cache but otherwise perfectly functional)

I tend to think it is the latter, though the author seems pretty convinced we really have a new core here (but with no further explanation).

Or am I being too picky about what does and doesn't constitute a new core? Does turning off half of the cache on old core X = new core Y?

As a side note, to Praeludium, recall that back when the AthlonXP first came out, the ratings were VERY conservative as well; 1600+ chips benchmarked well against 2.0Ghz P4's. But as Intel changed cores, increased cache & FSB, and added HT, AMD's ratings increasingly seemed like marketing BS; but this is not an entirely fair criticism. AMD had no choice but to keep the same ratings system for the AthlonXP, else they would be in the ridiculous position of bringing out a faster AthlonXP, but giving it a lower rating than a slower AthlonXP. Ratings numbers are designed to compare with Intel (nevermind what AMD says about comparing to the T-Bird - at first, probably yes, but not any more), but the ratings also have to be internally consistent within AMD's AthlonXP line. AMD has taken the wise step of 're-targeting' the ratings of the A64, though these new ratings may falter again when the Prescott comes out and starts to ramp.
I think AMD is probably stuck with this ratings system on the consumer-level chips for quite some time; I'm sure AMD and Intel both chose their architectures carefully and for good technical merit, but higher IPC is certainly harder to market to the unwashed masses than raw clock speed.
Wesley Fink - Monday, December 22, 2003 - link
#4 - Memory Bandwidth and Latency (ScienceMark 2) should not change between a 3.0 and 3.2 Pentium 4. We only included both the 3000+ and 3200+ in these tests because the Athlon64 Memory Controller is on the chip, and as we said in the review, we wanted to roughly check if there had been any chages to the Memory Controller.

We started comparisons to the 3000+ with a 3.0 P4, but when we saw the gaming benches, we updated to a 3.2. In the interest of getting a review to you quickly, we ran complete benches on the 3.2, but only included 3.0 benches where we had already run them, or where they would provide more info. The winstones are VERY time-consuming and are run on a separate drive, while Aquamark 3 appeared mostly bound to the video card we were using for benchmarking.
saechaka - Monday, December 22, 2003 - link
will the p4ee be included in the next review?
KristopherKubicki - Monday, December 22, 2003 - link
lifeguard1999: Same for me.

Kristopher
Praeludium - Monday, December 22, 2003 - link
"...the 3000+ rating of the [...] Athlon64 is very conservative."

Heh, never thought I'd see that. But it does seem true, almost suspiciously so. I am still holding off on getting an Athlon64 because my own AthlonXP chip is not yet a year old, but all the same, the new year might bring an investment in one (investment? Ha, the things depreciate faster than a car hitting a cement wall).

Good review at trying to clear up the mystery. I kind of wish the 3000+ was slightly less of a performance champ though, because now I'm thinking AMD can't figure out how to scale with their own rating numbers. :P

And #4, I'd almost assume the 2.8c wasn't included because the results were pretty much in favour of the A64 3000+ already over the 3.0.
morcegovermelho - Monday, December 22, 2003 - link
page 4 - Science Mark 2; page 5 - Aquamark; page 7 - Winstone 2004; Where are the 3.0Ghz P4 results?
Why isn't the P4 2.8C included? It's in the same price range.
lifeguard1999 - Monday, December 22, 2003 - link
Interesting. The charts DO show up in both IE and Firebird when I click through the pages one-by-one, but not when I click the "Print this article" link (which is how I read the articles). Go figure. Thanks for the help Curt.
Curt Oien - Monday, December 22, 2003 - link
I see charts with IE
lifeguard1999 - Monday, December 22, 2003 - link
This would be a nice article, but I do not see any charts in either IE or Mozilla Firebird. Is anyone else seeing this problem. or is it just me? Or is it that there are no benchmarking charts?
sheh - Wednesday, August 10, 2016 - link
The article is missing the images, as of 8/2016.

Athlon64 3000+: 64-bit at Half the Price

Post Your Comment

75 Comments

Back to Article

Reflex - Saturday, December 27, 2003 - link

Pumpkinierre - Friday, December 26, 2003 - link

Reflex - Friday, December 26, 2003 - link

Pumpkinierre - Friday, December 26, 2003 - link

PrinceGaz - Thursday, December 25, 2003 - link

Reflex - Thursday, December 25, 2003 - link

Reflex - Thursday, December 25, 2003 - link

Pumpkinierre - Thursday, December 25, 2003 - link

Pumpkinierre - Thursday, December 25, 2003 - link

Reflex - Thursday, December 25, 2003 - link

Pumpkinierre - Thursday, December 25, 2003 - link

PrinceGaz - Thursday, December 25, 2003 - link

Reflex - Thursday, December 25, 2003 - link

Pumpkinierre - Thursday, December 25, 2003 - link

Pumpkinierre - Thursday, December 25, 2003 - link

Reflex - Thursday, December 25, 2003 - link

HammerFan - Wednesday, December 24, 2003 - link

Pumpkinierre - Wednesday, December 24, 2003 - link

PrinceGaz - Wednesday, December 24, 2003 - link

Reflex - Wednesday, December 24, 2003 - link

Phiro - Wednesday, December 24, 2003 - link

Wesley Fink - Wednesday, December 24, 2003 - link

dvinnen - Wednesday, December 24, 2003 - link

Reflex - Wednesday, December 24, 2003 - link

AnonymouseUser - Wednesday, December 24, 2003 - link

PrinceGaz - Wednesday, December 24, 2003 - link

Pumpkinierre - Wednesday, December 24, 2003 - link

Pumpkinierre - Wednesday, December 24, 2003 - link

Reflex - Wednesday, December 24, 2003 - link

arejerjejjerjre - Wednesday, December 24, 2003 - link

TrogdorJW - Wednesday, December 24, 2003 - link

TrogdorJW - Wednesday, December 24, 2003 - link

KF - Wednesday, December 24, 2003 - link

KF - Wednesday, December 24, 2003 - link

Pumpkinierre - Tuesday, December 23, 2003 - link

RZaakir - Tuesday, December 23, 2003 - link

kmmatney - Tuesday, December 23, 2003 - link

Reflex - Tuesday, December 23, 2003 - link

HammerFan - Tuesday, December 23, 2003 - link

arejerjejjerjre - Tuesday, December 23, 2003 - link

arejerjejjerjre - Tuesday, December 23, 2003 - link

arejerjejjerjre - Tuesday, December 23, 2003 - link

Pumpkinierre - Tuesday, December 23, 2003 - link

Reflex - Tuesday, December 23, 2003 - link

dvinnen - Tuesday, December 23, 2003 - link

Reflex - Tuesday, December 23, 2003 - link

Pumpkinierre - Monday, December 22, 2003 - link

yak8998 - Monday, December 22, 2003 - link

MrEMan - Monday, December 22, 2003 - link

PorBleemo - Monday, December 22, 2003 - link

Pumpkinierre - Monday, December 22, 2003 - link

Pumpkinierre - Monday, December 22, 2003 - link

tfranzese - Monday, December 22, 2003 - link

KristopherKubicki - Monday, December 22, 2003 - link

sandorski - Monday, December 22, 2003 - link

Shinei - Monday, December 22, 2003 - link

Jeff7181 - Monday, December 22, 2003 - link

Oxonium - Monday, December 22, 2003 - link

johnsonx - Monday, December 22, 2003 - link

dvinnen - Monday, December 22, 2003 - link

dvinnen - Monday, December 22, 2003 - link

mkruer - Monday, December 22, 2003 - link

Oxonium - Monday, December 22, 2003 - link

MrEMan - Monday, December 22, 2003 - link

morcegovermelho - Monday, December 22, 2003 - link

johnsonx - Monday, December 22, 2003 - link

Wesley Fink - Monday, December 22, 2003 - link

saechaka - Monday, December 22, 2003 - link

KristopherKubicki - Monday, December 22, 2003 - link

Praeludium - Monday, December 22, 2003 - link

morcegovermelho - Monday, December 22, 2003 - link

lifeguard1999 - Monday, December 22, 2003 - link

Curt Oien - Monday, December 22, 2003 - link

lifeguard1999 - Monday, December 22, 2003 - link

sheh - Wednesday, August 10, 2016 - link

Log in