
  • Meaker10 - Sunday, February 5, 2017 - link

    And here we have a card with all the features that would be on a high end consumer card if AMD were pushing them.

    This is why even Nvidia lovers should want AMD to do well too...
  • ddriver - Sunday, February 5, 2017 - link

    Specs look good, but I don't think the value for the dollar will be that good.
  • SharpEars - Monday, February 6, 2017 - link

    But that >$5,000 price tag, though.
  • Yojimbo - Sunday, February 5, 2017 - link

    Huh? Why would NVIDIA put HBM 2 on a high end consumer card? Games don't need it on NVIDIA's architecture, yet. Why would NVIDIA put FP64 on a high end consumer card? What other features are you talking about that GP100 has that GP102 doesn't?

    GP102 is the superior high end consumer architecture, and you are already getting it. The only difference is if AMD had competitive high-end products, NVIDIA would have released the 1080 Ti by now, so the ultra high end would be significantly cheaper than the Titan X it is served by now.
  • close - Monday, February 6, 2017 - link

    One more big difference could have been (slightly) lower prices.
  • DanNeely - Monday, February 6, 2017 - link

    At the moment HBM2 is only marginally faster than GDDR5X; its main advantage is in terms of power consumption.
  • Yojimbo - Monday, February 6, 2017 - link

    HBM2 looks to be about twice as fast as GDDR5X.
  • Yojimbo - Monday, February 6, 2017 - link

    Well, maybe not twice as fast, but it's at least 50% faster.
  • ddriver - Tuesday, February 7, 2017 - link

    Actually, on its own HBM2 is much slower. It is the die-level parallelism that increases its bandwidth. The HBM2 solution is 4096 bits wide; the GDDR5X solution is barely 384 bits wide - more than 10 times narrower, yet delivering roughly two-thirds of the bandwidth. If you were to implement a 4096-bit-wide GDDR5X controller it would destroy HBM2 at a comparable bus width. That would however be highly problematic and undesirable, as it would increase board size and power consumption drastically.
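
    As a rough sanity check of the bus-width arithmetic above, here is a minimal sketch in Python, assuming the commonly quoted per-pin rates of the day (~1.4 Gbps for the Tesla P100's HBM2, 10 Gbps for the Pascal Titan X's GDDR5X):

        # Peak bandwidth = bus width (bits) * per-pin rate (Gbps) / 8 bits per byte
        def bandwidth_gb_s(bus_width_bits: int, gbps_per_pin: float) -> float:
            return bus_width_bits * gbps_per_pin / 8

        hbm2 = bandwidth_gb_s(4096, 1.4)     # ~717 GB/s, Tesla P100-class HBM2
        gddr5x = bandwidth_gb_s(384, 10.0)   # 480 GB/s, Pascal Titan X GDDR5X

        print(f"HBM2:   {hbm2:.0f} GB/s")
        print(f"GDDR5X: {gddr5x:.0f} GB/s")
        print(f"Per pin GDDR5X is ~{10.0 / 1.4:.1f}x faster; in aggregate HBM2 is ~{hbm2 / gddr5x:.2f}x faster")

    Both halves of the argument fall out of the same formula: GDDR5X is far faster per pin, while the 4096-bit interface gives the HBM2 subsystem roughly 1.5x the total bandwidth.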
  • Yojimbo - Tuesday, February 7, 2017 - link

    Yeah, and SSDs are slow because it's die-level parallelism that increases bandwidth. It doesn't matter what methods are used to achieve the bandwidth. The conversation is obviously about the bandwidth of the memory system. If you were to implement a 4096-bit-wide bus in GDDR5X it would be huge, expensive, and power hungry. I'm not sure why you are saying this. It's just obfuscating the issue. Within ranges of reasonable cost and power consumption, an HBM2 memory subsystem is capable of greater than 50% more bandwidth than a GDDR5X memory subsystem.
  • ddriver - Tuesday, February 7, 2017 - link

    What you are missing is that HBM is not without its problems either - cost, availability, production delays. Obviously HBM is not intrinsically better; if it were, nobody would use GDDR5X. Also, it would only take a 768-bit memory interface for GDDR5X to beat HBM2, and that's not too much of a stretch either - there have already been 512-bit memory controllers on GPUs, so a 50% increase ought to be possible.

    What defines products is not possibility but commercial viability. If you had the money to spare for a larger MC on the die and the extra power, you wouldn't even bother with HBM.

    The reason HBM is more energy efficient is that it is SLOWER than GDDR5X per pin. So it is just not objectively true to say that HBM is faster.
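
    For what it's worth, the hypothetical 768-bit GDDR5X interface mentioned above does pencil out ahead of a P100-class HBM2 setup on raw bandwidth, under the same assumed per-pin rates as before (10 Gbps GDDR5X, ~1.4 Gbps HBM2):

        # Hypothetical 768-bit GDDR5X interface vs. a 4-stack (4096-bit) HBM2 setup
        gddr5x_768 = 768 * 10.0 / 8     # 960 GB/s
        hbm2_4stack = 4096 * 1.4 / 8    # ~717 GB/s
        widening = 768 / 512            # 1.5x -> a 50% step up from existing 512-bit designs

        print(gddr5x_768, hbm2_4stack, widening)   # 960.0 716.8 1.5

    Whether the extra board area and memory-interface power would be acceptable is exactly the trade-off being argued in this thread.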
  • Yojimbo - Wednesday, February 8, 2017 - link

    I'm not missing that HBM2 is not a perfect solution. I also never said it is "intrinsically better". I said it's faster. Within 9 months you will see HBM2 at 1 TB/s. Already you can see it at 720 GB/s. There's a reason you don't see GDDR5X used for bandwidths that high, and it's not because GDDR5X is faster.

    There is always a cost and power trade-off for performance. It's irrelevant how fast GDDR5X could be if you allowed it to run off a small nuclear reactor. The only speed that matters is the speed that works within reasonable constraints of cost and power. Within those reasonable constraints HBM2 is faster. It's faster because it's more energy efficient. It is most certainly objectively true to say HBM2 is faster. There's a context to every statement, and an engineering statement has an engineering context. I don't have to read an HBM2 whitepaper to determine its scalability to infinity to counter your argument. Such arguments are entirely irrelevant.
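
    Those two figures line up with HBM2's per-pin signaling rates (roughly 1.4 Gbps as shipped on the Tesla P100, and 2.0 Gbps at the top of the spec); a quick check, assuming a standard 4-stack, 4096-bit configuration:

        stacks, bits_per_stack = 4, 1024
        for gbps_per_pin in (1.4, 2.0):
            gb_per_s = stacks * bits_per_stack * gbps_per_pin / 8
            print(f"{gbps_per_pin} Gbps/pin -> {gb_per_s:.0f} GB/s")
        # 1.4 Gbps/pin -> 717 GB/s (Tesla P100 today)
        # 2.0 Gbps/pin -> 1024 GB/s (the ~1 TB/s figure)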
  • ddriver - Wednesday, February 8, 2017 - link

    And yet that doesn't make it faster. It just makes it more energy efficient. It could not possibly be technologically faster, as it runs at a much lower clock rate. It doesn't scale to infinity either; you don't simply add more dies and have them work by magic, this has to be facilitated at the memory controller.

    HBM mainly owes its efficiency to two factors: (1) it is produced on a better process node, and (2) it is optimized for the best performance-to-power ratio. There is nothing preventing you from doing the same with GDDR5X chips, aside from economic viability. It is more economically efficient to use older production lines for GDDR5X and keep the new production lines free for chips where efficiency is more critical, and since GDDR5X doesn't live on the same package as a 200-watt GPU, it can be pushed to its limit in order to maximize raw performance at the expense of some extra power you have the TDP budget to dissipate.

    Today it is entirely possible to create a GPU with a 1024-bit MC, which would beat HBM2 amply in terms of bandwidth. And it would be able to support at least 128 GB of memory, whereas with your fabled "scalability to infinity" for HBM2, you hit a brick wall at 32 GB, and only once 8 GB stacks become available - for now 16 GB is the limit. Sure, you could possibly make a GPU that supports 8 HBM modules, but newsflash - that would require an 8192-bit memory controller, which is about 8 times more complex than a 1024-bit memory controller, and a lot more transistors to turn out faulty during manufacturing, rendering the entire chip useless.

    So get the fanboyism out of your head and realize that HBM is not faster, it is just more energy efficient.
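
    To put numbers on the capacity ceiling being argued here, a sketch assuming the HBM2 parts of the time (8 Gb dies, four stacks per GPU, in 4-high or 8-high stacks):

        # Capacity (GB) = stacks * dies per stack * die density (Gb) / 8 bits per byte
        def hbm2_capacity_gb(stacks: int, dies_per_stack: int, die_gbit: int = 8) -> float:
            return stacks * dies_per_stack * die_gbit / 8

        print(hbm2_capacity_gb(4, 4))   # 16.0 GB -- 4-Hi stacks, the current limit
        print(hbm2_capacity_gb(4, 8))   # 32.0 GB -- 8-Hi stacks, the "brick wall"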
  • Yojimbo - Wednesday, February 8, 2017 - link

    Oh yes, I am an HBM fanboy. HBM! HBM! HBM! Though obviously one of us clearly can't see straight - probably the one who doesn't realize that engineering solutions are designed to solve engineering problems, and engineering problems have constraints. 128 GB of GDDR5 memory would take up a huge amount of real estate, just as a 1024-bit GDDR bus would add too much complexity and draw too much power.

    I have no idea how easy it would be to extend HBM2 past 32 channels. But it really doesn't matter. It's designed to operate within the realm of practicality. No one is going to build your 1024-bit GDDR5 solution. If they wanted to, they would already be doing it and wouldn't be using HBM2. Within the realm of practicality HBM2 is faster, and that's all that matters.

    Anyway, we're obviously getting nowhere here. Take care.
  • JoeyJoJo123 - Monday, February 6, 2017 - link

    >Games don't need it...

    Man I sure do hate tired old arguments that are structured like this.

    Yeah, and nobody _needs_ every car to have self-driving functionality. And nobody _needs_ phones with battery lifespans that exceed one day of usage. And nobody _needs_ monitors with high refresh rates and high color fidelity.

    But it sure as hell would improve the bottom-line performance for the lowest ranking product being sold, and it improves the user experience for anyone buying into this technology.

    So stuff it with your crappy argument.
  • Yojimbo - Monday, February 6, 2017 - link

    "
    Man I sure do hate tired old arguments that are structured like this.

    Yeah, and nobody _needs_ every car to have self-driving functionality. And nobody _needs_ phones with battery lifespans that exceed one day of usage. And nobody _needs_ monitors with high refresh rates and high color fidelity.

    But it sure as hell would improve the bottom-line performance for the lowest ranking product being sold, and it improves the user experience for anyone buying into this technology.

    So stuff it with your crappy argument."

    Games are not memory bandwidth bound. When I say they don't need it, I mean they are not really helped by an increase in memory bandwidth. The "lowest ranking product" needs HBM2 less than the top end product. GDDR5 is plenty fast enough to keep up with the number of compute cores in low end and mainstream graphics cards, even with AMD's more bandwidth-hungry architecture. If the memory subsystem is fast enough to keep up with the demands placed on it by the compute subsystem, that's 100% of what you can get from it. I'm not sure where you expect this performance boost from increasing memory bandwidth to come from.
  • Yojimbo - Monday, February 6, 2017 - link

    I think maybe what you're missing is that there is always a cost-benefit analysis to any design decision. Increasing the manufacturing cost of a card reduces its competitiveness unless it gives some offsetting benefit. Suppose, just for illustrative purposes, it costs $20 more per card to have HBM2 memory instead of GDDR5. NVIDIA or AMD could spend that $20 on HBM2 or they could spend that $20 on increasing the number of compute cores on the card. If the resulting performance across a range of critical benchmarks favors spending the money on compute cores then that's what they should do. It's not that there is never a situation where more memory bandwidth could help, it's just that the benefit is smaller than what is achieved by the same use of resources in a different area. NVIDIA and AMD need to make the choices that result in the best price/performance ratio for the market segment they are targeting. An HBM2 configuration is not an optimum one for NVIDIA and is only an optimum one for AMD at the very high end.
  • Michael Bay - Tuesday, February 7, 2017 - link

    Self-driving car will carry you right into slavery.
  • Yojimbo - Tuesday, February 7, 2017 - link

    Yes, just like the self-cleaning oven.
  • sna1970 - Monday, February 6, 2017 - link

    Why put HBM2 on a high end card? Simple: card size... you can have a Titan X-class card 170mm long instead of 280mm...

    Putting the memory on the GPU package itself reduces the card size by 50%.
  • Yojimbo - Monday, February 6, 2017 - link

    At a large increase in cost. Only worth it for very niche products. Since it demands its own manufacturing process (CoWoS) it hardly seems worth it.
  • Meteor2 - Monday, February 6, 2017 - link

    Yup, just the same as in the HEDT desktop world, with $1,700 Intel CPUs -- because they have no competition.
  • Dark_Complex - Sunday, February 5, 2017 - link

    Any plans to try and get a couple of these in to review? It would be interesting to see if the use of NVLink helps improve SLI scaling, particularly in games which normally have poor scaling.
  • ddriver - Sunday, February 5, 2017 - link

    You do realize this is not a gaming product? The chip is optimized for compute throughput, so its efficiency in games will be subpar. I am not saying it will be slow, just that it is not going to be efficient in metrics such as price-to-performance or power-to-performance.

    They will release an HBM2 solution that will do better in games at a fraction of the cost. Buying Quadros for gaming is like buying a Lambo to deliver mail.
  • Gigaplex - Sunday, February 5, 2017 - link

    The drivers will also be optimised for accuracy rather than performance, unlike the GeForce range of cards. Don't expect game profiles either.
  • Kevin G - Sunday, February 5, 2017 - link

    Generally speaking, you could use the GeForce drivers on Quadro cards in the past; it just isn't a supported configuration. (The reverse is not true, so you can't use Quadro drivers on GeForce cards.)
  • Kevin G - Sunday, February 5, 2017 - link

    While certainly not a gaming product due to its insane professional cost and professional level feature set, I'd wager that it could be used for gaming. I'm curious how much HBM helps on the nVidia side of things as well as how much impact 128 (presumably) ROPs would help at high resolutions. This could be a great 4K card or even make 5K feasible for some select games. If SLI is improved with nvLink, then certainly 5K becomes a possibility with two cards. Inquiring minds want to know.
  • Cygni - Monday, February 6, 2017 - link

    Even if it's not a gaming product, who cares? It would be fun/interesting. Considering the complete lack of boxed Pascal Titans worldwide, it would also be interesting to see how it fares for the money-is-no-object crowd.
  • Ej24 - Sunday, February 5, 2017 - link

    Maybe they'll make a GP100 consumer card as this generation's Titan Black or Z, or something other than X. Then the 1080 Ti can step up to fill the niche of the current Titan X(P) without stealing sales from the Titan brand, as they'll have a higher-tier Titan Z/Black/GP100 card.
  • RaichuPls - Monday, February 6, 2017 - link

    There's literally no benefit to anybody if they launch a GP100 consumer card...
  • ddriver - Monday, February 6, 2017 - link

    There is literally no substantiation in that claim.
  • Achaios - Monday, February 6, 2017 - link

    QUOTE This is a notable distinction, as NVIDIA’s GPU production strategy has changed since the days of Kepler and Maxwell. No longer does NVIDIA’s biggest GPU pull triple-duty across consumer, workstations, and servers. UNQUOTE

    So, if I'm reading this right, this means that we will never see a 1080TI, i.e. a fully-enabled Pascal for consumers?
  • Qwertilot - Monday, February 6, 2017 - link

    No, you'll get that in the form of GP102, which is probably just as fast as GP100 in games and a rather smaller, simpler, and cheaper chip overall.

    They've split the product lines - compute is getting very important indeed nowadays.
  • TheinsanegamerN - Monday, February 6, 2017 - link

    What it means is that, instead of making a single mega chip for use across all markets, the market this level of Quadro goes into is big enough to get its own chip.

    You can still get the super mega chip for gaming, the Titan X(P). The GP100 is simply made for compute first, with no aspirations of being made into a gaming chip.
  • TheinsanegamerN - Monday, February 6, 2017 - link

    Also keep in mind that this new Quadro is the same size as the Titan X(P); the only difference is compute features.
  • Meteor2 - Monday, February 6, 2017 - link

    The Titan X might be as close to a fully-enabled Pascal card as we'll see; it could be that yields are so poor with big dies at 16nm that very few dies come out completely fault-free. Maybe those are the GP100s going into Tesla and now Quadro.
  • Gothmoth - Monday, February 6, 2017 - link

    I use 3ds Max and V-Ray, and I would love such a card.
  • ddriver - Monday, February 6, 2017 - link

    The question is what would you love more - this card, or having a full set of kidneys? :D
  • edzieba - Monday, February 6, 2017 - link

    IIRC this is the first confirmation that the GP100 die even HAS a direct graphical output capability.
  • Ryan Smith - Monday, February 6, 2017 - link

    Nah, we've known this since GP100 launched last year. NVIDIA was upfront that it had all the graphics bits as well; they just weren't being used on Tesla for obvious reasons.
  • Meteor2 - Tuesday, February 7, 2017 - link

    He said *output*.
  • Yojimbo - Tuesday, February 7, 2017 - link

    No, he said "output capability".
  • Yojimbo - Tuesday, February 7, 2017 - link

    And, he's talking about the GPU, the GP100, not about a card, the P100. The actual direct graphical output is handled by other components of the card (the physical connector, for instance). And he also said "confirmation". It seems pretty obvious that the Tesla P100, the only prior incarnation of the GP100, doesn't have any direct graphical output.
  • SharpEars - Monday, February 6, 2017 - link

    I read and read and then I saw the price tag >$5,000. Color me not at all interested...
  • Ninhalem - Monday, February 6, 2017 - link

    You're not the target consumer. I use ANSYS for thermal analysis, and this card would be really nice for what I work on. When you're already looking at spending $15k on a 32-core workstation, and the software costs upwards of $100k for one license, $5k is like a small stone.
  • Yojimbo - Monday, February 6, 2017 - link

    According to NVIDIA's website, the P6000, P5000, M6000, and M5000 do have ECC.

    http://www.nvidia.com/object/compare-quadro-gpus.h...
  • kleingordon - Monday, February 6, 2017 - link

    It seems like the Mezzanine version of the Tesla P100 isn't easily available (i.e., I can't put it in a Thinkmate custom-built rack server). Is the Quadro GP100 the only way I could have NVLink in a rack server (I assume I can put it in one, even if it's geared towards workstations)?
  • Ktracho - Tuesday, February 7, 2017 - link

    The article states "While NVIDIA offers PCIe Tesla P100 cards, those cards only feature passive cooling and are designed for servers..." If your rack server has appropriate cooling and power, these should work, though I'm not sure if NVLink is available on these Tesla cards.
  • del42sa - Tuesday, February 7, 2017 - link

    128 ROPs doesn't fit well with 6 GPCs; the safe bet is 96 ROPs.
  • Yojimbo - Tuesday, February 7, 2017 - link

    Aren't the ROPs attached to the GPCs through a crossbar? They are paired with the Memory Controllers. If I am interpreting it the right way, HBM has 8 independent channels per stack and the Quadro GP100 presumably has 4 stacks. So there are 32 Memory Controllers. So the ROPs should be some multiple of 32. Therefore both 96 and 128 ROPs would be possible. Is there a reason the number of ROPs needs to be a simple ratio of the number of Raster Engines (GPCs)?
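
    A quick way to see why both counts are arithmetically plausible under that reading (32 channels from 4 stacks x 8 channels each, with ROPs tied to the memory partitions rather than the GPCs; the tie-in is the premise being debated, not a confirmed spec):

        channels = 4 * 8   # 4 HBM2 stacks x 8 channels per stack
        gpcs = 6
        for rops in (96, 128, 144):
            print(f"{rops} ROPs: {rops / channels:.2f} per channel, {rops / gpcs:.2f} per GPC")
        # 96 ROPs:  3.00 per channel, 16.00 per GPC
        # 128 ROPs: 4.00 per channel, 21.33 per GPC
        # 144 ROPs: 4.50 per channel, 24.00 per GPC

    96 and 128 both divide evenly across 32 channels, while only 96 and 144 divide evenly across 6 GPCs - which is the crux of the disagreement below.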
  • del42sa - Wednesday, February 8, 2017 - link

    I am not sure, dude. Didn't AMD say that the HBM controller is much simpler than the GDDR5 one? Fiji has only 8 memory controllers for 4 HBM stacks, and I don't see any reason why it should be so different with HBM2... So each controller must operate a 512-bit-wide bus, as one HBM stack has 1024-bit I/O. I am not sure how NVIDIA would attach them to the ROPs.
  • del42sa - Wednesday, February 8, 2017 - link

    https://www.techpowerup.com/img/16-04-12/30a.jpg
    As you can see, GP100 has 8 memory controllers for 4 HBM stacks. As for GPCs, you never see an odd or fractional number of ROPs per GPC; there is always an even number of ROPs.

    GP100 has 6 GPCs; if you divide 128 ROPs by 6 you get 21.3 ROPs per GPC, which looks really strange.

    So GP100 should have either 96 ROPs, like GP102, or 144 ROPs, but never 128 ROPs.
  • Yojimbo - Wednesday, February 8, 2017 - link

    Yes, I haven't seen it. But I don't know why.
  • Yojimbo - Tuesday, February 28, 2017 - link

    Well, the 1080 Ti has been released now, and it has 6 GPCs and 88 ROPs. It has a 352-bit memory bus compared with the 384-bit bus of the Pascal Titan X. The Pascal Titan X has 96 ROPs. 88 = (352/384) × 96, so the number of ROPs in this case seems to be tied to the memory subsystem and not to the number of GPCs, as 88/6 is not an integer.

    So in that light it seems possible that the Quadro GP100 has 128 ROPs.
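
    Spelling out that arithmetic (using the released 1080 Ti and Pascal Titan X figures quoted above):

        titan_x_rops, titan_x_bus = 96, 384   # Pascal Titan X
        ti_bus, gpcs = 352, 6                 # GTX 1080 Ti

        ti_rops = titan_x_rops * ti_bus / titan_x_bus
        print(ti_rops)          # 88.0 -> matches the 1080 Ti's 88 ROPs
        print(ti_rops / gpcs)   # ~14.67 -> not a whole number per GPC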
