The Intel Optane SSD DC P4800X (375GB) Review: Testing 3D XPoint Performance
by Billy Tallis on April 20, 2017 12:00 PM EST3D XPoint Refresher
Intel's 3D XPoint memory technology is fundamentally very different from NAND flash. Intel has not clarified any more low-level details since their initial joint announcement with Micron of this technology, so our analysis from 2015 is still largely relevant. The industry consensus is that 3D XPoint is something along the lines of a phase change memory or conductive bridging resistive RAM, but we won't know for sure until third parties put 3D XPoint memory under an electron microscope.
Even without knowing the precise details, the high-level structure of 3D XPoint confers some significant advantages and disadvantages relative to NAND flash or DRAM. 3D XPoint can be read or written at the bit or word level, which greatly simplifies random access and wear leveling as compared to the multi-kB pages that NAND flash uses for read or program operations and the multi-MB blocks used for erase operations. Where DRAM requires a transistor for each memory cell, 3D XPoint isolates cells from each other by stacking them each in series with a diode-like selector. This frees up 3D XPoint to use a multi-layer structure, though not one that is as easy to manufacture as 3D NAND flash. This initial iteration of 3D XPoint uses just two layers and provides a per-die capacity of 128Gb, a step or two behind NAND flash but far ahead of the density of DRAM. 3D XPoint is currently storing just one bit per memory cell while today's NAND flash is mostly storing two or three bits per cell. Intel has indicated that the technology they are using, with sufficient R&D, can support more bits per cell to help raise density.
The general idea of a resistive memory cell paired with a selector and built at the intersections of word and bit lines is not unique to 3D XPoint memory. The term "crosspoint" has been used to describe several memory technologies with similar high-level architectures but different implementation details. As one Intel employee has explained, it is relatively easy to discover a material that exhibits hysteresis and thus has the potential to be used as a memory cell. The hard part is desiging a memory cell and selector that are fast, durable, and manufacturable at scale. The greatest value in Intel's 3D XPoint technology is not the high-level design but the specific materials and manufacturing methods that make it a practical invention. It has been noted by some analysts that the turning point for technologies such as 3D XPoint may very well be in the development in the selector itself, which is believed to be a Schottky diode or an ovonic selector.
In addition to the advantages that any resistive memory built on a crosspoint array can expect, Intel's 3D XPoint memory is supposed to offer substantially higher write endurance than NAND flash, and much lower read and write times. Intel has only quantified the low-level performance of 3D XPoint memory with rough order of magnitude comparisons against DRAM and NAND flash in general, so this test of the Optane SSD DC P4800X is the first chance to get some precise data. Unfortunately, we're only indirectly observing the capabilities of 3D XPoint, because the Optane SSD is still a PCIe SSD with a controller translating the block-oriented NVMe protocol and providing wear leveling.
The only other Optane product Intel has announced so far is another PCIe SSD, but on an entirely different scale: the Optane Memory product for consumers uses just one or two 3D XPoint chips and is intended to serve as a 32GB cache device accelerating access to a mechanical hard drive or slower SATA SSD. Next year Intel will start talking about putting 3D XPoint on DIMMs, and by then if not sooner we should have more low-level information about 3D XPoint technology.
117 Comments
View All Comments
Ninhalem - Thursday, April 20, 2017 - link
At last, this is the start of transitioning from hard drive/memory to just memory.ATC9001 - Thursday, April 20, 2017 - link
This is still significantly slower than RAM....maybe for some typical consumer workloads it can take over as an all in one storage solution, but for servers and power users, we'll still need RAM as we know it today...and the fastest "RAM" if you will is on die L1 cache...which has physical limits to it's speed and size based on speed of light!I can see SSD's going away depending on manufacturing costs but so many computers are shipping with spinning disks still I'd say it's well over a decade before we see SSD's become the replacement for all spinning disk consumer products.
Intel is pricing this right between SSD's and RAM which makes sense, I just hope this will help the industry start to drive down prices of SSD's!
DanNeely - Thursday, April 20, 2017 - link
Estimates from about 2 years back had the cost/GB price of SSDs undercutting that of HDDs in the early 2020's. AFAIK those were business as usual projections, but I wouldn't be surprised to see it happen a bit sooner as HDD makers pull the plug on R&D for the generation that would otherwise be overtaken due to sales projections falling below the minimums needed to justify the cost of bringing it to market with its useful lifespan cut significantly short.Guspaz - Saturday, April 22, 2017 - link
Hard drive storage cost has not changed significantly in at least half a decade, while ssd prices have continued to fall (albeit at a much slower rate than in the past). This bodes well for the crossover.Santoval - Tuesday, June 6, 2017 - link
Actually it has, unless you regard HDDs with double density at the same price every 2 - 2.5 years as not an actual falling cost. $ per GB is what matters, and that is falling steadily, for both HDDs and SSDs (although the latter have lately spiked in price due to flash shortage).bcronce - Thursday, April 20, 2017 - link
The latency specs include PCIe and controller overhead. Get rid of those by dropping this memory in a DIMM slot and it'll be much faster. Still not as fast as current memory, but it's going to be getting close. Normal system memory is in the range of 0.5us. 60us is getting very close.tuxRoller - Friday, April 21, 2017 - link
They also include context switching, isr (pretty board specific), and block layer abstraction overheads.ddriver - Friday, April 21, 2017 - link
PCIE latency is below 1 us. I don't see how subtracting less than 1 from 60 gets you anywhere near 0.5.All in all, if you want the best value for your money and the best performance, that money is better spent on 128 gigs of ecc memory.
Sure, xpoint is non volatile, but so what? It is not like servers run on the grid and reboot every time the power flickers LOL. Servers have at the very least several minutes of backup power before they shut down, which is more than enough to flush memory.
Despite intel's BS PR claims, this thing is tremendously slower than RAM, meaning that if you use it for working memory, it will massacre your performance. Also, working memory is much more write intensive, so you are looking at your money investment crapping out potentially in a matter of months. Whereas RAM will be much, much faster and work for years.
4 fast NVME SSDs will give you like 12 GB\s bandwidth, meaning that in the case of an imminent shutdown, you can flush and restore the entire content of those 128 gigs of ram in like 10 seconds or less. Totally acceptable trade-back for tremendously better performance and endurance.
There is only one single, very narrow niche where this purchase could make sense. Database usage, for databases with frequent low queue access. This is an extremely rare and atypical application scenario, probably less than 1/1000 in server use. Which is why this review doesn't feature any actual real life workloads, because it is impossible to make this product look good in anything other than synthetic benches. Especially if used as working memory rather than storage.
IntelUser2000 - Friday, April 21, 2017 - link
ddriver: Do you work for the memory industry? Or hold a stock in them? You have a personal gripe about the company that goes beyond logic.PCI Express latency is far higher than 1us. There are unavoidable costs of implementing a controller on the interface and there's also software related latency.
ddriver - Friday, April 21, 2017 - link
I have a personal gripe with lying. Which is what intel has been doing every since it announced hypetane. If you find having a problem with lying a problem with logic, I'd say logic ain't your strong point.Lying is also what you do. PCIE latency is around 0.5 us. We are talking PHY here. Controller and software overhead affect equally every communication protocol.
Xpoint will see only minuscule latency improvements from moving to dram slots. Even if PCIE has about 10 times the latency of dram, we are still talking ns, while xpoint is far slower in the realm of us. And it ain't no dram either, so the actual latency improvement will be nowhere nearly the approx 450 us.
It *could* however see significant bandwidth improvements, as the dram interface is much wider, however that will require significantly increased level of parallelism and a controller that can handle it, and clearly, the current one cannot even saturate a pcie x4 link. More bandwidth could help mitigate the high latency by masking it through buffering, but it will still come nowhere near to replacing dram without a tremendous performance hit.