Enterprise NVMe Round-Up 2: SK Hynix, Samsung, DapuStor and DERA
by Billy Tallis on February 14, 2020 11:15 AM EST

Mixed Random Performance
Real-world storage workloads usually aren't pure reads or writes but a mix of both. It is completely impractical to test and graph the full range of possible mixed I/O workloads: varying the proportion of reads vs writes, sequential vs random access, and block sizes leads to far too many configurations. Instead, we're going to focus on just a few scenarios that are most commonly referenced by vendors, when they provide a mixed I/O performance specification at all. We tested a range of 4kB random read/write mixes at queue depth 32 (the maximum supported by SATA SSDs) and at QD128 to better stress the NVMe SSDs. This gives us a good picture of the maximum throughput these drives can sustain for mixed random I/O, but in many cases the queue depth will be far higher than necessary, so we can't draw meaningful conclusions about latency from this test. This test uses 8 threads when testing at QD32, and 16 threads when testing at QD128. This spreads the work over many CPU cores, and for NVMe drives it also spreads the I/O across the drive's several queues.
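For readers curious how a workload of this shape can be reproduced, below is a minimal sketch that drives fio from Python. This is not our actual test script: the device path, the runtime, and the per-thread split of the total queue depth (8 threads at iodepth 4 for QD32, 16 threads at iodepth 8 for QD128) are illustrative assumptions.

```python
import subprocess

# Sketch: approximate the article's mixed random I/O sweep with fio.
# WARNING: this writes to the target device; point it at a disposable drive.
DEVICE = "/dev/nvme0n1"  # hypothetical test target

def run_mixed(read_pct: int, numjobs: int, iodepth: int, seconds: int = 60) -> str:
    """Run one 4kB random read/write mix point and return fio's text output."""
    cmd = [
        "fio",
        "--name=mixed_rand",
        f"--filename={DEVICE}",
        "--ioengine=libaio",
        "--direct=1",               # bypass the page cache
        "--rw=randrw",              # mixed random reads and writes
        f"--rwmixread={read_pct}",  # e.g. 70 => 70% reads / 30% writes
        "--bs=4k",
        f"--numjobs={numjobs}",     # threads; total QD = numjobs * iodepth
        f"--iodepth={iodepth}",
        "--time_based",
        f"--runtime={seconds}",
        "--group_reporting",
    ]
    return subprocess.run(cmd, capture_output=True, text=True).stdout

# Sweep read percentages at both total queue depths, as in the graphs below.
for read_pct in range(100, -1, -10):
    run_mixed(read_pct, numjobs=8, iodepth=4)   # total QD ~= 32
    run_mixed(read_pct, numjobs=16, iodepth=8)  # total QD ~= 128
```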
The full range of read/write mixes is graphed below, but we'll primarily focus on the 70% read, 30% write case that is commonly quoted for mixed IO performance specs.
[Graphs: 4kB mixed random I/O performance at Queue Depth 32 and Queue Depth 128]
A queue depth of 32 is only enough to saturate the slowest of these NVMe drives on a 70/30 mixed random workload; none of the high-end drives are being stressed enough. At QD128 we see a much wider spread of scores. The DERA and Memblaze 6.4TB drives have pulled past the Optane SSD for overall throughput, but the Samsung PM1725a can't come close to keeping up with them; its throughput is more on par with the DERA D5437 drives and their relatively low overprovisioning. The high OP ratio on the DapuStor Haishen3 H3100 allows it to perform much better than any of the other drives with 8-channel controllers, and better than the Intel P4510 with its 12-channel controller.
[Graphs: Power Efficiency in MB/s/W and Average Power in W, at QD32 and QD128]
The DapuStor Haishen3 H3100 is the main standout on the power efficiency charts: at QD32 it's the only flash-based NVMe SSD that's more efficient than both of the SATA SSDs, and at QD128 it's getting close to the Optane SSD's efficiency score. Also at QD128 the two fastest 6.4TB drives have pretty good efficiency scores, but still quite a ways behind the Optane SSD: 15-18W vs 10W for similar performance.
[Graphs: throughput across the full range of read/write mixes, at QD32 and QD128]
Most of these drives have hit their power limit by the time the mix is up to about 30% writes. After that point, their performance steadily declines as the workload (and thus the power budget) shifts more toward slower, more power-hungry write operations. This is especially true at the higher queue depth. At QD32 things look quite different for the DERA D5457 and Memblaze PBlaze5 C916, because QD32 isn't enough to get close to their full read throughput and they're actually able to deliver higher throughput for writes than for reads. That's not quite true of the Samsung PM1725a because its steady-state random write speed is so much slower, but it does see a bit of an increase in throughput toward the end of the QD32 test run as it gets close to pure writes.
Aerospike Certification Tool
Aerospike is a high-performance NoSQL database designed for use with solid state storage. The developers of Aerospike provide the Aerospike Certification Tool (ACT), a benchmark that emulates the typical storage workload generated by the Aerospike database. This workload consists of a mix of large-block 128kB reads and writes, and small 1.5kB reads. When the ACT was initially released back in the early days of SATA SSDs, the baseline workload was defined to consist of 2000 reads per second and 1000 writes per second. A drive is considered to pass the test if it meets the following latency criteria:
- fewer than 5% of transactions exceed 1ms
- fewer than 1% of transactions exceed 8ms
- fewer than 0.1% of transactions exceed 64ms
Drives can be scored based on the highest throughput they can sustain while satisfying the latency QoS requirements. Scores are normalized relative to the baseline 1x workload, so a score of 50 indicates 100,000 reads per second and 50,000 writes per second. Since this test uses fixed IO rates, the queue depths experienced by each drive will depend on its latency, and can fluctuate during the test run if the drive slows down temporarily for a garbage collection cycle. The test will give up early if it detects the queue depths growing excessively, or if the large-block IO threads can't keep up with the random reads.
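To make the scoring arithmetic concrete, here's a small Python sketch that converts an ACT score into target IO rates and applies the three latency criteria listed above. The function names are ours for illustration; they are not part of the ACT tool itself.

```python
# Sketch of ACT's scoring arithmetic; names are illustrative, not from the tool.
BASELINE_READS_PER_SEC = 2000
BASELINE_WRITES_PER_SEC = 1000

def act_target_rates(score: float) -> tuple[float, float]:
    """An ACT score is a multiple of the 1x baseline workload."""
    return (score * BASELINE_READS_PER_SEC, score * BASELINE_WRITES_PER_SEC)

def act_passes(pct_over_1ms: float, pct_over_8ms: float, pct_over_64ms: float) -> bool:
    """Apply the three latency QoS criteria from the list above."""
    return pct_over_1ms < 5.0 and pct_over_8ms < 1.0 and pct_over_64ms < 0.1

# A score of 50 corresponds to 100,000 reads/s and 50,000 writes/s.
print(act_target_rates(50))        # (100000.0, 50000.0)
print(act_passes(4.2, 0.3, 0.01))  # True: within all three thresholds
```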
We used the default settings for queue and thread counts and did not manually constrain the benchmark to a single NUMA node, so this test produced a total of 64 threads scheduled across all 72 virtual (36 physical) cores.
The usual runtime for ACT is 24 hours, which makes determining a drive's throughput limit a long process. For fast NVMe SSDs, this is far longer than necessary for drives to reach steady-state. In order to find the maximum rate at which a drive can pass the test, we start at an unsustainably high rate (at least 150x) and incrementally reduce it until the test can run for a full hour, then decrease the rate further if necessary to get the drive under the latency limits.
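Conceptually, that search amounts to a descending sweep over the rate multiple. Here is a minimal sketch of the idea, assuming a hypothetical run_act_for_one_hour() wrapper and an arbitrary step size; neither is part of ACT or our exact procedure.

```python
def run_act_for_one_hour(score: float) -> bool:
    """Hypothetical wrapper: launch ACT at `score` times the baseline rates and
    return True only if a full one-hour run stays within all latency limits."""
    raise NotImplementedError  # would shell out to the ACT benchmark here

def find_max_passing_score(start: float = 150.0, step: float = 5.0) -> float:
    """Walk the rate multiple down from an unsustainably high starting point
    until the drive can sustain a one-hour run within the QoS criteria."""
    score = start
    while score > 0:
        if run_act_for_one_hour(score):
            return score   # highest multiple that passed
        score -= step      # failed or gave up early; slow the workload down
    return 0.0
```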
The strict QoS requirements of this test keep a number of these drives from scoring as well as we would expect based on their throughput on our other tests. The biggest disappointment is the Samsung PM1725a, which is barely any faster than Samsung's newer 983 DCT. The PM1725a has no problem with outliers above the 8ms or 64ms thresholds, but it cannot get 95% of the reads to complete in under 1ms until the workload slows way down. This suggests that it is not as good as newer SSDs at suspending writes in favor of handling a read request. The DapuStor Haishen3 SSDs also underperform relative to comparable drives, which is a surprise given that they offered pretty good QoS on some of the pure read and write tests.
The Memblaze PBlaze5 C916 is the fastest flash SSD in this bunch, but only scores 60% of what the Optane SSD gets. The DERA SSDs that also use 16-channel controllers are the next fastest, though the 8TB D5437 is substantially slower than the 4TB model.
[Graphs: Power Efficiency and Average Power in W]
Since the ACT test runs drives at the throughput where they offer good QoS rather than at their maximum throughput, the power draw from these drives isn't particularly high: the NVMe SSDs range from roughly 4-13 W. The top performers are also generally the most efficient drives on this test. Even though it is slower than expected, the DapuStor Haishen3 H3100 is the second most efficient flash SSD in this round-up, using just over half the power that the slightly faster Intel P4510 requires.
Comments
PaulHoule - Friday, February 14, 2020 - link
"The Samsung PM1725a is strictly speaking outdated, having been succeeded by a PM1725b with newer 3D NAND and a PM1735 with PCIe 4.0. But it's still a flagship model from the top SSD manufacturer, and we don't get to test those very often."Why? If you've got so much ink for DRAMless and other attempts to produce a drive with HDD costs and SSD performance (hopefully warning people away?) why can't you find some for flagship products from major manufacturers?
Billy Tallis - Friday, February 14, 2020 - link
The division of Samsung that manages the PM17xx products doesn't really do PR. We only got this drive to play with because MyDigitalDiscount wanted an independent review of the drive they're selling a few thousand of.

The Samsung 983 DCT is managed by a different division than the PM983, and that's why we got to review the 983 DCT, 983 ZET, 883 DCT, and so on. But that division hasn't done a channel/retail version of Samsung's top of the line enterprise drive.
romrunning - Friday, February 14, 2020 - link
Too bad you don't get more samples of the enterprise stuff. I mean, you have influencers, recommenders, and straight-up buyers of enterprise storage who read AnandTech.

Billy Tallis - Friday, February 14, 2020 - link

Some of it is just that I haven't tried very hard to get more enterprise stuff. It worked okay for my schedule to spend 5 weeks straight testing enterprise drives because we didn't have many consumer drives launch over the winter. But during other times of the year, it's tough to justify the time investment of updating a test suite and re-testing a lot of drives. That's part of why this is a 4-vendor roundup instead of 4 separate reviews.

Since this new test suite seems to be working out okay so far, I'll probably do a few more enterprise drives over the next few months. Kingston already sent me a server boot drive after CES, without even asking me. Kioxia has expressed interest in sampling me some stuff. A few vendors have said they expect to have XL-NAND drives real soon, so I need to hit up Samsung for some Z-NAND drives to retest and hopefully keep this time.
And I'll probably run some of these drives through the consumer test suite for kicks, and upload the results to Bench like I did for one of the PBlaze5s and some of the Samsung DCTs.
PandaBear - Friday, February 14, 2020 - link
ESSD firmware engineer here (and yes, I have worked at one of the companies above). Enterprise drives are mostly sold to large system builders, so AnandTech doesn't really "influence" or "recommend" for enterprise business. There are way more requirements than just 99.99th percentile latency and throughput, and buyers tend to focus on the worst-case scenarios rather than the peak best cases. Oh, and pricing matters a lot. You need to be cheap enough to make it into the top 3-4 or else you lose a lot of business, even if you are qualified.

RobJoy - Tuesday, February 18, 2020 - link
Well, these are Intel owners here. Anything PCIe 4.0 has not even crossed their minds, and they are patiently waiting for Intel to move their ass.

No chance in hell they dare go the AMD Rome way, even if it performs better and costs less.
romrunning - Friday, February 14, 2020 - link
This article makes my love of the P4800X even stronger! :) If only they could get the capacity higher and the pricing lower - true of all storage, though especially desired for Optane-based drives.curufinwewins - Friday, February 14, 2020 - link
100% agreed, it's such a paradigm shifter by comparison.eek2121 - Friday, February 14, 2020 - link
Next-gen Optane is supposed to significantly raise both capacity and performance. Hopefully Intel is smart and prices their SSD-based Optane solutions competitively.
Ok, great stuff Billy! I know it wasn't really the focus of this review, but dang, I actually came out ludicrously impressed with how very small quantities of first-gen Optane on relatively low channel-count implementations have such radically different (and almost always better) behavior compared to flash. Definitely looking forward to the next generation of this product.