I'm pretty satisfied with both my 8TB and 10TB high-capacity drives, but does it scare anyone that helium is being used to fill the void and reach these higher capacities?
PCIe5 will allow for 4x speed. Controllers will probably catch up eventually. Hence 4x capacity at the same rebuild time.
Now... who the hell needs PBs of capacity in 1U but Google and Facebook? Lots and lots of unused data nobody is going to access - what kind of connection to that rack are you going to have?
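As a rough sketch of the arithmetic behind the interface-speed point above (the capacities and sustained rates below are illustrative assumptions, not vendor figures):

```python
# Back-of-the-envelope: how long a full-drive rewrite takes at a sustained rate.
# Capacities and throughput figures are illustrative assumptions only.

def rebuild_hours(capacity_tb: float, rate_gb_s: float) -> float:
    """Hours needed to rewrite an entire drive at a sustained rate."""
    return capacity_tb * 1000 / rate_gb_s / 3600  # decimal TB -> GB, then seconds -> hours

for capacity in (4, 16, 32, 64):
    for rate in (0.5, 3.0, 12.0):  # ~SATA SSD, PCIe 3.0 x4 NVMe, assumed PCIe 5.0 x4 NVMe
        print(f"{capacity:>3} TB @ {rate:4.1f} GB/s -> {rebuild_hours(capacity, rate):6.2f} h")
```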
I think the larger SSDs have a strong use case in RAID arrays. There are several reasons for this. First, NVMe-attached SSDs present a complicated way to add capacity compared to SAS-connected SSDs. SAS-connected SSDs are easy to scale by adding SAS-connected drive enclosures, the same enclosures currently used for 10K SAS HDDs. NVMe drive enclosures would need to be attached either using PCIe cables or NVMe over Fabrics technology. All of the array vendors have settled on the latter approach, and it requires the drive enclosure to be a full-blown array with dual NVMe over Fabrics controllers fronting the SSDs. This significantly increases costs compared to an array with local NVMe-attached drives (which only requires additional PCIe switch ASICs to fan out the PCIe connections to the local drives).
The result is that an array with 24 local 16TB NVMe SSDs is much less expensive than one with 24 local 8TB NVMe SSDs plus 24 more 8TB NVMe SSDs in an NVMe over Fabrics external enclosure.
Rebuild times are a consideration, but declustered RAID technology can reduce that, as can triple parity RAID.
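For a sense of why declustered rebuilds help, here is a toy model with made-up drive counts and per-drive throughput: in a classic group the rebuild is bottlenecked by one hot spare, while a declustered layout spreads reconstruction across every surviving drive.

```python
# Toy comparison of rebuild time: classic RAID group vs. declustered layout.
# All figures are assumptions for illustration, not measurements.

CAPACITY_TB = 16.0        # failed drive's data to reconstruct
PER_DRIVE_MB_S = 200.0    # assumed sustained per-drive rebuild throughput

def classic_rebuild_hours() -> float:
    # Bottleneck: one hot spare absorbs all of the reconstructed data.
    return CAPACITY_TB * 1e6 / PER_DRIVE_MB_S / 3600

def declustered_rebuild_hours(survivors: int) -> float:
    # Reconstructed data is scattered into spare space on many drives,
    # so aggregate rebuild bandwidth scales with the survivor count.
    return CAPACITY_TB * 1e6 / (PER_DRIVE_MB_S * survivors) / 3600

print(f"classic (1 hot spare): {classic_rebuild_hours():.1f} h")
for n in (10, 50, 100):
    print(f"declustered over {n:>3} drives: {declustered_rebuild_hours(n):.2f} h")
```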
I believe this article is a little conservative and does not cover the real reasons behind the slow uptake of larger SSD drives, or the possibilities there. Currently the largest mass-produced SSD is the Samsung PM1643, which goes up to 30.72TB. The issue is that the price of the larger drives is still relatively high. However, rebuilding a RAID with 30TB SSDs will likely take less time than rebuilding the same RAID with 12-16TB HDDs, so the risk during a rebuild is smaller with the SSDs. Also, with HDDs the drive failures are more common and the overall run time is lower, whereas SSDs are limited by the number of writes. Now, with chip development advancing, there are already storage systems available that apply asymmetric load to the SSD drives to ensure they do not reach the end of their write cycles at the same time. SSD drives are also composed of a number of chips that, with advances in firmware, can be managed so that if one chip in a drive fails, most or all of the data can still be read and copied to a replacement drive in the array, not requiring a full rebuild. This would alleviate the fear of data loss in larger drives. I have no doubt that once price parity is reached between HDDs and SSDs, much larger SSDs will become common use and current cold storage will be replaced by very large SSDs. It is just waiting for prices to be able to compete with HDDs.
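A minimal sketch of the "asymmetric load" idea described above, i.e. deliberately skewing writes so array members do not exhaust their rated endurance at the same moment; the weights, endurance rating and traffic volume are invented for illustration:

```python
# Sketch: skew write traffic across SSDs so their wear-out dates spread apart.
# Endurance and traffic numbers are illustrative assumptions.
import itertools

DRIVES = 4
ENDURANCE_TB = 10_000.0    # assumed rated TB written per drive
DAILY_WRITES_TB = 40.0     # assumed total array write traffic per day

# Uneven weights: drive 0 takes the most traffic, drive 3 the least.
weights = [0.4, 0.3, 0.2, 0.1]

written = [0.0] * DRIVES
for day in itertools.count(1):
    for i, w in enumerate(weights):
        written[i] += DAILY_WRITES_TB * w
    if any(tb >= ENDURANCE_TB for tb in written):
        print(f"first drive reaches rated endurance on day {day}")
        print("TB written per drive:", [round(tb) for tb in written])
        break

# With equal weights (0.25 each) all four drives would hit the limit together
# on day 1000; the skew above staggers the replacements instead.
```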
These kinds of articles are pointless. In 20 years 16TB will be the new 16GB. Thirty years ago my father paid $900 for a 40MB drive and said "No one will ever use this much space." The reality is that if the price were low enough, who wouldn't want 16TB in their laptop or PC? Hell, I'd like a 16TB drive in the new PS5. Given enough time we will laugh at the absurdity that someone actually thought we wouldn't want a drive larger than 16TB, just as people think Thomas Watson, president of IBM from 1914 to 1956, was absurd for saying he thought there was a world market "for maybe five computers".
dgingeri - Wednesday, March 13, 2019 - link
I would love 4TB per drive at under $100 each.
nagi603 - Wednesday, March 13, 2019 - link
This. If SSDs had a lower price point, perhaps only double, or (and I know I'm dreaming here) 50% more than that of cheap HDDs, which works out to roughly $360 with tax at local prices and currently buys only a 2TB quad-level-cell SSD, I'd get rid of my 8TB archive HDDs without a second thought. My NAS system already has redundancy, and with SSDs it would be virtually inaudible.
coburn_c - Thursday, March 14, 2019 - link
If capacity stagnates flash could catch up and create actual competition for price per gigabyte.
TelstarTOS - Friday, March 15, 2019 - link
Bring 3-4TB ones to consumers.
philehidiot - Monday, March 18, 2019 - link
For me, I have four SSDs and two HDDs. The SSDs have been built up over the years and have got bigger and faster as time goes by, each new one becoming the root drive and the older ones moving to storage. For my use I think I've reached the peak in terms of noticeable performance people and going faster isn't going to yield tangible results. This opens me up to maintaining performance and increasing capacity, so 1TB drives will be well within my budget next time. I may try an M.2 NVMe drive next time if the price is right and use another drive for backing up.

On a side note, my missus is trialling a mechanical keyboard next door and I can hear she's typing faster than I can. This SHALL NOT STAND.
philehidiot - Monday, March 18, 2019 - link
"performance people"????? I've clearly been drinking too much.And yes, I recognise the irony of my other typos. Shut up.
abufrejoval - Tuesday, March 19, 2019 - link
After some reflection: That headline is clickbait, and you probably noticed as much from the discussion it launched.
You have two distinct points, one for HDD another for SSD: That they coincide currently at around 16TB IMHO has little to do with that specific capacity.
For HDDs the recovery time loss could be a point, but it's related to bandwidth. If multi-active head technology becomes more prevalent, that "manageable risk" capacity point will jump with the sequential bandwidth.
With SSDs you need to keep IOPS and capacity in balance, which means you'd have to put more channels and switching fabric inside the SSD to keep performance in line, either on-chip (bigger chips) or multi-chip. Since these chips are light on processing and heavy on I/O, they won't shrink well, so there is no pricing benefit. And if you go multi-chip, there are good reasons to go modular, which proves your point.
So I guess we'll see some effort put into making the modular design less expensive in terms of connectors, and perhaps we'll see some fixed aggregates (modular design, but fixed assembly to save on interconnect cost and the reliability penalty).
So I guess you're raising a valid point, but could have said it better :-)
PeachNCream - Wednesday, March 13, 2019 - link
While formatting a 1GB hard drive in a mom and pop computer shop in the mid-1990s, all of the technicians gathered around to watch it, remarking to one another that no one would ever need that much capacity. In 2017, I got hassled for buying a new smartphone with only 1GB of RAM.

I think it's only a matter of time until there will be a demand for that sort of density and I think that NAND has enough legs to still be around when that reality arrives...relatively soon.
willis936 - Wednesday, March 13, 2019 - link
No one is saying they don't want more storage. They're saying they don't want more than a certain amount of storage if its shelf life is below a certain threshold. I mean, yeah, if you take it to the logical extreme where SSDs are literally free, then you could probably come up with a cold storage system based around replacing the SSDs relatively often, but that's a ridiculous premise. Why generate all of that waste? Why not just choose a more suitable technology?
Eliadbu - Thursday, March 14, 2019 - link
They don't want larger drives because they don't want the risk of long server downtime when they need to recover in case of a faulty drive. If the recovery process were faster and less risky in terms of server downtime, then yes, they would have gone for higher-capacity drives.
rahvin - Wednesday, March 13, 2019 - link
Just remember, there is always someone older out there.

The first hard drive I ever purchased was 80 megabytes, top of the line at the time, around $380 and one of the first IDE drives they produced, IIRC.
PeachNCream - Wednesday, March 13, 2019 - link
I'm not sure why that's worth bragging about, but that age of yours is catching up with you if you read that a 1GB drive was my first and not just one I happened to be installing in a customer's new PC. It's okay though, Anandtech's comments section is indeed a competitive zone in which it is necessary to bash our nerd junk against one another. I feel no need to compete with you on personal experiences, but as it stands, my history with computing by no means started in the mid-90s. I just happened to be a part owner of a computer store at that point. The rest you can guess about or make up with whatever you feel makes you sleep better at night.
Opencg - Wednesday, March 13, 2019 - link
well my e penis remembers when 1kb was huge. world war 2. suck it
PeachNCream - Wednesday, March 13, 2019 - link
I'm horrified and impressed at the same time because my epeen clearly doesn't measure up at this point.
sorten - Thursday, March 14, 2019 - link
What about WWII? If you've been "computing" since WWII and you were 15 years old at the end of that war, that means you're 89 +- 1 years old. Do you need a nap?
bigvlada - Thursday, March 14, 2019 - link
That's what you could get in Europe in 1980/1981. The Sinclair ZX81 had one kilobyte of RAM (and the most horrible keyboard in this part of the galaxy) and cost somewhere around 70 British pounds at the time. There were programs and games for that machine, but the process usually consisted of the user having to retype the whole program from a book or magazine (hard mode: the magazine was in German :) ) before storing it on an audio tape. The great 8-bit rivalry (Sinclair ZX Spectrum vs Commodore C64) was still at least a year away.
Anonymous Blowhard - Thursday, March 14, 2019 - link
My first hard drive was a 10MB MFM unit on an IBM XT.

Now get off my lawn.
FunBunny2 - Thursday, March 14, 2019 - link
"My first hard drive was a 10MB MFM unit on an IBM XT."if memory serves, that thingee was about the size of a artisan loaf of bread, and weighed about 10 lbs. and my lawn, too.
abufrejoval - Thursday, March 14, 2019 - link
No, much smaller than an 8" floppy disk drive.They actually were the same size as a 5 1/4" floppy in the PC. I worked on one which was an upgraded PC (no XT), so it only had a 5MB Winchester drive.
It also eventually had one of these really impressive Hercules graphics cards, super-high resolution 720x400 monochrome on these wonderful green and slow phosphor maskless tube displays that were so easy on the eyes.
abufrejoval - Thursday, March 14, 2019 - link
I think this site is more about starting early than being old.

That said, I regularly swapped 5MB RK05 and 10MB DL10 disks on a PDP-11/34 with 64KB of magnetic core memory, which we eventually upgraded to 256KB of this fancy new DRAM, which unfortunately lost its contents when you switched off the power. That meant we either had to leave it on during the night or "reboot" in the morning. Before that, we'd just put the box on powerless standby and resume the next morning.
We called all hard disks Winchester drives btw. and only swapped the media not the head assembly.
And transporting data meant riding trains with reel tapes in your backpack, because punched cards were too bulky.
That job paid for my first own computer, an Apple ][, and the last Apple product I ever purchased (well, actually it was a clone, had 48KB of RAM and lower case letters!). It also featured removable media 140k per disk and side (you could cut out a write-protect notch and then also use the reverse side).
But when I benchmarked my first FusionIO SSD in January 2008 my eyes glazed over almost like they did when we compared access times of fixed head drums against moving head disks.
Since I still have to work for one or two decades I cannot afford to be old.
Null666666 - Thursday, March 14, 2019 - link
First was 20MB, but they made 40MB at the time.
piroroadkill - Thursday, March 14, 2019 - link
The problem is rebuild times, plain and simple. If capacity rises but transfer speeds don't increase accordingly, your rebuild times grow and grow. Longer rebuild times equal unacceptable risk.
Null666666 - Thursday, March 14, 2019 - link
M.2, baby!

PCIe 4.0 is due out soon.
lmcd - Friday, March 15, 2019 - link
Controllers also need more channels + more internal bandwidth.
goatfajitas - Wednesday, March 13, 2019 - link
Data centers and even the smallest of companies' servers use RAID 5 or higher on any volume that is storing data. This means a full set of the data still exists even if any one drive fails. As long as the drives are not more prone to failure than any other, older drive, there is no reason not to want larger drives.
Death666Angel - Wednesday, March 13, 2019 - link
I don't remember who posted the article (it wasn't on a site I usually frequent), but the author argued that RAID 5 is basically dead in any larger-scale deployment, even in small server applications. This is because HDD capacities have become so large and recalculating the data from the parity bits is so slow. It argued (with some math I don't remember, not even about mechanical failure, but just bit errors at a 10^12-bit scale or some such) that the chance of an error in the array while restoring it from one failed drive becomes so large after a certain capacity that it is not a good idea to have a RAID 5 array beyond a certain size, which we either are already at or are approaching very soon (the article is a few years old, I think, and it spoke of it happening in the near future).
blakeatwork - Wednesday, March 13, 2019 - link
There was an old article from ZDNet in 2007 that proclaimed the death of RAID-5 on drives greater than 1TB (https://www.zdnet.com/article/why-raid-5-stops-wor... ). They had a follow-up in 2016 that said they weren't wrong, but that manufacturers upped their URE specs on *some* drives (https://www.zdnet.com/article/why-raid-5-still-wor... ).
lightningz71 - Wednesday, March 13, 2019 - link
Which is why RAID 6 exists. Two parity drives reduce the chance of data loss to below RAID 5 levels for all but the most extreme array sizes. It is, of course, less power-, cost-, and space-efficient than RAID 5, but you have to pay for redundancy somewhere.
ken.c - Wednesday, March 13, 2019 - link
That's why really big storage uses some other kind of algorithm for data protection, be that 3x mirroring (see HDFS and other "cloud scale" storage) or Reed-Solomon forward error correction (Isilon or Qumulo) or some variant thereof. No one uses RAID 5 or 6 alone at any sort of performance scale. Even when they are used, they're striped across, for example in a ZFS pool or Lustre or GPFS setup.

I see very little risk in using larger SSDs in that sort of configuration.
The author's contention that Netflix would have to go back to cold storage if they lose a single SSD is ludicrous.
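A small sketch comparing the raw-capacity overhead and tolerated losses of the protection schemes named above; the (data, redundancy) splits are common examples, not any particular vendor's defaults:

```python
# Sketch: raw-to-usable overhead and tolerated drive losses for a few
# protection schemes. The (data, redundancy) choices are illustrative only.

def overhead(data_units: int, redundancy_units: int) -> float:
    """Raw capacity consumed per unit of usable capacity."""
    return (data_units + redundancy_units) / data_units

schemes = {
    "3x mirroring":           (1, 2),   # one data copy plus two replicas
    "RAID5-like (8+1)":       (8, 1),
    "RAID6-like (8+2)":       (8, 2),
    "Reed-Solomon EC (10+4)": (10, 4),
}

for name, (k, m) in schemes.items():
    print(f"{name:<24} overhead {overhead(k, m):4.2f}x, survives {m} simultaneous loss(es)")
```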
erple2 - Wednesday, March 20, 2019 - link
First of all, Netflix can't lose any single SSD - they don't own any of their storage. They buy all of it from AWS. Netflix is completely run in AWS land these days. As to how AWS manages that, I suspect "poop-tons of redundancy" is the way they do it. Well, except for those S3 buckets in us-east, that is...
Karatedog - Wednesday, March 13, 2019 - link
And this is why RAID-Z3 exists, where you use 3 parity drives. We had a mail system where, if a 4 TB HDD died, rebuilding the RAID volume took 25 days under the usual load (and not 4.44 hours, come on ppl :). The chance that the other 2 parity disks die in those 25 days is extremely small.
cfenton - Wednesday, March 13, 2019 - link
There's a very good reason. The larger the array, the longer the resilvering time. Even if you don't lose any data, your array is still going to be nearly useless while it's resilvering.
CaedenV - Wednesday, March 13, 2019 - link
Really? Nearly useless? The IO actually used on a storage array is typically limited by the fabric or network connection of the end users. Sure, in some rare occasions this becomes an issue, but typically the network connection is so ridiculously slow compared to the speed of the drives that the performance hit of the resilvering process would not even be noticed. These aren't HDDs, which are limited to mere hundreds of IOPS; these are SSDs, and multiple SSDs, each having hundreds of thousands of IOPS.

Performance hit? Yes. Useless? Hardly.

The only thing I see here is that they are expecting high failure rates on these large SSDs. Be that the nature of QLC flash, or hammering the SSDs with minimal GC over time, etc. They are essentially saying that they are expecting downtime, and potentially enough failures that they would expect something more than normal RAID protections can guard against.

I mean... it's not like they are clamoring over 16TB HDDs as an alternative for hot storage compared to a 16TB SSD.
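A rough sketch of the bandwidth argument in this comment, with assumed numbers for the array, the rebuild and the front-end link:

```python
# Sketch: clients behind a network link often cannot consume more than a small
# slice of what an SSD array can deliver, so a background rebuild is rarely
# visible to them. All figures below are assumptions for illustration.

ARRAY_READ_GB_S = 20.0     # assumed aggregate SSD array throughput
REBUILD_GB_S = 3.0         # assumed bandwidth consumed by the rebuild
CLIENT_LINK_GB_S = 10 / 8  # 10 GbE front-end link, expressed in GB/s

remaining = ARRAY_READ_GB_S - REBUILD_GB_S
print(f"array throughput left during rebuild: {remaining:.1f} GB/s")
print(f"maximum demand the front-end link can carry: {CLIENT_LINK_GB_S:.2f} GB/s")
print("clients could notice the rebuild:", remaining < CLIENT_LINK_GB_S)
```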
angry_spudger - Wednesday, March 13, 2019 - link
Degraded, yes; nearly useless? No.
Karatedog - Thursday, March 14, 2019 - link
Nitpicking: resilvering is for RAID 1, that's mirroring, as classic bathroom mirrors were fixed by resilvering their backside. Other RAID configurations (5, 6, Z) are 'rebuilt'.

Sry, old habit 😃
Joseph Christopher Placak - Wednesday, March 13, 2019 - link
The answer to RAID 5 or RAID 6 is to use RAID 10. It is super fast because of striping and each striped drive is mirrored and there are no parity calculations.
brshoemak - Wednesday, March 13, 2019 - link
RAID 10 is superior to RAID 5/6 in many ways, but keep in mind that with RAID 10 you lose half your gross drive array capacity to mirroring of the stripes. With RAID 5 you only lose the capacity of one drive to parity, or two drives with RAID 6. It's a balance of cost/performance/stability - pick two.

That being said, I wouldn't use RAID 5 or RAID 6 with the current multi-TB drive sizes unless I didn't plan on accessing the data often and had both an onsite and an offsite backup of the data. Even then there would be an 'ick' factor involved.
asmian - Saturday, March 16, 2019 - link
NO! That is no answer at all. RAID10 is not superior in any way. It might be faster, because of no parity calculations, but it is vastly less secure. Quite apart from the hideously expensive waste of disk space duplicating everything without any parity beyond a four-disk array, it's a poor consumer-grade solution for people who don't really understand redundancy and just think faster is better. Raid10 is just a glorified RAID0 (striping) which is the most insecure way to store data as any single drive failure is fatal - striping is not even RAID, really, as none of the disks are actually redundant as per the acronym, but are all essential.

Assume one disk goes down in your RAID10. One "side" of the array is now compromised. The other side takes over duty, and the array rebuilds on a hot spare. But while it is rebuilding, the SAME disk on the other side fails, or just has a silent bit error incident. Either you lost the same disk on both sides now, which are parts of a stripe so you just lost the ENTIRE array, both sides failing catastrophically, or if you are very lucky you just rebuilt the array with a few bit errors in, compromising your data. If it was just stray bit errors then you probably won't notice or be notified that you need to restore the compromised files from backup (if you have one...) so the next time you backup you'll destroy the archived good version of the files with the compromised ones made during the rebuild. What an EXCELLENT outcome.
Remember too that random bit errors in a RAID10 or a RAID5 cannot be rectified. If one side of your RAID10 suddenly starts throwing bit errors, then the only way to confirm the data is to check the other side - but how can the controller know which holds the correct data now, and which is wrong? All it can tell is that they are different. The same is true with a single parity drive - is the error in one of the array drives or in the stored parity calculation? With a second parity or comparison the erroring drive can be uniquely identified and reported as bad.
With RAID6 you have TWO parity drives. In the unlikely event that a drive fails or errors during a rebuild, the parity calculations on the second drive will kick in and ensure that your array doesn't fail. That second parity also ensures that there is no risk of storing bit errors during a rebuild, since there is still a parity to check against. You have a second line of defence that simply isn't there in RAID5 or 10 to protect you while your array rebuilds. It might be slower, but RAID6 is vastly superior.
You might think that those second failure chances are unlikely, but the main problem is that the bigger the individual drives get in your array, the more chance there is of a bit error occurring while rebuilding, which is the worst time for it to happen. Having a second parity will protect you. Using drives that are actually designed for RAID environments (WD RE-class and similar) helps too, as they are guaranteed to have significantly better bit error rates than cheap consumer drives, by an order of magnitude.
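To put rough numbers on the "bigger drives, more chance of a bit error during rebuild" point, here is a sketch using the commonly quoted datasheet URE specs of 1 in 10^14 bits (consumer) and 1 in 10^15 bits (enterprise); real-world rates differ, so treat it as illustrative:

```python
# Rough odds of hitting at least one unrecoverable read error (URE) while
# reading every surviving drive of a single-parity group during a rebuild.
# URE rates are the commonly quoted datasheet figures, used as assumptions.
import math

def p_at_least_one_ure(read_tb: float, ure_per_bit: float) -> float:
    bits = read_tb * 1e12 * 8
    # Poisson approximation: P(at least one error) = 1 - exp(-expected_errors)
    return -math.expm1(-bits * ure_per_bit)

for drives, size_tb in ((6, 4), (6, 16)):
    surviving_read = (drives - 1) * size_tb   # data read to rebuild one failed drive
    for label, rate in (("consumer 1e-14", 1e-14), ("enterprise 1e-15", 1e-15)):
        p = p_at_least_one_ure(surviving_read, rate)
        print(f"{drives}x{size_tb}TB, {label}: P(>=1 URE during rebuild) ~ {p:.1%}")
```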
ballsystemlord - Monday, March 18, 2019 - link
So they ought to use RAID 60.

It's doable, and the amount of data that we use is scheduled to increase dramatically in the coming years without a similar increase, even if you have oodles of moola, in network BW (the cables are the limitation here and most of us are affected and doomed), RAM BW, HDD/SSD BW, and GPU compute power (not to mention the GPU's total RAM).

Nvidia has increased the size of their dies to humongous levels, and even if AMD follows suit, they can't grow much bigger in an economical sense. Likewise with CPUs, but in the case of their core-to-core and core-to-RAM BW. RAM is getting faster, but that's only if you go to non-spec modules. RAM latency is not decreasing, and DDR5 is supposed to include solid-state storage, which recreates the need for redundancy as RAM modules fail due to said storage. HDD and SSD BW appear to be plateauing, although I'm really happy that HDD manufacturers are taking my long-time idea and using multiple heads on at least some of the HDDs. I'm confused as to why they did not do that before.

Not that I'm trying to paint a depressing picture, mind, but there are a lot of bottlenecks in the current designs and underlying technologies that need to be overcome.
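A quick sketch of what a RAID 60 layout gives up and gains; the drive count, group size and capacity are arbitrary illustrations:

```python
# Sketch of a RAID 60 layout: a stripe (RAID 0) across several RAID 6 groups.
# Drive count, group size and drive capacity are arbitrary illustrations.

DRIVES = 24
GROUP_SIZE = 8            # drives per RAID 6 group (2 of them hold parity)
DRIVE_TB = 16

groups = DRIVES // GROUP_SIZE
usable_tb = groups * (GROUP_SIZE - 2) * DRIVE_TB

print(f"{groups} RAID 6 groups of {GROUP_SIZE} drives")
print(f"usable capacity: {usable_tb} TB of {DRIVES * DRIVE_TB} TB raw")
print(f"survives any 2 failures per group (up to {2 * groups} total, if spread out)")
```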
malcolmh - Wednesday, March 13, 2019 - link
Well the enterprise market may be satisfied, but consumer drives are still highly capacity restricted. I find it pretty astonishing that capacities as small as 256Gb, and even 128Gb, are still mainstream consumer offerings for desktop internal and external drives, given modern data use.

At the upper end, consumer SSDs effectively top out at 1Tb (a few 2Tb may be available, but will cost you more than 8Tb of spinning rust). Capacity and price/Gb still has a long way to go in the consumer market.
zepi - Wednesday, March 13, 2019 - link
I don't find small drives surprising at all. There are many, many people who don't save anything locally. They rely solely on Dropbox / OneDrive / Google Drive and are happy that their files are available on every device they own.
Scott_T - Wednesday, March 13, 2019 - link
Definitely, every home user's computer I've worked on would be fine with a 256GB SSD (about $30 now!) and with everyone streaming video and music I don't see that changing.
erple2 - Wednesday, March 20, 2019 - link
The only thing that this fails for is "gaming" - though if gaming services start streaming, then it's possible that large drives become even less useful.
CaedenV - Wednesday, March 13, 2019 - link
yep... 128GB is a flash drive or SD card these days. I am not understanding people who build with 128GB SSDs and then a 2TB HDD and then expect users to figure out how to redirect folders, or otherwise use the larger space. They just fill up the SSD and then wonder why on earth they can't install the next game.

I am a data hog on my NAS, but on my local system I feel like I am a fairly light user, and even I need 256GB of space at minimum just for Windows, Office, and a few games.
mitsuhashi - Wednesday, March 13, 2019 - link
You're misusing B vs b!

**triggered**
bloodgain - Wednesday, March 13, 2019 - link
Baloney. It's a matter of cost, not capacity. If the price of SSDs drops significantly -- maybe by half, definitely by 75% -- then you simply switch to redundant storage and rebuild time becomes a non-issue, as there is no downtime for the rebuild, no matter how long it takes. If data centers could buy a 1 PB enterprise-class SSD for $25K, they'd order them by the pallet-load.
deil - Wednesday, March 13, 2019 - link
The problem is that paying the price of ten 4TB SSDs for one 16TB SSD does not mean it will survive 10x longer, and you still need to RAID them. That size is already bulk storage capacity, and that segment is dominated by cheaper HDDs.

They don't say that 32TB SSDs have no reason to exist, they say that it stops being cost-effective. Speed is the reason for going to SSD, and you don't need it as big as cold storage; size does not mean faster access or copying.

And nobody sane would lose 16 TB, as anyone who has 16 TB of important stuff RAIDs their storage. RAIDs can wait for replication; there is no need to recover swiftly. -> HDDs win.
TrevorH - Wednesday, March 13, 2019 - link
> no-one wants to rebuild that large a drive at 500 MB/s, which would take a minimum of 4.44 hours, bringing server uptime down to 99.95% rather than the 99.999% metric

Who rebuilds a drive while a server is down?
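For reference, here is where the quoted figures appear to come from, assuming the 4.44 hours corresponds to an 8TB drive at 500 MB/s and is counted against a single year of uptime (both assumptions on my part):

```python
# Reproducing the quoted rebuild-time and availability numbers.
# Assumptions: 8 TB rebuilt at 500 MB/s, downtime counted over one year.

HOURS_PER_YEAR = 365 * 24

capacity_tb = 8
rate_mb_s = 500
rebuild_h = capacity_tb * 1e6 / rate_mb_s / 3600
availability = 1 - rebuild_h / HOURS_PER_YEAR

print(f"rebuild of {capacity_tb} TB at {rate_mb_s} MB/s: {rebuild_h:.2f} h")
print(f"availability if that were real downtime: {availability:.4%}")
print(f"downtime budget for 99.999%: {HOURS_PER_YEAR * 1e-5 * 60:.1f} minutes per year")
```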
jordanclock - Wednesday, March 13, 2019 - link
You rebuild on live production servers?
afidel - Wednesday, March 13, 2019 - link
Absolutely, if you had to take down a server or, worse, a SAN array every time a drive died, no data center would ever get anything done. Mine was ~500 drives, with a 1.5% AFR that's 2 drives a month lost. You're not stopping operations 2x a month to wait on the RAID rebuild.
PeachNCream - Thursday, March 14, 2019 - link
Yup, you do not take the SAN down when you need to rebuild one failed drive. I do agree that the market for 8+TB SATA drives is small, but it hasn't much to do with rebuild time. There are other factors at play here that were overlooked.
abufrejoval - Thursday, March 14, 2019 - link
Yup, SANs will rebuild from standby drives, and live rebuilding is why you pay for smart RAID controllers even at home.

I think rebuilding my home-lab RAID6 after upgrading from 6 to 8 4TB drives actually took longer than copying the content over a 10Gbit network (more than two days). But I slept soundly while that was going on because the primary data wasn't at risk and I also had a full backup on another RAID.
Kevin G - Wednesday, March 13, 2019 - link
Storage and networking are relatively flexible nowadays.

I also challenge the idea that arrays would be rebuilt at 500 MByte/s: NVMe is here, and a good NVMe RAID controller should be able to rebuild at close to 3 GByte/s. Granted, that would only shift the ~5 hours of downtime to ~50 minutes, still beyond what 99.999% uptime would allow. The rest can be controlled by better storage policies, say leveraging multiple smaller RAID 5/6 arrays that sit behind a logical JBOD instead of a single larger unit, i.e. six RAID 5 arrays of six drives apiece instead of one massive 36-drive array. Beyond that, leverage system-level redundancy so that requests that would normally be serviced there are directed to a fully functional mirror. Granted, load would not be even between the systems, but externally there would be no apparent drop in service, just a slight dip in performance for a subset of accesses. The end result would be no measurable downtime.

Data redundancy is mostly a solved problem. The catch is that the solutions are not cheap, which is why I see price as the bigger factor in maintaining proper redundancy.
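A sketch of the layout comparison in this comment (one large RAID 5 versus six six-drive RAID 5 groups behind a logical JBOD), using an assumed 16TB drive size:

```python
# Sketch: one big RAID 5 vs. several smaller RAID 5 groups concatenated behind
# a logical volume. Drive size and counts are illustrative assumptions.

DRIVE_TB = 16

def raid5_usable_tb(drives: int) -> int:
    return (drives - 1) * DRIVE_TB

one_big = raid5_usable_tb(36)
six_small = 6 * raid5_usable_tb(6)

print(f"one 36-drive RAID 5 : {one_big} TB usable, a failure degrades all 36 drives")
print(f"six 6-drive RAID 5s : {six_small} TB usable, a failure degrades only 6 drives")
print(f"capacity given up for the smaller failure domains: {one_big - six_small} TB")
```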
zjessez - Wednesday, March 13, 2019 - link
Enterprises are now using dual-port SAS 12Gbps SSDs, and they will rebuild a lot faster than 500MBps. In addition, many vendors are adopting NVMe, and that would rebuild even faster. The reality is that it is, first, a cost issue with regards to high-capacity SSDs, and second, a bandwidth issue in terms of rebuild times. However, there are also considerations in how the array manages a rebuild: is it a RAID 5 or RAID 6 implementation, which will be slower, or is it distributed RAID that can rebuild in parallel, or erasure coding based on Reed-Solomon? These are all factors that come into play in terms of rebuild speeds and the degradation of performance during such a rebuild.
jhh - Wednesday, March 13, 2019 - link
One of the other issues with extremely high-capacity drives is that the I/O interface doesn't scale with capacity. If the data is so infrequently accessed that a higher-speed interface isn't needed, why not just use cheaper HDD storage? Latency to that infrequently accessed data is about the only downside. A similar issue arises with disaggregated storage, where even 100 Gbps Ethernet becomes a bottleneck for a system with many NVMe drives.
PixyMisa - Wednesday, March 13, 2019 - link
True, but with NVMe, and with PCIe 4.0 and 5.0 on the way, we do have a lot of bandwidth. A 16TB PCIe 5.0 x4 drive would take less than 20 minutes to read.
jordanclock - Wednesday, March 13, 2019 - link
I love that Ian talked to people in actual data center industries, reports that those same people all said 16TB is the effective limit based on all of the factors, and all the comments are saying they're wrong.

I'm seeing a lot of armchair engineering in these comments and not a lot of citation of sources or even experiences.
rpg1966 - Thursday, March 14, 2019 - link
People are questioning it because he hasn't explained why *any*-sized drive would be an issue given proper data management techniques.
Fujikoma - Thursday, March 14, 2019 - link
He stated that he talked to a few individuals. He should have included the size of the data centers these people ran, and that the drives were intended for, as a matter of perspective. Myself, I could see drives larger than 16TB for small businesses that use the data internally and could rebuild over a weekend.
ksec - Wednesday, March 13, 2019 - link
I think I need to dig up some info on this because I don't quite understand it. Any players that are using 16TB per "ruler" will likely have huge redundancy built into their system. Not just consumer-grade RAID, but something likely much more sophisticated than a ZFS cluster or Backblaze's Reed-Solomon.

And with these drives going up to even 8GB/s, I don't see how 16TB will be a factor. If anything it will be the cost per GB that will be the issue, assuming an 8TB ruler has the same NAND reliability as a 16TB one.

If anything it will likely be the speed of the network that is the handicap.
ksec - Saturday, March 16, 2019 - link
This is a reply to my future self in case I need to look things up again. Ian made a reply on Reddit (why not here?) about the data being too much to lose and the risk being too large for cold, warm or even hot data.

I think we fundamentally agree, except everyone was thinking of a different use case. Those vendors were likely selling to customers hosting within a dozen PB of data. 16TB is anywhere between 1.6% and 0.16% of that data, which is quite a lot. But I was thinking of the Facebook, Google and Amazon scale. At 1EB, 1000PB, that is 0.0016% of their data. I doubt they would have thought 32TB was too much; likely at their scale they would even want 100TB.
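Reproducing the fleet-share arithmetic above (the fleet sizes are the ones named in the comment):

```python
# A single 16 TB device as a fraction of various fleet sizes.
# Fleet sizes taken from the comment above; this just checks the percentages.

DEVICE_TB = 16

for label, fleet_pb in (("1 PB", 1), ("10 PB", 10), ("1 EB (1000 PB)", 1000)):
    fraction = DEVICE_TB / (fleet_pb * 1000)  # PB -> TB
    print(f"16 TB out of {label:<15}: {fraction:.4%} of the fleet")
```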
thingreenveil313 - Wednesday, March 13, 2019 - link
How does a live rebuild on an array with hot storage increase downtime? This doesn't make any sense. No one with sense takes down a prod device to do a rebuild when the point of drive pools is to keep devices online through disk failures.
ksec - Thursday, March 14, 2019 - link
Exactly. I would have yawned if it were coming from any other media, but coming from Anandtech I suspect there are use cases that I don't know about or understand. I wish someone could chime in on this.
boozed - Wednesday, March 13, 2019 - link
"if you’re willing to entertain an exotic implementation"And an exotic price!
farmergann - Wednesday, March 13, 2019 - link
Prices and I/O bottlenecks of larger drives are the sole limiting factors to greater adoption. Drive failure has basically nothing to do with this conversation, and density means squat diddly given the minuscule scaling achieved.
I can't even bring myself to address the uptime silliness mentioned by this author. Jiminy, please edit that away...
Theorem21 - Wednesday, March 13, 2019 - link
Seriously, stop using RAID. Use ZFS. Rebuild times, recovery + management are all vastly different and far better.
PixyMisa - Wednesday, March 13, 2019 - link
I started using ZFS last year. It's magical.
abufrejoval - Thursday, March 14, 2019 - link
ZFS is a redundant array of inexpensive disks too, just with smarter software to operate it. And I run oVirt/GlusterFS for RAIS (a redundant array of inexpensive servers).
piroroadkill - Monday, March 18, 2019 - link
Your comparison literally makes no sense. One is a name given to a bunch of different redundancy schemes, the other is a file system. They can't be compared like that. If you're comparing RAIDZ in ZFS to other types of RAID, then what you're saying starts to make sense, but you're still using a type of RAID.
32TB - Thursday, March 14, 2019 - link
What are you talking about? I want over 16TB per drive. F this fake news author!
Hgp123 - Thursday, March 14, 2019 - link
Can someone explain why a failed drive would cause downtime? I understand that a failed drive needs to be rebuilt, but doesn't a hot-swap system prevent downtime? I don't understand why a system would ever need to go down to replace a drive when I've got a dinky HP server that allows me to swap out drives and rebuild while the OS is running.
PeachNCream - Thursday, March 14, 2019 - link
It doesn't cause downtime. An array can be rebuilt while remaining in production. Of course, there will be a performance impact as the rebuild is happening. Part of the point of using a fault-tolerant storage array is to (buckle in for this because it's going to be absolutely shocking) continue operations in spite of a fault.
SzymonM - Thursday, March 14, 2019 - link
Downtime for a rebuild? C'mon, you have RAID controllers with customizable rebuild priority. I'd love 16TB or 32TB SSDs for my Gluster nodes, because larger drives == fewer nodes == lower cost of DC presence (rack space, cooling, power, cost of the rest of the server). BTW, Gluster also has a customizable resilvering policy for replicated volumes. The only problem is that the 15TB Samsung drives are pricey as hell.
abufrejoval - Thursday, March 14, 2019 - link
Actually, I don't mind more Gluster nodes as long as the fabric can manage the additional bandwidth. And with redundancy managed via Gluster, I am considering lowering the redundancy within the boxes, at least for SSDs: I never liked the write amplification of hardware RAID controllers with their small buffers and HDD-legacy brains.
I still run ZFS below Gluster for the "new tape" on HDDs.
kawmic - Thursday, March 14, 2019 - link
4TB @ $100 sounds reasonable. I would pay that.
plsbugmenot - Thursday, March 14, 2019 - link
I want one. Apparently that makes me a "no one". Thanks.
urbanman2004 - Thursday, March 14, 2019 - link
I'm pretty satisfied with both my 8TB and 10TB high-capacity drives, but does it scare anyone that helium is being used to fill higher-capacity drives?
abufrejoval - Thursday, March 14, 2019 - link
Yes. As I understand it, you cannot stop helium from leaking, only compensate for it over the expected lifetime. I tend to have higher expectations...
Null666666 - Thursday, March 14, 2019 - link
Ah...I would love >128g... I work in rather large data, have for years.
Manage risk by replication.
Never had much trust in "...never need more than".
peevee - Thursday, March 14, 2019 - link
PCIe5 will allow for 4x the speed, and controllers will probably catch up eventually. Hence 4x the capacity at the same rebuild time. Now... who the hell needs PBs of capacity in 1U but Google and Facebook? Lots and lots of unused data nobody is going to access - what kind of connection to that rack are you going to have?
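A minimal sketch of that scaling argument, assuming a drive limited only by the host link; the link rates are rounded theoretical x4 ceilings and real controllers and NAND will fall short of them:

# If rebuild speed tracks host-link bandwidth, capacity can grow by the
# same factor without lengthening the rebuild window.
PCIE3_X4_GBPS = 4    # ~3.9 GB/s usable, rounded
PCIE5_X4_GBPS = 16   # ~15.8 GB/s usable, rounded

def full_read_hours(capacity_tb, link_gbps):
    """Time to stream an entire drive at the given link rate."""
    return capacity_tb * 1000 / link_gbps / 3600

print(f"16TB @ PCIe 3.0 x4: {full_read_hours(16, PCIE3_X4_GBPS):.1f} h")
print(f"64TB @ PCIe 5.0 x4: {full_read_hours(64, PCIE5_X4_GBPS):.1f} h")
# Both come out to roughly the same ~1.1 hours of sequential reading.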
occasional - Thursday, March 14, 2019 - link
No one wants over 16TB... FOR NOW. Tech moves so fast.
Before you know it, we'll have over 200TB SSDs as normal.
Xajel - Monday, March 18, 2019 - link
I would, if the cost were low enough to justify having 2-3 drives for backups.
meh130 - Friday, March 22, 2019 - link
I think the larger SSDs have a strong use case in RAID arrays. There are several reasons for this. First, NVMe-attached SSDs present a complicated way to add capacity compared to SAS-connected SSDs. SAS-connected SSDs are easy to scale by adding SAS-connected drive enclosures, the same enclosures currently used for 10K SAS HDDs. NVMe drive enclosures would need to be attached either using PCIe cables or NVMe over Fabrics technology. All of the array vendors have settled on the latter approach, and it requires the drive enclosure to be a full-blown array with dual NVMe over Fabrics controllers fronting the SSDs. This significantly increases costs compared to an array with local NVMe-attached drives (which only requires additional PCIe switch ASICs to fan out the PCIe connections to the local drives). The result is that an array with 24 local 16TB NVMe SSDs is much less expensive than an array with 24 local 8TB NVMe SSDs plus 24 more 8TB NVMe SSDs in an NVMe over Fabrics external enclosure.
Rebuild times are a consideration, but declustered RAID technology can reduce that, as can triple parity RAID.
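A rough, idealized sketch of why declustered RAID shortens the rebuild window: a traditional group is throttled by writing a single spare drive, while a declustered layout spreads the rebuild I/O across every surviving drive in the pool. The drive count and write speed below are assumptions for illustration only:

# Traditional RAID: rebuild bottlenecked by one spare drive's write speed.
# Declustered RAID: spare capacity is spread across the pool, so rebuild
# I/O scales with the number of surviving drives (idealized model).
DRIVE_TB = 16
DRIVE_WRITE_GBPS = 2      # assumed per-SSD sustained write speed
POOL_DRIVES = 24          # assumed pool size

traditional_hours = DRIVE_TB * 1000 / DRIVE_WRITE_GBPS / 3600
declustered_hours = traditional_hours / (POOL_DRIVES - 1)

print(f"Traditional rebuild: ~{traditional_hours:.1f} h")
print(f"Declustered rebuild: ~{declustered_hours:.2f} h (idealized, ignores parity math and overhead)")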
vedru - Tuesday, October 29, 2019 - link
I believe this article is a little conservative and does not cover the real reasons behind the slow uptake of larger SSDs, or the possibilities there. Currently the largest mass-produced SSDs are the Samsung PM1643, which goes up to 30.72TB. The issue is that the price of the larger drives is still relatively high. However, rebuilding a RAID with 30TB SSDs will likely take less time than rebuilding the same RAID with 12-16TB HDDs, so the risk during a rebuild is smaller with the SSDs.
Also, with HDDs the drive failures are more common and the overall run time is lower, whereas SSDs are limited by the number of writes. With chip development advancing, there are already storage systems available that load the SSDs asymmetrically to ensure they do not all reach the end of their write cycles at the same time. SSDs are also composed of a number of chips which, with firmware advances, can be managed so that if one chip in a drive fails, most or all of the data can still be read and copied to a replacement drive in the array, not requiring a full rebuild. This would alleviate the fear of data loss in larger drives. I have no doubt that once price parity is reached between HDDs and SSDs, much larger SSDs will become common and current cold storage will be replaced by very large SSDs. It is just a matter of waiting for prices to be able to compete with HDDs.
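That rebuild-time claim is easy to sanity-check. A minimal sketch assuming typical sequential speeds (both figures are rough assumptions, and real rebuilds add parity and controller overhead):

# Time to stream a full drive's contents - the floor for any rebuild.
def stream_hours(capacity_tb, seq_gbps):
    return capacity_tb * 1000 / seq_gbps / 3600

print(f"30.72TB SAS SSD @ ~2 GB/s:   {stream_hours(30.72, 2.0):.1f} h")
print(f"16TB HDD        @ ~0.25 GB/s: {stream_hours(16, 0.25):.1f} h")
# Even at nearly double the capacity, the SSD finishes in roughly a quarter of the time.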
Bellfazair - Wednesday, September 30, 2020 - link
These kinds of articles are pointless. In 20 years, 16TB will be the new 16GB. 30 years ago my father paid $900 for a 40MB drive and said "No one will ever use this much space". The reality is, if the price were low enough, who wouldn't want 16TB in their laptop or PC? Hell, I'd like a 16TB drive in the new PS5. Given enough time, we will laugh at the absurdity that someone actually thought we wouldn't want a drive larger than 16TB, just as people think the president of IBM from 1914 to 1956, Watson, was absurd when he said he thought there was a world market "for maybe five computers".
Jake5554 - Sunday, November 28, 2021 - link
Wrong. I have been waiting for an internal 16TB drive to come out for the last year or so, and I am getting impatient.