203 points binwiederhier | 5 comments
dale_glass ◴[] No.45050455[source]
But what's actually happening? There seems to be a lack of technical information.

And why does the SSD allow this to happen? A SSD has its own onboard computer, it's not just allowing the OS to do whatever it wants. Obviously the OS can write way too much and reach the endurance limit but that should have been figured out almost instantly, with OS write stats and SMART stats.

replies(4): >>45050465 #>>45051002 #>>45051449 #>>45051652 #
Sesse__ ◴[] No.45051002[source]
> And why does the SSD allow this to happen? A SSD has its own onboard computer, it's not just allowing the OS to do whatever it wants.

If the device is DRAM-less, much of its central information (large parts of the FTL, in particular) resides in the host's RAM, where the OS could presumably touch it. If that area of RAM is _somehow_ being overwritten or out-of-sync or otherwise unreliable, you can get pretty bad corruption.

replies(1): >>45051145 #
dark-star ◴[] No.45051145[source]
No, the FTL is still in the SSD unless it's a host-managed SSD that is also operating in host-managed mode, which none of the articles have mentioned as related to the issue.
replies(2): >>45051569 #>>45054012 #
1. pjdesno ◴[] No.45054012[source]
The FTL executes on the SSD controller, which in a DRAM-less design has limited on-chip SRAM and no DRAM. In contrast, a controller for a more expensive SSD will require an external on-SSD DRAM chip of 1+GB.

The FTL algorithm still needs one or more large tables. The driver allocates host-side memory for these tables, and the CPU on the SSD that runs the FTL has to reach out over the PCIe bus (e.g. using DMA operations) to write or read these tables.

It's an abomination that wouldn't exist in an ideal world, but in that same ideal world people wouldn't buy a crappy product because it's $5 cheaper.

replies(1): >>45054296 #
2. pjdesno ◴[] No.45054296[source]
One of the Japanese sites has a list of SSDs that people have observed the problem on - most of them seem to be dramless, especially if "Phison PS5012-E12" is an error. (PS5012-E12S is the dramless version)

Then again, I think dramless SSDs represent a large fraction of the consumer SSD market, so they'd probably be well-represented no matter what causes the issue.

Finally, I'll point out that there's a lot of nonsense about DRAMless SSDs on the internet - e.g. Google shows this snippet from r/hardware: "Top answer: DRAM on the drive benefits writes, not reads. Gaming is extremely read-heavy, and reads are..."

FTL stands for flash TRANSLATION layer - it needs to translate from a logical disk address to a real location on the flash chip, and every time you write a logical block that real location changes, because you can't overwrite data in flash. (you have to wait and then erase a huge group of blocks - i.e. garbage collection)
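
To make the translation idea concrete, here is a toy page-mapped FTL in Python. It is only a sketch: the class name `ToyFTL`, the tiny geometry, and the greedy "fewest live pages" victim choice are all assumptions for illustration, not how any particular controller works, and it assumes the logical space is smaller than the physical flash so garbage collection always has slack to work with.

```python
class ToyFTL:
    """Toy page-mapped FTL. Hypothetical model, not a real controller design."""

    def __init__(self, num_blocks=4, pages_per_block=4):
        self.ppb = pages_per_block
        self.flash = [[None] * pages_per_block for _ in range(num_blocks)]
        self.l2p = {}    # logical page number -> (block, page)
        self.owner = {}  # (block, page) -> logical page number, live copies only
        self.free_blocks = list(range(1, num_blocks))
        self.active, self.next_page = 0, 0
        self.erases = 0

    def _program(self, lpn, data):
        if self.next_page == self.ppb:      # active block is full, open a new one
            self.active, self.next_page = self.free_blocks.pop(0), 0
        loc = (self.active, self.next_page)
        self.next_page += 1
        old = self.l2p.get(lpn)
        if old is not None:
            del self.owner[old]             # the old copy is now stale
        self.flash[loc[0]][loc[1]] = data   # never overwrite in place
        self.l2p[lpn] = loc
        self.owner[loc] = lpn

    def _gc(self):
        # Pick the used block with the fewest live pages, move those pages
        # elsewhere, then erase the whole block at once.
        used = [b for b in range(len(self.flash)) if b not in self.free_blocks]
        live = lambda b: [loc for loc in self.owner if loc[0] == b]
        victim = min(used, key=lambda b: len(live(b)))
        for loc in live(victim):
            self._program(self.owner[loc], self.flash[loc[0]][loc[1]])
        self.flash[victim] = [None] * self.ppb
        self.free_blocks.append(victim)
        self.erases += 1

    def write(self, lpn, data):
        # Collect garbage when only the reserve block is left.
        if self.next_page == self.ppb and len(self.free_blocks) == 1:
            self._gc()
        self._program(lpn, data)

    def read(self, lpn):
        block, page = self.l2p[lpn]
        return self.flash[block][page]
```

Writing the same logical page twice lands it in two different physical locations, and `l2p` is the table that everything else in this discussion is about.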

If you put the translation table in on-SSD DRAM, it's real fast, but gets huge for a modern SSD (1+GB per TB of SSD). If you put all of it on flash - well, that's one reason thumb drives are so slow. I believe most DRAM-full consumer SSDs nowadays keep their translation tables in flash, but use a bunch of DRAM to cache as much as they can, and use the rest of their DRAM for write buffering.
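
For what it's worth, the "1+GB per TB" figure falls out of simple arithmetic if you assume a flat table with one 4-byte entry per 4 KiB logical page. Both numbers are assumptions on my part (a common rule of thumb, not a vendor spec), and `map_table_bytes` is just an illustrative name:

```python
def map_table_bytes(capacity_bytes, page_bytes=4096, entry_bytes=4):
    """Size of a flat logical-to-physical table, one entry per logical page.

    page_bytes and entry_bytes are assumed values, not taken from any datasheet.
    """
    return (capacity_bytes // page_bytes) * entry_bytes


TIB = 1 << 40
GIB = 1 << 30
table_gib = map_table_bytes(TIB) / GIB  # 2**28 entries * 4 bytes = 1 GiB
```

Larger mapping granules or compressed entries would shrink this, which is presumably one of the knobs vendors turn.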

DRAMless controllers put those tables in host memory, although I'd bet they still treat it as a cache and put the full table in flash. I can't imagine them using it as a write buffer; instead I'm guessing when they DMA a block from the host, they buffer 512B or so on-chip to compute ECC, then send those chunks directly to the flash chips.
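
A rough sketch of the "host memory as a cache, full table on flash" guess, using a plain LRU policy. Everything here is hypothetical: `CachedMap`, the per-entry caching granularity, and the LRU policy are my assumptions, and a real controller would more likely cache larger chunks of the table at a time.

```python
from collections import OrderedDict


class CachedMap:
    """LRU cache of mapping entries in front of a slow full table (toy model)."""

    def __init__(self, flash_table, cache_entries):
        self.flash_table = flash_table  # stands in for the full table on flash
        self.cache = OrderedDict()      # stands in for the host-memory copy
        self.cache_entries = cache_entries
        self.hits = 0
        self.misses = 0

    def lookup(self, lpn):
        if lpn in self.cache:
            self.cache.move_to_end(lpn)              # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            self.cache[lpn] = self.flash_table[lpn]  # the slow path: a flash read
            if len(self.cache) > self.cache_entries:
                self.cache.popitem(last=False)       # evict least recently used
        return self.cache[lpn]
```

With a workload that keeps touching the same small region, almost every lookup is a hit; random reads across the whole drive are where the misses (and the extra flash reads) pile up.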

There's a lot of guesswork here - I don't have engineering-level access to SSD vendors, and it's been a decade since I've put a logic analyzer on an SSD and done any reverse-engineering; SSDs are far more complicated today. If anyone has some hard facts they can share, I'd appreciate it.

replies(1): >>45054885 #
3. rasz ◴[] No.45054885[source]
I don't buy this. There are plenty of DRAMless SATA SSDs, which should be impossible if your description were correct, not to mention DRAMless drives working just fine inside USB-NVMe enclosures.

>but gets huge for a modern SSD (1+GB per TB of SSD)

Except most drives allocate 64MB through HMB. Do you know of any NVMe drives that steal gigabytes of RAM? AFAIK Windows limits HMB to ~200MB?
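
To put those buffer sizes in perspective, here is the inverse calculation: how much logical space a given buffer could map at once. It assumes 4-byte entries and 4 KiB pages, which are my guesses rather than known figures, and `mapped_span_bytes` is an illustrative name.

```python
def mapped_span_bytes(buffer_bytes, page_bytes=4096, entry_bytes=4):
    """Logical span covered by a buffer full of mapping entries (assumed sizes)."""
    return (buffer_bytes // entry_bytes) * page_bytes


MIB = 1 << 20
GIB = 1 << 30
span_64 = mapped_span_bytes(64 * MIB) / GIB    # 64 MiB of entries covers 64 GiB
span_200 = mapped_span_bytes(200 * MIB) / GIB  # 200 MiB of entries covers 200 GiB
```

Under those assumptions a 64MB buffer can only ever hold part of the map for a 1TB drive, so it would have to be a cache rather than the whole table.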

>Finally, I'll point out that there's a lot of nonsense about DRAMless SSDs on the internet

FTL doesn't need all that RAM. RAM on drives _is_ used for caching writes, or more specifically for reordering and grouping small writes to efficiently fill whole NAND pages, preventing the fragmentation that destroys endurance and write speed.
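
A minimal sketch of that grouping, assuming fixed-size sectors and a NAND page that holds four of them. `WriteCoalescer` and the geometry are hypothetical, and real firmware also has to deal with power loss, flush commands, and timeouts.

```python
class WriteCoalescer:
    """Buffers small writes and programs NAND one full page at a time (toy)."""

    def __init__(self, sectors_per_page=4):
        self.sectors_per_page = sectors_per_page
        self.pending = {}        # lba -> data, the newest write to an LBA wins
        self.page_programs = []  # one dict per NAND page actually programmed

    def write(self, lba, data):
        self.pending[lba] = data
        if len(self.pending) == self.sectors_per_page:
            self.flush()

    def flush(self):
        if self.pending:
            self.page_programs.append(dict(self.pending))
            self.pending.clear()
```

Eight writes to distinct sectors cost two page programs instead of eight, and ten rewrites of the same sector collapse into a single program once flushed.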

replies(1): >>45057387 #
4. dijit ◴[] No.45057387{3}[source]
But isn't it the case that SATA devices must receive ATA commands via the disk controller, while NVMe is mapped directly to the CPU?

Surely that distinction would make one more vulnerable to corruption than the other?

replies(1): >>45065891 #
5. Sesse__ ◴[] No.45065891{4}[source]
Are you talking about the fact that NVMe works by MMIO and DMA? So does pretty much any SATA controller, so there's no inherent difference there (it has been _many_ years since the dominant way of talking to devices was through programmed I/O ports). Unless you have an NVMe device with host-backed memory (as discussed elsewhere in the thread), it's not like the CPU can just go and poke freely at the flash, just as it cannot overwrite a SATA disk's internal RAM or forcefully rotate its platters. It can talk to the controller by placing commands and data in a special shared memory area, but the controller is fundamentally its own device with separate resources.
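
As a very simplified model of that shared memory area: the host fills slots in a ring and bumps a tail index (standing in for a doorbell register write), and the controller drains the ring against state the host cannot reach. The names `SubmissionQueue` and `Controller` and the tuple command format are made up for this sketch; real NVMe and AHCI queues carry far more detail.

```python
class SubmissionQueue:
    """Ring of command slots in host memory (toy model)."""

    def __init__(self, depth=4):
        self.depth = depth
        self.slots = [None] * depth
        self.tail = 0  # advanced by the host
        self.head = 0  # advanced by the controller

    def submit(self, cmd):
        nxt = (self.tail + 1) % self.depth
        if nxt == self.head:
            raise BufferError("queue full")
        self.slots[self.tail] = cmd
        self.tail = nxt  # stands in for the doorbell write over MMIO


class Controller:
    """Owns the media; the host only reaches it through queued commands."""

    def __init__(self):
        self._media = {}  # private to the device in this model

    def process(self, sq):
        completions = []
        while sq.head != sq.tail:
            op, lba, data = sq.slots[sq.head]
            sq.head = (sq.head + 1) % sq.depth
            if op == "write":
                self._media[lba] = data
                completions.append(("ok", None))
            elif op == "read":
                completions.append(("ok", self._media.get(lba)))
            else:
                completions.append(("invalid", None))
        return completions
```

The host side never touches `_media` directly; it can only ask, and the controller decides what each command does.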