Lower cost of better reliability?
As with most technology innovations, solid-state drives (SSDs) began with high performance and a high price tag. Data centers saw the value first. As the technology progressed, OEMs saw the potential for slimmer, lighter form factors (which gave rise to new products like the Apple MacBook Air), and SSDs found their way into mainstream consumer technology. With mainstream consumer technology comes a high sensitivity to price. End users may flinch at a conversation about error-correcting code (ECC) mechanisms and say their primary concern is price, but these same users would go crazy if their low-priced SSD lost their data! So we engineers have to be concerned about things like ECC mechanisms, and we enjoy those conversations.
So let the discussions begin. As stated, consumer markets for embedded storage built on solid-state (NAND-flash) devices are especially cost sensitive. Much of what we do can be collectively described as “signal processing” that mitigates the issues affecting the bottom line of consumer storage products. The basic building block of any solid-state storage product is a floating-gate transistor cell. The floating gate can store discrete levels of electron charge, and these levels translate into one or more stored binary bits. NAND-flash manufacturers generally adopt two methods to increase storage density: 1) physically squeeze the floating-gate devices as close together as possible, and 2) use each storage element to store as many bits as possible (current state-of-the-art technology stores 3 bits per floating-gate transistor). However, both methods tend to increase the probability of bit errors during retrieval: tightly packed cells interfere with their neighbors, and more bits per cell leave less margin between adjacent charge levels. Marvell’s challenge was to create an enhanced ECC technology that, when implemented on efficient hardware architectures, would achieve the same data integrity with high-density NAND flash that would otherwise have a higher raw bit-error rate.
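To make the second method concrete, here is a minimal sketch of how the number of charge levels grows with bits per cell. It assumes a fixed, normalized voltage window of 1.0 (a hypothetical simplification, not a figure from any datasheet) and uses a Gray-code mapping, a common choice in which adjacent levels differ by a single bit. The function names are illustrative only.

```python
def levels_per_cell(bits_per_cell):
    """A cell storing n bits must distinguish 2**n charge levels."""
    return 2 ** bits_per_cell


def level_spacing(bits_per_cell, window=1.0):
    """Gap between adjacent levels if they are spread evenly over a fixed
    window. The normalized window of 1.0 is an assumption for illustration."""
    return window / (levels_per_cell(bits_per_cell) - 1)


def gray_code(level):
    """Map a level index to a bit pattern so that neighboring levels differ
    in exactly one bit (misreading by one level then costs one bit error)."""
    return level ^ (level >> 1)


if __name__ == "__main__":
    for bits in (1, 2, 3):
        print(bits, "bit(s):", levels_per_cell(bits), "levels, spacing",
              round(level_spacing(bits), 3))
    # Bit patterns for the 8 levels of a 3-bit cell, lowest level first.
    print([format(gray_code(level), "03b") for level in range(8)])
```

Under these assumptions, each extra bit per cell doubles the number of levels that share the same window, so the same amount of noise is more likely to push a cell's charge into a neighboring level.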
Adding to the complexity, each floating-gate transistor has a limited number of program-erase (P/E) cycles, beyond which the probability of error rises above a threshold that renders the transistor unusable and irreparable. This limitation is due to the erase procedure, which subjects the devices to high voltages that cause physical deterioration of the transistors. As the number of P/E cycles increases, the probability of error also increases. A good error-correction strategy can mitigate these effects and therefore extend the lifetime of the devices.
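The link between correction strength and device lifetime can be sketched with a toy model. Everything numeric here is a hypothetical assumption chosen for illustration: the exponential wear curve, the fresh bit-error rate, the codeword size, and the failure target. The decoder is modeled as correcting up to a fixed number of bit errors per codeword, which is closer to a BCH-style code than to an LDPC code, and bit errors are assumed to be independent.

```python
import math


def raw_ber(pe_cycles, ber_fresh=1e-4, wear_scale=2000.0):
    """Hypothetical wear model: the raw bit-error rate grows exponentially
    with P/E cycles. Capped at 0.5 so the logarithms below stay defined."""
    return min(ber_fresh * math.exp(pe_cycles / wear_scale), 0.5)


def codeword_failure_prob(n_bits, t_correctable, p):
    """Probability that more than t of n independent bits are in error,
    i.e. the upper binomial tail, with each term computed in log space."""
    log_n_fact = math.lgamma(n_bits + 1)
    total = 0.0
    for k in range(t_correctable + 1, n_bits + 1):
        log_term = (log_n_fact - math.lgamma(k + 1)
                    - math.lgamma(n_bits - k + 1)
                    + k * math.log(p)
                    + (n_bits - k) * math.log1p(-p))
        total += math.exp(log_term)
    return total


def usable_pe_cycles(t_correctable, n_bits=8192, target=1e-6,
                     step=100, limit=20000):
    """Last checked P/E count at which the codeword failure probability
    is still at or below the target (all defaults are assumptions)."""
    usable = 0
    for cycles in range(0, limit + 1, step):
        p = raw_ber(cycles)
        if codeword_failure_prob(n_bits, t_correctable, p) > target:
            break
        usable = cycles
    return usable


if __name__ == "__main__":
    for t in (8, 40):
        print("corrects", t, "bits per codeword ->",
              usable_pe_cycles(t), "usable P/E cycles")
```

In this model a stronger code tolerates a higher raw bit-error rate before codewords start to fail, so the same cells stay usable for more P/E cycles. The absolute numbers it prints mean nothing outside the assumed parameters.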
Marvell is currently in the midst of a development cycle for the third generation of low-density parity-check (LDPC) codes for solid-state storage applications. Our goal is to provide effective ECC management and strategies that allow the customer to lower the cost per unit of storage without sacrificing reliability. And that’s something to talk about!