Technical Brief

Understanding over-provisioning

Solid-state drives (SSDs) are similar to hard disk drives (HDDs) in physical dimensions (e.g., height, width and length) and external interface (e.g., SATA or SAS interface), but the internal low-level operation and components of an SSD differ vastly from the spinning magnetic platter design of an HDD.

Once all components comprising an SSD are assembled, including a single Flash Storage Processor (FSP) to manage the multitude of NAND Flash die, the SSD manufacturer can reserve an additional percentage of the total drive capacity for over-provisioning (OP) during firmware programming.

To calculate the over-provisioned percentage of an SSD, the formula in Figure 1 can be used.

$$\%\ \text{Over-Provisioning} = \frac{\text{Physical Capacity} - \text{User Capacity}}{\text{User Capacity}}$$

Figure 1. Over-provisioning formula.

As illustrated in Figure 2, a client application class configuration with 128 Gigabytes (GB) of Flash memory, 120 GB of which is available to the user, is over-provisioned by (128 - 120) / 120 ≈ 6.7 per cent, which rounds up to 7 per cent.

Physical capacity | User capacity | % Over-provisioning | Application class
64 GB             | 60 GB         | 7%                  | Client
96 GB             | 90 GB         | 7%                  | Client
128 GB            | 120 GB        | 7%                  | Client
128 GB            | 100 GB        | 28%                 | Enterprise
256 GB            | 240 GB        | 7%                  | Client
256 GB            | 200 GB        | 28%                 | Enterprise
512 GB            | 480 GB        | 7%                  | Client
512 GB            | 400 GB        | 28%                 | Enterprise

Figure 2. Over-provisioning based on capacity and application class.
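The formula in Figure 1 can be checked against the configurations listed in Figure 2. The following is a minimal sketch; the function name `over_provisioning_percent` is an illustrative choice rather than part of any vendor tool, and the result is multiplied by 100 to express it as a percentage.

```python
def over_provisioning_percent(physical_gb: float, user_gb: float) -> float:
    """Return over-provisioning as a percentage of user capacity (Figure 1)."""
    if user_gb <= 0 or physical_gb < user_gb:
        raise ValueError("expected 0 < user capacity <= physical capacity")
    return (physical_gb - user_gb) / user_gb * 100.0


# (physical GB, user GB) pairs taken from Figure 2
configurations = [
    (64, 60), (96, 90), (128, 120), (128, 100),
    (256, 240), (256, 200), (512, 480), (512, 400),
]

for physical, user in configurations:
    percent = over_provisioning_percent(physical, user)
    print(f"{physical} GB physical / {user} GB user: {percent:.1f}% OP")
```

The client rows all evaluate to roughly 6.7 per cent, which the table rounds to 7 per cent, and the enterprise rows to 28 per cent.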

The OP capacity set by the SSD manufacturer can vary in size, depending on the application class of the SSD and the total NAND Flash memory capacity.

Larger-capacity drives and drives in other application classes, such as Enterprise, are typically configured with proportionally more over-provisioning, because managing more NAND Flash requires additional resources for Garbage Collection, spare blocks and enhanced data protection features (LSI® SandForce® R.A.I.S.E.™).

This over-provisioned capacity is not accessible to the user and is invisible to the host operating system.

The basics

To understand why an SSD is configured with over-provisioning and how it benefits the FSP, we have to delve into the typical operation of an SSD and the limitations of non-volatile NAND Flash memory.

Each NAND Flash cell has a finite life expectancy based on its program/erase (P/E) endurance, which the NAND Flash manufacturer characterises during the manufacturing process. Each program or erase operation executed on a NAND Flash cell erodes the cell’s ability to reliably store an electrical charge and may therefore threaten data integrity.

With each new generation of lithography shrink, the reduced geometry of the NAND Flash cell will typically also lead to a lower P/E endurance and increased complexity in managing the life expectancy and reliability in its application class.

To summarise, the three main factors that affect SSD endurance are:

  • NAND Flash program/erase endurance and geometry-related read/program/erase complexity
  • SSD capacity
  • FSP capability and efficiency (garbage collection, write amplification, block management, wear-levelling, error correcting code)

Operating non-volatile NAND Flash memory

Each NAND Flash memory die is constructed of multiple blocks that contain a further multitude of pages.

NAND Flash can be read and written on a page level but only erased on a block level.

If a single, already programmed page within a block has to be modified or erased, the entire block contents, consisting of multiple pages, must first be read into temporary memory and modified there. The block is then erased, and only then can the new block contents be programmed to the same block address.

The only scenario in which a page can be written directly to a block within NAND Flash without this tedious read-erase-modify-write cycle is when the page is already in an empty state.

Therefore, keeping a large quantity of blocks empty and in reserve via over-provisioning aids in keeping performance consistent, especially in random write scenarios that exhibit the highest write amplification. [1]
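The page-write versus block-erase behaviour described above can be modelled in a few lines. This is a toy sketch, not how any particular FSP is implemented: the class `NandBlock`, the function `modify_page` and the four-page block size are all illustrative assumptions.

```python
class NandBlock:
    """Toy model of one NAND Flash block: pages are programmed one at a time
    but can only be erased together."""

    def __init__(self, pages_per_block: int = 4):
        self.pages = [None] * pages_per_block  # None marks an empty page
        self.erase_count = 0

    def program(self, index: int, data: bytes) -> None:
        if self.pages[index] is not None:
            raise RuntimeError("page already programmed; erase the block first")
        self.pages[index] = data

    def erase(self) -> None:
        self.pages = [None] * len(self.pages)
        self.erase_count += 1


def modify_page(block: NandBlock, index: int, data: bytes) -> int:
    """Update one page and return how many pages were physically programmed."""
    if block.pages[index] is None:
        block.program(index, data)  # empty page: direct write
        return 1
    buffer = list(block.pages)      # read the whole block into temporary memory
    buffer[index] = data            # modify the target page in the buffer
    block.erase()                   # erase the whole block
    programmed = 0
    for i, page in enumerate(buffer):  # program the contents back
        if page is not None:
            block.program(i, page)
            programmed += 1
    return programmed


block = NandBlock()
print(modify_page(block, 0, b"A"))  # write to an empty page
print(modify_page(block, 1, b"B"))  # write to another empty page
print(modify_page(block, 0, b"C"))  # overwrite: triggers the full cycle
```

In this model the overwrite costs one block erase and reprograms every page that held data, whereas a write to an empty page costs a single page program. That difference is why a reserve of empty blocks helps.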

Another term associated with NAND Flash operation is Write Amplification Factor (WAF).

WAF is the ratio of the writes actually committed to the NAND Flash to the writes that the host operating system presumes it has written.
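As a minimal sketch of this ratio, the snippet below computes WAF for two hypothetical cases. The function name, the 4 KB page size and the 64-page rewrite are assumptions chosen for illustration, not measured values.

```python
def write_amplification_factor(nand_bytes: int, host_bytes: int) -> float:
    """Bytes programmed to NAND Flash divided by bytes the host asked to write."""
    if host_bytes <= 0:
        raise ValueError("host_bytes must be positive")
    return nand_bytes / host_bytes


PAGE = 4096  # assumed page size in bytes

# Hypothetical case 1: a 4 KB host write lands on an empty page.
direct = write_amplification_factor(nand_bytes=1 * PAGE, host_bytes=1 * PAGE)

# Hypothetical case 2: the same 4 KB host write forces 64 pages to be rewritten.
rewrite = write_amplification_factor(nand_bytes=64 * PAGE, host_bytes=1 * PAGE)

print(direct, rewrite)
```

A WAF of 1 means the Flash absorbs exactly what the host wrote, values above 1 mean extra internal writes, and values below 1 are only possible when the data is reduced before it reaches the Flash.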

Write Amplification (WA) is affected by five major factors:

  • Sequential writes (Lower WA) vs. Random writes (Higher WA)
  • Transaction size (Larger transaction = lower WA)
  • Availability of free space from over-provisioned capacity and unused user capacity if TRIM is present (more space = lower WA)
  • Data entropy / compressibility (Lower entropy = lower WA)
  • Transaction sizes aligned to page size (4K aligned = lower WA)

Typical client workloads using LSI SandForce DuraWrite technology exhibit a WAF of 0.5 (50 per cent reduced data footprint) and can extend Flash-rated endurance by 20x or more when compared to standard SSD controllers. [2]

Maintaining performance and endurance via OP

To avoid a scenario where the SSD is filled to full capacity with invalid pages, over-provisioning is used by the FSP garbage collection function as a temporary workspace to manage scheduled valid page merges and reclaim blocks filled with invalid pages.

Any reclaimed pages/blocks are then added to the over-provisioned capacity to accommodate write operations from the FSP and maximise performance during peak traffic load. Without them, the FSP would have to read, erase, modify and write all valid pages back into a partially full block containing invalid pages, which can be a slow exercise.
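The valid page merge described above can be sketched as follows. This is a simplified illustration under stated assumptions: blocks are plain Python lists, each page is either `None` (empty) or a `(is_valid, data)` tuple, and `garbage_collect` is a hypothetical name. Real FSP firmware is considerably more involved.

```python
from typing import List, Optional, Tuple

# None = empty page; (True, data) = valid page; (False, data) = invalid page
Page = Optional[Tuple[bool, bytes]]
Block = List[Page]


def garbage_collect(victims: List[Block], spare: Block) -> int:
    """Merge valid pages from the victim blocks into an empty spare block,
    then erase the victims. Returns the number of blocks reclaimed."""
    valid_pages = [
        page for block in victims for page in block
        if page is not None and page[0]
    ]
    free_slots = [i for i, page in enumerate(spare) if page is None]
    if len(valid_pages) > len(free_slots):
        raise ValueError("spare block too small for the valid pages being merged")
    for slot, page in zip(free_slots, valid_pages):
        spare[slot] = page                 # copy valid data into the spare block
    for block in victims:
        block[:] = [None] * len(block)     # block-level erase
    return len(victims)


block_a: Block = [(True, b"a1"), (False, b"a2"), (False, b"a3"), (False, b"a4")]
block_b: Block = [(False, b"b1"), (True, b"b2"), (True, b"b3"), (False, b"b4")]
spare: Block = [None] * 4

reclaimed = garbage_collect([block_a, block_b], spare)
print(reclaimed, spare)
```

Here two partially invalid blocks are consolidated into one spare block, so the drive gains one net empty block. The spare block plays the role of the temporary workspace that over-provisioning supplies.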

Garbage collection operates independently of the operating system and is triggered automatically during periods of low activity, periodically, or when the host issues the respective ATA Data Set Management TRIM command.

An always-available number of empty blocks via the over-provisioned capacity assists in maintaining effective wear-levelling on the NAND Flash as the FSP intelligently redistributes write operations across all NAND Flash memory cells evenly without impacting the SSD’s overall performance during peak traffic loads.
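One simple way to picture wear-levelling is a "least worn first" policy: when a write needs an empty block, choose the empty block with the fewest erase cycles. The sketch below assumes this policy purely for illustration; the source does not describe the FSP's actual algorithm, and `pick_least_worn` is a hypothetical name.

```python
from typing import Dict, Set


def pick_least_worn(free_blocks: Set[int], erase_counts: Dict[int, int]) -> int:
    """Return the empty block with the fewest erase cycles.
    Ties go to the lowest block number."""
    if not free_blocks:
        raise RuntimeError("no empty blocks available; garbage collection must run first")
    return min(free_blocks, key=lambda block: (erase_counts.get(block, 0), block))


# Hypothetical erase counts for four blocks, three of which are currently empty.
erase_counts = {0: 10, 1: 3, 2: 7, 3: 3}
print(pick_least_worn({0, 2, 3}, erase_counts))
```

The larger the pool of empty blocks, the more candidates such a policy has to choose from, which is one reason a reserve of over-provisioned blocks supports even wear.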

Together with LSI SandForce DuraWrite technology, the ATA Data Set Management TRIM command can add to dynamic over-provisioning by reclaiming any invalid pages and unused user capacity using on-the-fly data compression/reduction techniques. This reduces both the actual footprint of presumed valid data and NAND Flash wear.

As shown in Figure 3, after a Microsoft Windows Vista operating system and Office 2007 application installation that the host computer presumes to be 25 GB, SandForce DuraWrite technology has effectively reduced the total writes committed to the NAND Flash to 11 GB. [3]

Figure 3. The Power of DuraWrite™ [3]
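Expressed as a write amplification factor, the two figures quoted for this installation work out as follows. This is a worked example using only the numbers in the text; it assumes that both values count total writes.

```latex
\text{WAF} = \frac{\text{writes committed to NAND Flash}}{\text{writes presumed by the host}}
           = \frac{11\ \text{GB}}{25\ \text{GB}} = 0.44
```

This is in line with the WAF of about 0.5 cited for typical client workloads.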

In Figure 4, we show a scenario using 28 per cent over-provisioned capacity with no ATA Data Set Management TRIM support. A large portion of the capacity that the host computer presumes to hold valid data is in fact unoccupied and unused on the actual NAND Flash.

With the use of DuraWrite data reduction technology, the original data is compressed to occupy less capacity, and ATA Data Set Management TRIM allows the SSD to reclaim up to 50 per cent of the presumed valid data on the drive for use alongside the pre-allocated OP capacity.

Figure 4. Dynamic Over-provisioning [4]

Without the use of dynamic over-provisioning on SSDs, the Flash Storage Processor has to rely solely on using a pre-allocated static over-provisioned capacity or reduced capacity Dynamic Random Access Memory (DRAM) for managing the NAND Flash.

To manage this data reduction technology via the FSP, some over-provisioning of the NAND Flash is allocated to the FSP to handle the target data. The compressibility of the target data is described by its associated entropy level; data with an entropy level of 100 per cent is typically completely random and cannot be compressed further.
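The brief does not state how the entropy level is measured. As one plausible illustration, the sketch below assumes it is the Shannon entropy of the byte values, expressed as a percentage of the 8-bit maximum, and uses `zlib` as a stand-in for a data reduction engine. Neither assumption describes DuraWrite itself.

```python
import math
import os
import zlib
from collections import Counter


def entropy_percent(data: bytes) -> float:
    """Shannon entropy of the byte values as a percentage of 8 bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    bits = -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(data).values()
    )
    return bits / 8.0 * 100.0


def compressed_ratio(data: bytes) -> float:
    """Size after zlib compression relative to the original size."""
    return len(zlib.compress(data)) / len(data)


zeros = b"\x00" * 4096          # a single repeated value: lowest entropy
random_data = os.urandom(4096)  # random bytes: close to the maximum

for name, payload in (("zeros", zeros), ("random", random_data)):
    print(name, round(entropy_percent(payload), 1), round(compressed_ratio(payload), 3))
```

The all-zero buffer compresses to a small fraction of its size, while the random buffer does not shrink at all. Note that this byte-level measure is only a first-order estimate: data can use every byte value evenly and still be highly compressible if it contains repeated patterns.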

Sustained 4KB Random Write transactions typically represent the highest write amplification and the most arduous performance exercise because of their operational behaviour on NAND Flash. By running them at varying entropy levels, we can observe and measure the effect that increased over-provisioning has on the overall endurance and performance of the SSD.

Figure 5 shows a considerable increase in Random Write (4KB Sustained) performance across all entropy levels of data with each percentage increase in over-provisioning. Entropy levels of 75–100 per cent show the biggest increases in performance with each proportional increase in over-provisioning. Over-provisioning therefore benefits not only data transactions with low entropy levels; larger over-provisioning is also critical to increasing sustained performance for highly random, higher-entropy data.

Figure 5. Random Write performance vs. percentage Over-provisioning [4]

Figure 6 shows the same transactions in terms of write amplification, and the critical role over-provisioning plays in lowering the write amplification for data with higher entropy levels to increase NAND Flash endurance.

A lack of suitably sized over-provisioning would eventually decrease both compression efficiency and sustained random write performance. Users would perceive this as slower operating system and application performance, and the higher write amplification would eventually wear out the SSD prematurely, especially in scenarios involving data with high entropy levels.

Figure 6. Random Write WA vs percentage Over-provisioning [4]

Increasing reliability

Additional measures including R.A.I.S.E. (Redundant Array of Independent Silicon Elements) rely on over-provisioning proportionate to the SSD capacity to allow RAID-like protection and redundancy on a single SSD up to page/block level in the event that the FSP Error Correcting Code (ECC) cannot detect and correct NAND Flash bit errors. [5]
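The internals of R.A.I.S.E. are not described in this brief. To illustrate what "RAID-like protection" means in general, the sketch below uses plain XOR parity across a set of equally sized pages, as in classic parity-based RAID. It is an assumed, generic technique and not the R.A.I.S.E. implementation; `xor_pages` is a hypothetical helper.

```python
from functools import reduce
from typing import List


def xor_pages(pages: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized pages."""
    if not pages:
        raise ValueError("at least one page is required")
    if any(len(page) != len(pages[0]) for page in pages):
        raise ValueError("all pages must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*pages))


# Three data pages stored on different dies, plus one parity page kept in
# reserved (over-provisioned) capacity.
data_pages = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_pages(data_pages)

# Suppose page 1 becomes unreadable: XOR the survivors with the parity page.
rebuilt = xor_pages([data_pages[0], data_pages[2], parity])
print(rebuilt == data_pages[1])
```

The parity page consumes Flash capacity that the user never sees, which is why this kind of protection depends on over-provisioning proportionate to the drive capacity.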

Conclusion

Over-provisioning is an integral part of today’s SSDs and allows intelligent, autonomous and efficient management of NAND Flash memory for reliable data integrity, high sustained performance and extended endurance through the use of Garbage Collection, Wear-levelling, DuraWrite technology and Redundant Array of Independent Silicon Elements (R.A.I.S.E.) features.