Although power loss protection on SSDs is not a new concept, the techniques used to safeguard an SSD during and after a power loss event have improved considerably in recent SSD designs. Power loss protection has two primary goals:
- Safely flush in-flight data (data that resides in the drive’s DRAM or SRAM cache buffers) to the persistent, non-volatile Flash memory and
- Maintain the integrity of the SSD’s mapping table so that the SSD is recognised and useable again upon reboot of the system.
Note: The SSD mapping table, maintained by the Flash Translation Layer (FTL), is responsible for mapping logical block addresses to physical locations in Flash memory.
During a normal system shutdown, the host ATA driver sends the SSD a command (STANDBY IMMEDIATE) announcing that the system is shutting down, and the SSD prepares for power removal. The SSD then has plenty of time to flush its cache buffers and update its mapping tables.
A sudden power loss gives the SSD no such warning. To cope with it, a well-designed SSD will employ a hardware-based design with hold-up power capacitors on board and/or a firmware PFAIL implementation, in which important metadata is written to Flash memory to ensure successful recovery of the SSD on the next power-up.
Early-generation SSDs were not as resilient to sudden power loss as today’s models. It was common for an SSD that experienced a sudden power loss event to become unresponsive on the following power cycle. In many of these early cases, the power loss event rendered the SSD unrecoverable and data loss occurred.
A closer look at the two PFAIL approaches
Hardware PFAIL – Hardware PFAIL protection has the primary goal of reducing data loss: on-board power capacitors (Power Caps) hold up power to the SSD long enough for the data in the SSD’s cache buffer to be written to Flash memory and for its mapping tables to be updated. A conceptual overview of a typical hardware-based PFAIL event on an SSD looks something like this:
1. Sudden power loss is detected by the SSD controller
2. The on-board power capacitors hold up power to the SSD
3. The controller issues an internal command to flush its cache buffer
4. The controller updates its mapping tables in preparation for power removal
5. The drive powers off gracefully
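To make the hold-up requirement concrete, the following minimal sketch estimates whether a capacitor bank stores enough energy to cover a cache flush. It relies on the standard capacitor energy relation E = ½·C·(V_start² − V_min²). Every figure in it (capacitance, voltages, power draw, cache size, Flash write rate) is an illustrative assumption, not a specification of any particular drive, and the function names are hypothetical.

```python
def usable_energy_joules(capacitance_f, v_start, v_min, efficiency=1.0):
    """Energy released while the capacitors discharge from v_start to v_min.

    efficiency models regulator losses (1.0 = lossless, an idealisation).
    """
    if v_min > v_start:
        raise ValueError("v_min must not exceed v_start")
    return 0.5 * capacitance_f * (v_start ** 2 - v_min ** 2) * efficiency


def holdup_time_seconds(capacitance_f, v_start, v_min, load_watts, efficiency=1.0):
    """Time the capacitors can sustain a constant power load."""
    if load_watts <= 0:
        raise ValueError("load_watts must be positive")
    energy = usable_energy_joules(capacitance_f, v_start, v_min, efficiency)
    return energy / load_watts


def flush_fits(cache_bytes, flash_write_bytes_per_s, holdup_s):
    """True if the cached data can reach Flash before hold-up power runs out."""
    return cache_bytes / flash_write_bytes_per_s <= holdup_s


# Assumed example values: 2000 uF bank, discharging from 5.0 V to 3.0 V,
# with the drive drawing 4 W while it flushes.
holdup = holdup_time_seconds(0.002, 5.0, 3.0, load_watts=4.0)

# Assumed 1 MB of dirty cache and a 400 MB/s sustained Flash write rate.
print(f"hold-up time: {holdup * 1000:.1f} ms")
print("flush fits:", flush_fits(1_000_000, 400_000_000, holdup))
```

Under these assumed numbers the flush completes inside the hold-up window; quadrupling the dirty cache to 4 MB would not. That trade-off is why the amount of cached data and the capacitor size have to be designed together.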
Firmware PFAIL – Firmware PFAIL protection is also designed to reduce the likelihood of data loss by ensuring the firmware’s ability to rebuild the mapping table upon the next power-on following a power loss event. A conceptual overview of firmware based PFAIL protection would look something like this:
1. The SSD’s mapping table is stored in Flash memory and is updated in DRAM
2. When new data is written to the SSD, the firmware updates the mapping table
3. New data is always written with tags (or spare bytes) that include the LBA, ECC and other data structure information
4. Sudden power loss occurs
5. The spare bytes that contain data structure information, combined with the original mapping table, enable the SSD firmware to rebuild the SSD’s mapping table upon the next power-on
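The rebuild in the final step can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not actual SSD firmware. The `Page` layout, the write sequence number `seq` (standing in for the "other data structure information"), and the use of CRC32 in place of real ECC are all hypothetical choices made for illustration.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    physical_addr: int  # where the page lives in Flash
    data: bytes         # user data
    lba: int            # spare-byte tag: logical block address
    seq: int            # spare-byte tag: write sequence number (assumed)
    checksum: int       # spare-byte tag: CRC32 as a stand-in for ECC


def write_page(physical_addr, lba, seq, data):
    """Program a page together with its spare-byte tags."""
    return Page(physical_addr=physical_addr, data=data, lba=lba,
                seq=seq, checksum=zlib.crc32(data))


def rebuild_mapping(checkpoint, checkpoint_seq, flash_pages):
    """Replay pages written after the last saved mapping table.

    checkpoint is the LBA -> physical address table last saved to Flash;
    pages with seq > checkpoint_seq were written after that save.
    """
    mapping = dict(checkpoint)
    newer = [p for p in flash_pages if p.seq > checkpoint_seq]
    for page in sorted(newer, key=lambda p: p.seq):
        if zlib.crc32(page.data) != page.checksum:
            continue  # write was interrupted by the power loss; ignore it
        mapping[page.lba] = page.physical_addr
    return mapping


# Mapping table as last saved to Flash, covering writes up to seq 2.
checkpoint = {0: 100, 1: 101}
pages = [
    write_page(100, lba=0, seq=1, data=b"a"),
    write_page(101, lba=1, seq=2, data=b"b"),
    write_page(102, lba=0, seq=3, data=b"a-updated"),  # overwrite of LBA 0
    write_page(103, lba=7, seq=4, data=b"new"),        # first write of LBA 7
    # Torn write: the checksum deliberately does not match the data.
    Page(physical_addr=104, data=b"torn", lba=1, seq=5,
         checksum=zlib.crc32(b"torn") ^ 1),
]

rebuilt = rebuild_mapping(checkpoint, checkpoint_seq=2, flash_pages=pages)
print(rebuilt)  # {0: 102, 1: 101, 7: 103}
```

The two completed writes that postdate the saved table are recovered from their tags alone, while the page that was being programmed when power failed is rejected, so LBA 1 keeps pointing at its last good copy.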
Firmware PFAIL protection is a highly effective method for preventing data loss in enterprise storage applications. For example, it is essential that SSDs configured in RAID arrays are able to recover and return to a healthy state after a power fail event to retain the integrity of the RAID array. One or several failed array members can result in an off-line array with a high potential for data loss.
Another enterprise scenario could involve SSDs that make up a large “shared pool” of storage where physical SSDs are segmented into multiple LUNs and shared amongst multiple hosts. High availability (HA) is a critical design consideration in this example, and firmware-based PFAIL protection helps ensure successful recovery of the SSDs that service these LUNs and hosts.
Kingston makes sudden power loss resiliency a top priority
Kingston® puts its SSDs (client and enterprise) through a very strenuous engineering power cycling test as part of its standard qualification process. In addition to compatibility, performance and endurance testing, Kingston SSDs must successfully pass numerous unsafe power loss events, boot up and be fully functional in order to pass the qualification process. If an SSD “bricks” during power-loss testing, the engineering qualification testing is stopped, the cause of the issue is resolved and the qualification process starts over again.
Each application and environment is unique, and certain considerations should be made when deciding which type of PFAIL protection fits your environment.
Most enterprise applications today are safeguarded with redundant power supplies, battery backup systems and power generators to keep datacentres running in the event of unexpected power loss. Software and high-speed networks have paved the way for an increased number of data replication architectures, removing the hardware as a single point of failure.
The stability of datacentre power in conjunction with HA practices should be significant factors in determining which type of SSD PFAIL protection best suits a storage application.