Enterprise versus Client SSD

A growing number of enterprise datacentres that require high data throughput and low transaction latency, and that previously relied on Hard Disk Drives (HDD) in their servers, are now running into performance bottlenecks. They are looking to Solid State Disks (SSD) as a viable storage solution to increase datacentre performance, efficiency and reliability, and to lower overall operating expenses (OpEx).

To begin to understand the differences between SSD classes, we have to distinguish the two key components of an SSD: the Flash Storage Processor (FSP) and the non-volatile NAND flash memory used to store data.

In today’s market, SSD and NAND flash memory consumption is split into four main groups: consumer devices (tablets, cameras, mobile phones), client (netbook, notebook, ultrabook, AIO, desktop personal computers), embedded/industrial (gaming kiosks) and enterprise computing (HPC, datacentre servers).

Choosing the right SSD storage device for enterprise datacentres can, however, be a long and arduous process of learning about and qualifying a multitude of different SSD vendors and product types, as not all SSDs and NAND flash memory are created equal.

SSDs are manufactured to be easily deployable as a replacement for, or complement to, rotating magnetic-platter-based Hard Disk Drives (HDD) and are available in a number of different form factors including 2.5”. They feature communication protocols/interfaces including Serial ATA (SATA) and Serial Attached SCSI (SAS) to transfer data to and from the Central Processing Unit (CPU) of a server.

Being easily deployable, however, does not guarantee that every SSD will be suitable in the long term for the enterprise application it is deployed in. The cost of choosing the wrong SSD can often negate any initial cost savings and performance benefits when the SSDs wear out prematurely due to excessive writes, achieve far lower sustained write performance over their expected lifetime, or introduce additional latency in the storage array and thus require early field replacement.

In this paper we will discuss the three main qualities that distinguish enterprise-class from client-class SSDs, to assist in making the right purchasing decision when the time comes to replace or add further storage to an enterprise datacentre.

Performance

SSDs can deliver incredibly high read and write performance for both sequential and random data requests from the CPU through the use of multi-channel architecture and parallel access from the FSP to the NAND flash dies.

A typical datacentre scenario involves processing millions of bytes of random company data, such as collaboration on technical CAD drawings, seismic data for analysis (e.g. Big Data) or worldwide customer data for banking transactions (e.g. OLTP). The storage devices must be accessible with the least possible latency, even when a large number of clients need access to the same item of data simultaneously, with no degradation in response time.

A client application, by contrast, typically involves only a single user or application, and can tolerate a larger delta between the minimum and maximum response time for any user or system action.

Complex storage arrays using SSDs (e.g. Network Attached Storage, Direct Attached Storage or Storage Area Network) are also adversely affected by mismatched drive performance, which can wreak havoc on storage array latency, sustained performance and, ultimately, quality of service.

Unlike client SSDs, enterprise-class SSDs such as the Kingston E100 solid-state drive are optimised not only for peak performance in the first few seconds of access. By using a larger over-provisioned (OP) area, they also offer higher sustained steady-state performance over longer periods of time. [1]

This guarantees that the storage array performance stays consistent with the organisation's expected quality of service during peak traffic load.

Reliability

NAND flash memory has a number of inherent issues associated with it, the two most important being a finite life expectancy and a naturally occurring error rate.

During the production process of NAND flash, each NAND flash die is tested and characterised with a raw Bit Error Rate (BER or RBER).

The BER defines the rate at which bit errors naturally occur in NAND flash before any Error Correction Code (ECC) is applied. The FSP corrects these errors on the fly using advanced ECC, without disrupting user or system access.

The flash storage processor's ability to correct these bit errors is expressed by the Uncorrectable Bit Error Ratio (UBER), “a metric for data corruption rate equal to the number of data errors per bit read after applying any specified error-correction method”. [2]

As defined and standardised by the JEDEC Committee in 2010 in documents JESD218A: Solid State Drive (SSD) Requirements and Endurance Test Method and JESD219: Solid State Drive (SSD) Endurance Workloads, enterprise-class SSDs differ from client-class SSDs in a number of ways, including, but not limited to, their ability to support heavier write workloads, withstand more extreme environmental conditions and recover from a higher BER than a client SSD. [3] [4]

Application Class | Workload (see JESD219) | Active Use (power on) | Retention Use (power off) | Functional Failure Requirement (FFR) | UBER Requirement
Client | Client | 40° C, 8 hrs/day | 30° C, 1 year | ≤3% | ≤10^-15
Enterprise | Enterprise | 55° C, 24 hrs/day | 40° C, 3 months | ≤3% | ≤10^-16

Table 1 - JESD218A: Solid State Drive (SSD) Requirements and Endurance Test Method
Copyright JEDEC. Reproduced with permission by JEDEC.

Using the JEDEC-proposed UBER requirements for enterprise versus client SSDs, an enterprise-class SSD is expected to experience only 1 unrecoverable bit error for every 10 quadrillion (10^16) bits processed, which is about 1.25 petabytes (~1.11 pebibytes). A client SSD is expected to experience 1 bit error for every 1 quadrillion (10^15) bits processed, about 0.125 petabytes (~0.11 pebibytes).
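The arithmetic behind these figures can be checked with a short sketch. The function names are illustrative and not part of any JEDEC document; the script only converts an UBER exponent into the expected volume of data processed per uncorrectable bit error. It reports both decimal petabytes (10^15 bytes) and binary pebibytes (2^50 bytes), since the ~1.11 and ~0.11 values correspond to the binary unit.

```python
# Convert an UBER requirement of 10^-exponent into the expected amount of
# data processed per uncorrectable bit error. Pure arithmetic; no drive
# is queried and the helper names are hypothetical.

def bits_per_error(exponent: int) -> int:
    """Expected bits processed per uncorrectable error for UBER = 10^-exponent."""
    return 10 ** exponent


def petabytes_per_error(exponent: int) -> float:
    """Same quantity in decimal petabytes (1 PB = 10**15 bytes)."""
    return bits_per_error(exponent) / 8 / 10 ** 15


def pebibytes_per_error(exponent: int) -> float:
    """Same quantity in binary pebibytes (1 PiB = 2**50 bytes)."""
    return bits_per_error(exponent) / 8 / 2 ** 50


if __name__ == "__main__":
    for label, exponent in [("client", 15), ("enterprise", 16)]:
        print(
            f"{label}: UBER 10^-{exponent} -> "
            f"{petabytes_per_error(exponent):.3f} PB "
            f"({pebibytes_per_error(exponent):.3f} PiB) per bit error"
        )
```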

Additional protection methods including Redundant Array of Independent Silicon Elements (R.A.I.S.E. ™) technology from LSI® SandForce® can be implemented on enterprise class SSDs through the use of striped parity across the NAND flash dies to combat circumstances when the FSP ECC cannot recover from a bit error.

R.A.I.S.E. ™ technology can effectively lower the UBER to 1 bit error for every 100 octillion bits (10^-29), or about 12.5 trillion petabytes (~1.11 × 10^13 pebibytes) processed, and offers an UBER up to nearly 1 quadrillion times lower than a standard SSD. [5]
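The general idea of striped parity across dies can be illustrated with plain XOR parity. This is a generic sketch and not the R.A.I.S.E. implementation, whose internals are not described in this paper; the die contents and function names are hypothetical.

```python
from functools import reduce

# Generic XOR parity over equal-sized chunks, one chunk per (hypothetical)
# NAND die. This illustrates striped parity in general, not any vendor's
# actual on-drive scheme.

def xor_parity(chunks: list[bytes]) -> bytes:
    """Return the byte-wise XOR of equal-length chunks."""
    if not chunks:
        raise ValueError("at least one chunk is required")
    length = len(chunks[0])
    if any(len(chunk) != length for chunk in chunks):
        raise ValueError("all chunks must have the same length")
    # zip(*chunks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def rebuild_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover a single lost chunk from the survivors and the parity chunk."""
    return xor_parity(surviving + [parity])


if __name__ == "__main__":
    stripe = [b"die0data", b"die1data", b"die2data"]
    parity = xor_parity(stripe)
    # Pretend the chunk on die 1 is unreadable even after ECC.
    recovered = rebuild_missing([stripe[0], stripe[2]], parity)
    print(recovered == stripe[1])
```

Single parity of this kind can rebuild only one lost chunk per stripe; losing two chunks of the same stripe is not recoverable with this sketch.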

To complement the R.A.I.S.E. ™ technology on the Kingston E100 SSD, periodic checkpoint creation and a Cyclic Redundancy Check (CRC) end-to-end internal protection scheme are also implemented to guarantee the integrity of data from the host through the flash and back to the host.
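As a rough illustration of an end-to-end check, the sketch below attaches a CRC to data on the way in and verifies it on the way out. It uses Python's standard `zlib.crc32` purely as a stand-in; the CRC polynomial, field layout and checkpointing used inside the E100 are not described here, so nothing below should be read as the drive's actual scheme.

```python
import zlib

# End-to-end integrity check in miniature: compute a CRC when the data
# enters the (hypothetical) storage path and verify it when it leaves.

def attach_crc(payload: bytes) -> tuple[bytes, int]:
    """Return the payload together with its CRC-32."""
    return payload, zlib.crc32(payload)


def verify_crc(payload: bytes, expected_crc: int) -> bool:
    """True if the payload still matches the CRC computed on entry."""
    return zlib.crc32(payload) == expected_crc


if __name__ == "__main__":
    data, crc = attach_crc(b"host write: sector 42")
    print(verify_crc(data, crc))

    # Flip one bit to simulate corruption somewhere along the path.
    corrupted = bytes([data[0] ^ 0x01]) + data[1:]
    print(verify_crc(corrupted, crc))
```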

In addition to enhanced ECC protection against bit errors, an enterprise-class SSD should typically also contain control electronics with power loss detection logic, equivalent to the Kingston E100 power failure support. This logic monitors incoming power and, in the event of a power loss, provides temporary power from Tantalum capacitors to complete any outstanding internally or externally issued writes.

Endurance

All NAND flash memory contained in flash storage devices degrades in its ability to reliably store bits of data with every program or erase (P/E) cycle of a NAND flash memory cell, until the NAND flash can no longer reliably store data. At this point it should be removed from the user-addressable storage pool and the logical address moved to a new physical address on the NAND flash storage array.
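The retire-and-remap step can be pictured with a toy mapping table. This is a deliberately simplified sketch: the class name, block counts and spare-pool policy are assumptions for illustration, and a real flash translation layer also handles wear levelling, garbage collection and data migration.

```python
# Toy logical-to-physical block map with a spare pool. When a physical
# block is judged worn out, it is retired and its logical address is
# pointed at a spare block instead. All names and sizes are hypothetical.

class FlashTranslationLayer:
    def __init__(self, user_blocks: int, spare_blocks: int) -> None:
        # Initially each logical block maps to the same-numbered physical block.
        self.mapping = {lba: lba for lba in range(user_blocks)}
        # Spare physical blocks sit above the user-addressable range.
        self.spares = list(range(user_blocks, user_blocks + spare_blocks))
        self.retired: set[int] = set()

    def retire_block(self, logical: int) -> int:
        """Retire the physical block behind `logical` and remap to a spare."""
        if not self.spares:
            raise RuntimeError("no spare blocks left; drive should be retired")
        worn_physical = self.mapping[logical]
        self.retired.add(worn_physical)
        self.mapping[logical] = self.spares.pop(0)
        return self.mapping[logical]


if __name__ == "__main__":
    ftl = FlashTranslationLayer(user_blocks=4, spare_blocks=2)
    new_physical = ftl.retire_block(1)
    print(f"logical 1 now maps to physical {new_physical}")
    print(f"retired physical blocks: {sorted(ftl.retired)}")
```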

As the cell is constantly programmed or erased, the BER also increases linearly, and therefore a complex set of management techniques must be implemented on the enterprise SSD FSP to manage each cell's ability to store data reliably over the expected life of the SSD. [6]

The P/E endurance of a given NAND flash memory can vary substantially depending on the current lithography manufacturing process and type of NAND flash produced.

NAND flash memory type | TLC | MLC | e-MLC | SLC
Architecture | 3 bits per cell | 2 bits per cell | 2 bits per cell | 1 bit per cell
Capacity | Highest capacity | High capacity | High capacity | Lowest capacity
Endurance (P/E) | Lowest endurance | Medium endurance | High endurance | Highest endurance
Cost | $ | $$ | $$$ | $$$$
Approx NAND Bit Error Rate (BER) | 10^4 | 10^7 | 10^8 | 10^9

Table 2 – NAND flash memory types [6] [7] [8] [9]

An enterprise-class SSD must be able to withstand the heavy write activity typical of a datacentre server, which requires access to its data 24 hours a day, every day of the week. A client-class SSD, by comparison, is typically only fully utilised for 8 hours a day. For this reason, e-MLC is the perfect match for SSDs that need to combine high performance, capacity and endurance.

Understanding the write endurance of any application or SSD can be complex, which is why the JEDEC committee also proposed an endurance metric, TeraBytes Written (TBW), to indicate the amount of raw data that can be written to the SSD before its NAND flash becomes an unreliable storage medium and the drive should be retired.

Using the JESD218A testing methods and JESD219 enterprise-class workloads proposed by JEDEC, it becomes an easier task to interpret an SSD manufacturer's endurance calculations via TBW and extrapolate a more understandable endurance measure that can be applied to any datacentre.
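One way to extrapolate a TBW rating into something easier to plan around is to divide it by the expected daily write volume. The sketch below does that, and also derives drive writes per day (DWPD), a commonly used industry measure that is not defined in this paper. The TBW, capacity and workload numbers are hypothetical and do not describe any particular drive.

```python
# Turn a TBW rating into two planning figures. All input numbers used in
# the demo are made up for illustration.

DAYS_PER_YEAR = 365


def lifetime_years(tbw_terabytes: float, host_writes_tb_per_day: float) -> float:
    """Years until the TBW rating is consumed at a steady daily write rate."""
    if host_writes_tb_per_day <= 0:
        raise ValueError("daily writes must be positive")
    return tbw_terabytes / (host_writes_tb_per_day * DAYS_PER_YEAR)


def drive_writes_per_day(
    tbw_terabytes: float, capacity_terabytes: float, service_years: float
) -> float:
    """Full-capacity writes per day sustainable over the service period."""
    if capacity_terabytes <= 0 or service_years <= 0:
        raise ValueError("capacity and service period must be positive")
    return tbw_terabytes / (capacity_terabytes * service_years * DAYS_PER_YEAR)


if __name__ == "__main__":
    tbw = 730.0  # hypothetical TBW rating, in terabytes
    print(f"{lifetime_years(tbw, host_writes_tb_per_day=0.5):.1f} years at 0.5 TB/day")
    print(f"{drive_writes_per_day(tbw, 0.4, 5):.2f} DWPD for a 0.4 TB drive over 5 years")
```

Both figures assume a steady workload that resembles the one used to derive the TBW rating; a workload with a different write pattern can consume endurance faster or slower.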

As noted in documents JESD218 and JESD219, different application class workloads can also suffer from a write amplification factor (WAF) an order of magnitude higher than the actual writes submitted by the host. This can easily lead to unmanageable NAND flash wear, higher NAND flash BER from excessive writes over time and slower performance from widely distributed invalid pages across the SSD. The on-the-fly compression mechanism utilised on the Kingston E100 with LSI® SandForce® DuraWrite™ technology reduces the overall WAF and extends the NAND flash rated endurance for applications in the enterprise class.
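Write amplification is usually expressed as the ratio of data written to the NAND flash to data written by the host. The sketch below computes that ratio and shows how a fixed NAND write budget translates into very different amounts of host data at different WAF values. The function names and all numbers are hypothetical.

```python
# Write amplification factor (WAF) = NAND writes / host writes.
# The workloads and budgets in the demo are made-up illustrations.

def write_amplification_factor(nand_tb_written: float, host_tb_written: float) -> float:
    """Ratio of data physically written to flash to data the host submitted."""
    if host_tb_written <= 0:
        raise ValueError("host writes must be positive")
    return nand_tb_written / host_tb_written


def host_writes_supported(nand_write_budget_tb: float, waf: float) -> float:
    """Host data that fits in a given NAND write budget at a given WAF."""
    if waf <= 0:
        raise ValueError("WAF must be positive")
    return nand_write_budget_tb / waf


if __name__ == "__main__":
    waf = write_amplification_factor(nand_tb_written=300.0, host_tb_written=100.0)
    print(f"WAF = {waf:.1f}")

    budget_tb = 3000.0  # hypothetical total NAND writes the flash can absorb
    for candidate_waf in (1.5, 3.0, 10.0):
        supported = host_writes_supported(budget_tb, candidate_waf)
        print(f"WAF {candidate_waf:>4}: about {supported:.0f} TB of host writes")
```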

While TBW is an important topic in the discussion of enterprise versus client-class SSDs, TBW is only a NAND flash level endurance prediction model. The Mean Time Between Failures (MTBF) should be observed as a component-level endurance and reliability prediction model, based on the reliability of the components used on the device. The expectations placed on an enterprise-class SSD's components include longevity and working harder at managing the voltages across all NAND flash memory over the SSD's life expectancy.

S.M.A.R.T. monitoring and reporting on enterprise-class SSDs allow the device to be easily queried pre-failure for life expectancy based on the current write amplification factor and wear level. Pre-failure predictive warnings for failure events such as a loss of power, bit errors occurring from the physical interface or uneven wear distribution are often also supported.

Client-class SSDs may only feature the minimum S.M.A.R.T. output for monitoring the SSD during standard use or post-failure.

Depending on the application class and capacity of the SSD, an increased reserve capacity of NAND flash memory can also be allocated as over-provisioned (OP) spare capacity. The OP capacity is hidden from user and operating system access. It can be utilised as a temporary write buffer for higher sustained performance and as a replacement for defective flash memory cells during the life expectancy of the SSD, enhancing both reliability and endurance.
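Over-provisioning is commonly quoted as the hidden capacity expressed as a percentage of the user-visible capacity. The sketch below uses that common convention, which is an assumption here rather than a definition taken from this paper, and the capacities are hypothetical.

```python
# Over-provisioning percentage under the common convention:
#   OP% = (physical capacity - user capacity) / user capacity * 100
# The capacities in the demo are illustrative only.

def over_provisioning_percent(physical_gb: float, user_gb: float) -> float:
    """Hidden spare capacity as a percentage of user-visible capacity."""
    if user_gb <= 0:
        raise ValueError("user capacity must be positive")
    if physical_gb < user_gb:
        raise ValueError("physical capacity cannot be smaller than user capacity")
    return (physical_gb - user_gb) / user_gb * 100


if __name__ == "__main__":
    for physical, user in [(256.0, 240.0), (256.0, 200.0)]:
        op = over_provisioning_percent(physical, user)
        print(f"{physical:.0f} GB raw, {user:.0f} GB usable -> {op:.1f}% OP")
```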

Conclusion

There are distinctive differences between enterprise and client-class SSDs ranging from their NAND flash memory program and erase endurance to their complex management techniques to suit different application class workloads.

Understanding these differences in application classes as they pertain to performance, reliability and endurance can be an effective tool in minimising and managing the risk of disruptive downtime in the demanding and often mission-critical enterprise environment.