
Best Practices

Memory channels, frequency and performance

Although most people don't realize it, the world runs on many different types of databases, all of which have one thing in common – the need for high-performance memory to deliver data quickly and reliably.

From the morning phone call logged in a cellular service provider's customer record database, to a weekly online purchase processed through a financial institution's transaction database, to a late-night streaming session served movie recommendations based on our viewing habits, databases answer many of our daily queries and must perform consistently fast while scaling dynamically to meet customer demand. [1]

Serving data with consistent performance and transaction integrity is no easy task and often requires in-memory databases to serve viewing recommendations and relational data nearly instantaneously to multiple users.

In-memory databases (IMDBs) rely primarily on high-capacity and, most importantly, high-performance DRAM (Dynamic Random Access Memory). They can service a high volume of requests significantly faster than traditional disk-bound databases, serve as the backbone of any scenario that requires fast response times when querying useful data, and can complement big data applications.
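To make the contrast concrete, here is a minimal sketch (hypothetical data and file layout, not any particular IMDB product) comparing a lookup served from memory with the same lookup served by scanning a flat file on disk:

```python
import os
import tempfile
import time

# Hypothetical key/value records served two ways: from memory (a dict)
# and from disk (a flat file scanned per query, a disk-bound worst case).
records = {f"user{i}": f"profile-{i}" for i in range(10_000)}

# Write the same records to a temporary file to simulate a disk-bound store.
with tempfile.NamedTemporaryFile("w", delete=False, suffix=".txt") as f:
    for k, v in records.items():
        f.write(f"{k}\t{v}\n")
    path = f.name

def disk_lookup(key):
    """Naive full scan of the file, parsing each line until the key matches."""
    with open(path) as fh:
        for line in fh:
            k, v = line.rstrip("\n").split("\t")
            if k == key:
                return v
    return None

t0 = time.perf_counter()
mem_val = records["user9999"]          # in-memory hash lookup
t_mem = time.perf_counter() - t0

t0 = time.perf_counter()
disk_val = disk_lookup("user9999")     # disk-bound scan of the same data
t_disk = time.perf_counter() - t0

os.unlink(path)
print(mem_val == disk_val, t_disk > t_mem)
```

The absolute timings depend on the machine and OS cache state, but the in-memory lookup reliably beats the file scan by orders of magnitude, which is the gap IMDBs exploit.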

DDR3 SDRAM (Double Data Rate Synchronous Dynamic Random Access Memory) DIMMs (Dual In-line Memory Modules) are available in different capacities and speeds. The speed of a memory module is often referred to as its memory frequency and is denoted in megahertz (MHz).

Memory frequency has a direct relationship with memory performance; as the memory frequency increases, so does the memory performance.

DRAM is, however, only one piece of the puzzle for achieving optimal memory subsystem performance. A memory controller is needed to manage the memory subsystem, and the population rules governing the memory controller affect the frequency and latency at which a memory module can be addressed.

Newer-generation memory controllers are embedded in the processors for best performance, but they require attention: some memory controllers can only run the memory subsystem at a maximum frequency of 800MHz.

Using the 24 DIMM sockets available on the Intel® Romley platform, connected to the Intel Xeon E5 family memory subsystem, we can gauge sustained memory bandwidth across different memory channel populations and memory clock speeds using the STREAM memory benchmark integrated into SiSoft Sandra 2012. [2]

The Intel Xeon E5 family features numerous performance improvements over the previous generation of Xeon 5500 and Xeon 5600 server processors, including the two performance-related upgrades discussed in this paper: quad-channel memory addressing and support for 1600MHz DDR3 memory speeds. It also features a faster 8GT/s (gigatransfers per second) QuickPath Interconnect (QPI) micro-architecture that increases the connectivity bandwidth available and reduces latency to the memory array. [3]
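As a rough sanity check on the numbers that follow, the theoretical peak bandwidth of such a configuration can be computed from the transfer rate, the 8-byte (64-bit) channel width, and the channel and socket counts. The helper below is an illustrative sketch of that arithmetic, not a vendor formula:

```python
def peak_bandwidth_gbs(mt_per_s, bytes_per_transfer=8, channels=1, sockets=1):
    """Theoretical peak memory bandwidth in GB/s (decimal gigabytes)."""
    return mt_per_s * 1e6 * bytes_per_transfer * channels * sockets / 1e9

# DDR3-1600: 1600 MT/s x 8 bytes = 12.8 GB/s per channel.
per_channel = peak_bandwidth_gbs(1600)

# Quad-channel, dual-socket: 12.8 GB/s x 4 channels x 2 sockets = 102.4 GB/s.
system_peak = peak_bandwidth_gbs(1600, channels=4, sockets=2)

print(per_channel, system_peak)
```

The ~70GB/s sustained figures measured later in this paper therefore correspond to roughly two-thirds of this theoretical system-wide peak, a plausible ratio for a STREAM-style benchmark.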

Channel population performance

Figure 1. Channel population performance measured using SiSoft Sandra 2012
Test configuration included SiSoftware Sandra 2012 Memory benchmark on Intel Romley platform
S2600GZ with two Xeon E5-2665 2.40GHz processors and 64GB of memory (2 x KVR16R11D4K4/32
@1600 MHz) installed. CPU Hyper-threading and power saving features disabled.

As seen in Figure 1, the performance of the memory subsystem increases near-linearly from the slowest configuration to the fastest. The slowest is a single memory channel on either Xeon processor's memory controller populated with a single 8 gigabyte (GB) DDR3 1600MHz memory module; the fastest is a quad-channel configuration, also known as 1 DIMM per channel (1 DPC), with four 8GB 1600MHz memory modules populating each memory socket in the first available memory bank of either processor.

Even with the increased electrical load of a quad-channel (1 DPC) configuration, we observe a near four-fold increase in memory subsystem performance, to ~70GB/s, compared to a single-channel configuration, making it an ideal solution for resource-intensive applications such as IMDBs.
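The near four-fold scaling can be checked with simple arithmetic. The figures are taken from Figure 1; the efficiency calculation below is an illustrative estimate, not a measured value:

```python
# Measured sustained bandwidth from Figure 1 (approximate values).
quad_channel = 70.0                 # GB/s with 1 DPC across all four channels
single_channel = quad_channel / 4   # ~17.5 GB/s implied by near-linear scaling

# Theoretical peak for DDR3-1600: 4 channels x 2 sockets x 8-byte bus.
theoretical_peak = 1600 * 1e6 * 8 * 4 * 2 / 1e9   # 102.4 GB/s

# Fraction of theoretical peak actually sustained by the benchmark.
efficiency = quad_channel / theoretical_peak

print(round(single_channel, 1), round(efficiency, 2))
```

Sustaining roughly 68% of theoretical peak while scaling almost linearly across channels indicates the memory controller, not the DIMMs, is the effective ceiling.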

Memory frequency performance

Figure 2. Relative memory frequency performance measured using SiSoft Sandra 2012
Test configuration included SiSoftware Sandra 2012 Memory benchmark on Intel Romley platform
S2600GZ with two Xeon E5-2665 2.40GHz processors and 192GB of memory (2 x KVR16R11D4K4/32)
installed. CPU Hyper-threading and power saving features disabled.

In Figure 2, we run the same eight 8GB DDR3 memory modules at four different memory speeds (MHz), populated symmetrically across both Intel Xeon E5 family memory subsystems to achieve a balanced configuration, showing the best-case performance at each memory speed.

Running the memory modules at 800MHz, we see the slowest performance, with ~40GB/s sustained transfer speeds measured using the SiSoft Sandra 2012 integrated STREAM memory benchmark.

As we scale the frequency higher, memory performance increases nearly linearly, up to a maximum of ~70GB/s at 1600MHz, ideal for workloads that require the highest achievable memory performance.
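This near-linear relationship can be sketched by computing the theoretical peak at each DDR3 speed used in Figure 2 (8-byte bus, four channels per socket, two sockets; illustrative arithmetic only):

```python
def peak_gbs(mt_per_s, bytes_per_transfer=8, channels=4, sockets=2):
    """Theoretical peak bandwidth in GB/s for the dual-socket configuration."""
    return mt_per_s * 1e6 * bytes_per_transfer * channels * sockets / 1e9

# Theoretical peaks at the four DDR3 speeds tested in Figure 2.
peaks = {speed: round(peak_gbs(speed), 1) for speed in (800, 1066, 1333, 1600)}
print(peaks)  # doubling the frequency doubles the theoretical peak
```

The measured ~40GB/s at 800MHz and ~70GB/s at 1600MHz track these theoretical peaks at broadly similar efficiency, which is what the near-linear curve in Figure 2 reflects.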

Memory capacities versus frequency performance

Figure 3. Memory capacities versus frequency performance measured using SiSoft Sandra 2012.
Test configuration included SiSoftware Sandra 2012 Memory benchmark on Intel Romley platform S2600GZ
with two Xeon E5-2665 2.40GHz processors and 192GB of memory (KVR16R11D4K4/32) installed. CPU
Hyper-threading and power saving features disabled.

To conclude our research into memory performance, in Figure 3, we look at the performance of a memory subsystem populated with 192GB of memory running at 1066MHz versus a configuration using 128GB and 64GB, both running at 1600MHz.

Increased memory capacities running at the same 1600MHz memory speed, using either 128GB (16x 8GB) or 64GB (8x 8GB) spread symmetrically across both memory subsystems, show approximately the same ~70GB/s sustained performance.

The larger 192GB memory capacity (24x 8GB), albeit running at a slower 1066MHz, shows a moderate ~17GB/s drop in sustained performance (to ~53GB/s) as a trade-off for the increased memory capacity.
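The capacity-versus-frequency trade-off reduces to simple arithmetic (figures taken from Figure 3 and the text above; approximate values):

```python
# Sustained bandwidth figures from Figure 3 (approximate).
bw_1600 = 70.0              # GB/s with 64GB or 128GB at 1600MHz
bw_1066 = bw_1600 - 17.0    # ~53 GB/s with 192GB at 1066MHz

capacity_gain = 192 / 128          # 1.5x more memory than the 128GB config
bandwidth_loss = 17.0 / bw_1600    # ~24% lower sustained bandwidth

print(bw_1066, capacity_gain, round(bandwidth_loss, 2))
```

Whether trading ~24% of sustained bandwidth for 50% more capacity is worthwhile depends on whether the working set fits in the smaller configuration; for an IMDB that would otherwise spill to disk, the extra capacity usually wins.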


Obeying the channel population rules specific to the server processor and memory controller lets us strike the right balance when optimizing memory for best performance. Simple steps, such as populating all four memory channels, can increase memory performance up to four-fold, increasing the ROI (return on investment) while simultaneously reducing the TCO (total cost of ownership) over the lifecycle of the server.

[1] Predicting User Preference for Movies using NetFlix database, Department of Electrical and Computer Engineering Carnegie Mellon University

[2] SiSoft Sandra Q&A - Memory Benchmark, SiSoftware

[3] Intel® Xeon® Processor E5-2600 Product Family News Fact Sheet, Intel®
