
4 Things Data Center Managers Can Learn from HPC

If you were to ask a lay person on the street what they thought a supercomputer was, you’d probably get a large percentage citing examples from popular movies ― usually examples with a nefarious reputation. From HAL 9000 (2001: A Space Odyssey) to VIKI in I, Robot, and even The Terminator’s Skynet, pop culture often portrays supercomputers as sentient systems that have evolved and turned against humanity.

Tell that to researchers at Lawrence Livermore National Laboratory, or the National Weather Service, and they’d laugh you out of the room. The truth is that supercomputers today are far from self-aware, and the only AI involved is essentially an overblown search bar scanning very large data sets.

Today, supercomputers power a multitude of applications at the forefront of progress: from oil and gas exploration to weather prediction, from financial markets to the development of new technologies. Supercomputers are the Lamborghinis or Bugattis of the computing world, and at Kingston, we pay close attention to the advancements that are pushing computing boundaries. From DRAM utilization and tuning, to firmware advancements in managing storage arrays, to the emphasis on consistency of transfer and latency rather than peak values, our technologies are deeply influenced by the bleeding edge of supercomputing.

Similarly, there is a lot that cloud and on-premises data center managers can learn from supercomputing when it comes to designing and managing their infrastructures, and how best to select components that will be ready for future advancements without huge overhauls.

1. Supercomputers are Purpose-Built for Consistency

Unlike most cloud-computing platforms, such as Amazon Web Services or Microsoft Azure, which are built to power a variety of applications on shared resources and infrastructure, most supercomputers are purpose-built for specific needs. The most recent update of the TOP500 list of the world’s fastest (publicly known and declassified) supercomputers notes not only the locations and speeds of installations, but also each machine’s primary field of application.

Eleven of the top dozen machines are dedicated to energy research, nuclear testing and defense applications. The only outlier is Frontera, a new NSF-funded petascale computing system at the Texas Advanced Computing Center at the University of Texas, which provides academic resources for science and engineering research partners. Of the next 20 supercomputers on the TOP500 list, almost all are dedicated to government defense and intelligence applications. Machines between numbers 30 and 50 on the list are largely dedicated to weather prediction. The last 50 of the top 100 are a mix of corporate computing (NVIDIA, Facebook, et al.), midrange weather prediction, space programs, oil and gas exploration, academic and specific government uses.

These machines aren’t one-size-fits-all boxes. They’re custom-developed with manufacturers like Intel, Cray, HP, Toshiba and IBM to perform specific types of calculations on very specific datasets ― either in real time or as asynchronous computations.

They are built to tight, application-specific specifications:

  • Defined acceptable latency thresholds
  • Preset computing resources leveraging millions of processing cores
  • Performance between 18,000 and 200,000 teraFLOPS
  • Storage capacities measured in exabytes ― far beyond the petabytes of modern data warehouses

Systems like Frontera don’t just have to sprint to a peak compute load; they have to consistently read vast amounts of data to arrive at a result. A spike in compute performance could actually introduce errors in the results, so the emphasis is on consistency.

Today’s data center manager needs to first ask, “What are we doing with the system?” in order to architect it, manage resources and build in predictable fail-safes. Managing a data center that runs a bunch of virtual desktops is very different from running a 911 call center or an air-traffic control system. They have different needs, demands, service-level agreements and budgets ― and need to be designed accordingly.

Likewise, there needs to be consideration of how to achieve consistent performance without requiring custom builds. Companies like Amazon, Google and Microsoft have the budgets to engineer custom storage or computing infrastructures, but the majority of service providers have to be more selective with off-the-shelf hardware.

Thus, more data center managers need to set strict criteria for performance benchmarks that address quality of service (QoS) and ensure the greatest emphasis is placed not only on compute speed and latency, but also on consistency.
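
As a rough illustration of what “consistency, not just peak speed” can mean in a benchmark, the sketch below summarizes latency samples by percentiles and tail-to-median ratio rather than a single average. It is a hypothetical example, not a Kingston tool; the sample data and function names are made up, and the samples themselves would come from whatever instrumentation you already use (fio, ioping, application timers).

```python
# Minimal sketch: judge storage or network latency by consistency, not averages.
# Assumes you already have a list of per-request latencies in milliseconds.
import statistics

def latency_profile(samples_ms):
    """Return the figures that matter for a QoS-style benchmark."""
    ordered = sorted(samples_ms)

    def pct(p):
        # Nearest-rank percentile; good enough for a quick benchmark summary.
        idx = min(len(ordered) - 1, max(0, int(round(p / 100 * len(ordered))) - 1))
        return ordered[idx]

    return {
        "mean_ms": statistics.mean(ordered),
        "p50_ms": pct(50),
        "p99_ms": pct(99),
        "p999_ms": pct(99.9),
        # Tail-to-median ratio: the smaller it is, the more consistent the device.
        "p99_over_p50": pct(99) / pct(50),
    }

if __name__ == "__main__":
    # Two hypothetical drives: one steady, one with occasional large spikes.
    steady = [1.0] * 980 + [1.2] * 20
    spiky = [0.8] * 980 + [50.0] * 20
    print("steady:", latency_profile(steady))
    print("spiky: ", latency_profile(spiky))
```

The point of the tail metrics is that the “spiky” device can look acceptable on averages while blowing through any latency threshold one request in fifty.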

2. Your Real-Time is Not My Real-Time

With supercomputing applications, most instances of real-time data have major implications. From halting a nuclear reaction to processing telemetry from a rocket launch, compute latency can have catastrophic effects ― and the data sets are massive. These streams aren’t fed from a single source either; they are often delivered from a network of reporting nodes.

But the data is short-lived. When working with real-time feeds, most of the data isn’t held forever. It’s written and then overwritten, with a shelf life defined by sequential writes and overwrites. Real-time data is always changing, and very few applications need every bit stored from the beginning of time. The data gets processed in batches, computed to create a result (be it an average, statistical model or algorithm), and the result is what’s kept.

Take the National Oceanic and Atmospheric Administration’s (NOAA) supercomputer predictions, for example. Meteorological factors change constantly, be it precipitation, air and ground temperature, barometric pressure, time of day, solar effects, wind and even how it passes over terrain. All of that changes every second and gets reported as a real-time stream of information. But NOAA’s National Weather Service (NWS) doesn’t need the raw data forever ― it needs the forecasting models. As the Global Forecast System (GFS) model takes shape, new data gets pushed through it, producing more accurate and updated predictions.

Moreover, local meteorologists who share and receive data from the NWS don’t need access to the entire global weather dataset. They limit their models to local areas. This allows them to supplement NWS data with readings from local weather stations, giving insight into microclimates and producing more accurate local predictions faster.

The same could be said for stock trading or financial models, which work with moving averages ― each with specific indicators and action triggers built in, based on parameters for acceptable market-behavior thresholds. A system that uses “real-time” data doesn’t have to store everything it ingests ― it should leverage non-volatile random access memory (NVRAM) and dynamic random access memory (DRAM) to cache and process data in flight, then deliver the computed output to storage.
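
A minimal sketch of that pattern is below, assuming a simple moving-average workload with hypothetical names (MovingAverageSink, moving_average.csv): raw samples live only in a bounded in-memory window standing in for the DRAM/NVRAM cache, and only the computed result is written to durable storage.

```python
# Minimal sketch: process a "real-time" feed in memory and persist only results.
from collections import deque

class MovingAverageSink:
    def __init__(self, window_size, out_path):
        self.window = deque(maxlen=window_size)  # old raw samples fall off automatically
        self.out_path = out_path

    def ingest(self, timestamp, value):
        self.window.append(value)
        result = sum(self.window) / len(self.window)
        # Only the derived result is persisted; the raw stream is never archived.
        with open(self.out_path, "a") as f:
            f.write(f"{timestamp},{result:.4f}\n")
        return result

if __name__ == "__main__":
    sink = MovingAverageSink(window_size=60, out_path="moving_average.csv")
    # Hypothetical feed: substitute your market-data or sensor stream here.
    for t, reading in enumerate([101.2, 101.5, 100.9, 101.1, 101.8]):
        sink.ingest(t, reading)
```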

3. Latency Thresholds, NAND Flash and Tuning DRAM

Most latency thresholds are set by the demands of the application. In trading scenarios, seconds can mean millions, if not billions, of dollars. For weather prediction and hurricane tracking, it could mean deciding between evacuating New Orleans or Houston.

Supercomputers operate with an a priori burden of service level ― be it latency, computing resources, storage or bandwidth. Most employ fail-aware computing, whereby the system can reroute data streams for optimal latency conditions (based on Π + Δmax clocking), shift to asynchronous computing models, or prioritize compute resources to deliver sufficient processing power or bandwidth for jobs.
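
A very simplified sketch of the rerouting idea follows; the Π + Δmax clock bookkeeping is well beyond a blog snippet, and the path names, thresholds and probe function here are all hypothetical placeholders. The point is just that traffic follows whichever path still meets its latency budget.

```python
# Minimal sketch of fail-aware routing: send work over whichever path still
# meets its latency threshold, and degrade gracefully when all paths are slow.
import random  # stands in for real latency probes
import time

PATHS = {
    "fabric-a": {"threshold_ms": 2.0},
    "fabric-b": {"threshold_ms": 5.0},
}

def probe_latency_ms(path):
    # Placeholder for a real probe (e.g. an RDMA ping or a TCP round trip).
    return random.uniform(0.5, 6.0)

def pick_path():
    measured = {name: probe_latency_ms(name) for name in PATHS}
    healthy = {n: ms for n, ms in measured.items() if ms <= PATHS[n]["threshold_ms"]}
    if healthy:
        return min(healthy, key=healthy.get)   # lowest-latency healthy path
    return min(measured, key=measured.get)     # no healthy path: take the least-bad one

if __name__ == "__main__":
    for _ in range(3):
        print("routing over:", pick_path())
        time.sleep(0.1)
```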

Whether you’re working with high-end workstations, big-iron servers, or HPC and scientific workloads, big computers and Big Data require huge DRAM loadouts. Supercomputers like the Tianhe-2 combine huge RAM loadouts with specialized accelerator cards. The way supercomputing fine-tunes the hardware and controller framework is unique to each application design. Often, specific computational tasks are bottlenecked by disk access, yet have RAM requirements that make an all-DRAM approach impractical ― while still being small enough to fit into NAND flash. The FPGA clusters are also further tuned for each specific workload, since large data sets take huge performance hits if they have to fall back on traditional media to retrieve data.

Teams collaborating across the University of Utah, Lawrence Berkeley National Laboratory, the University of Southern California and Argonne National Laboratory have demonstrated Automatic Performance Tuning (or auto-tuning) as an effective means of providing performance portability between architectures. Rather than depending on a compiler to deliver optimal performance on novel multicore architectures, auto-tuned kernels and applications tune themselves for the target CPU, network and programming model.
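
The toy sketch below captures the spirit of auto-tuning in a few lines; it is not the frameworks from those labs, just an assumed blocked matrix multiply whose tile size is chosen by timing candidates on the machine it will actually run on.

```python
# Toy auto-tuning sketch: time one kernel across candidate tile sizes on the
# target machine and keep whichever configuration is fastest there.
import time
import numpy as np

def blocked_matmul(a, b, block):
    n = a.shape[0]
    c = np.zeros((n, n))
    for i in range(0, n, block):
        for j in range(0, n, block):
            for k in range(0, n, block):
                c[i:i+block, j:j+block] += a[i:i+block, k:k+block] @ b[k:k+block, j:j+block]
    return c

def autotune(n=256, candidates=(16, 32, 64, 128)):
    a, b = np.random.rand(n, n), np.random.rand(n, n)
    timings = {}
    for block in candidates:
        start = time.perf_counter()
        blocked_matmul(a, b, block)
        timings[block] = time.perf_counter() - start
    best = min(timings, key=timings.get)   # tile size with the lowest runtime
    return best, timings

if __name__ == "__main__":
    best_block, timings = autotune()
    print(f"best tile size on this CPU: {best_block}", timings)
```

The same measure-and-select loop generalizes to thread counts, message sizes or I/O block sizes, which is why it ports across architectures better than hand-picked constants.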

4. Multiple Layers of Fail-Safes

Energy distribution within the HPC data center is increasingly challenging ― especially with infrastructures that are leveraged as shared resources. Whether the infrastructure is dedicated or provisioned as-a-service, data centers need to ensure continuous operation and reduce the risk of damaging fragile hardware components in the event of a power failure, spike or change in peak demand.

Architects use a mix of safeguards:

  • Low-loss distribution transformers
  • DC power distribution and UPS backups
  • Trigeneration (creating electricity from heat to store as backup)
  • Active monitoring

“Save and save often” is the mantra for any application, and the same is true for data centers, where “backup” becomes the operative term.

Most data centers today operate with a high-level RAID structure to ensure continuous and near-simultaneous writes across storage arrays. Furthermore, HPC infrastructures leverage a large amount of NVRAM to cache data in process ― either live streams of data that don’t have to be pulled across storage arrays, or parallel-processed information used like a scratch disk to free up additional compute resources. The previously mentioned Frontera system leverages 50PB of total scratch capacity. Users with very high bandwidth or IOPS requirements will be able to request an allocation on an all-NVMe (non-volatile memory express) file system with an approximate capacity of 3PB and bandwidth of ~1.2TB/s.
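
To put those Frontera numbers in perspective, a back-of-the-envelope calculation (assuming perfectly sustained sequential throughput, which real mixed workloads rarely achieve) shows how long draining that NVMe tier would take:

```python
# Back-of-the-envelope: time to read or drain a ~3PB NVMe tier at ~1.2 TB/s,
# assuming ideal sustained sequential throughput.
capacity_tb = 3_000        # ~3 PB expressed in TB
bandwidth_tb_per_s = 1.2   # ~1.2 TB/s aggregate bandwidth
seconds = capacity_tb / bandwidth_tb_per_s
print(f"{seconds:.0f} s ≈ {seconds / 60:.0f} minutes")   # ≈ 2500 s, roughly 42 minutes
```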

This constant RAID backup for storage and consistent caching in NVMe buffers depend on the total I/O thresholds of the on-device controllers and on the total available or provisioned bandwidth for remote storage and backup.

Most HPC infrastructures are also eliminating the risk of mechanical failures from spinning drives by moving completely to solid-state arrays and flash storage blocks. These storage solutions provide consistent IOPS and predictable latencies that fall within application-specific latency thresholds. Many supercomputers also leverage multiple tape libraries (with capacity scalable to an exabyte or more) for reliable archival of every bit processed and stored.

Many are also ensuring that, should everything else in the chain fail, there are power-fail (P-Fail) capacitors (P-Caps), also labeled power-loss protection (PLP), installed on SSDs and DRAM. P-Caps allow drives (either independent or across an array) to complete writes in progress, reducing the amount of data potentially lost during a catastrophic failure.
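
P-Caps protect data the drive has already accepted; the host still has to hand that data over rather than leave it sitting in application or OS buffers. A minimal sketch of the complementary habit in application code, using only standard flush and fsync calls (the file name and data are hypothetical):

```python
# Minimal sketch: make sure critical writes actually reach the drive, where
# power-loss-protection capacitors can then finish committing them.
import os

def durable_write(path, data: bytes):
    with open(path, "wb") as f:
        f.write(data)
        f.flush()              # push the application buffer down to the OS
        os.fsync(f.fileno())   # ask the OS to push its cache out to the device

if __name__ == "__main__":
    durable_write("checkpoint.bin", b"computed result worth keeping")
```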

Conclusion

Again, custom is key in the supercomputing world, but knowing your needs is the first step in building a data center and achieving the most consistent performance. No matter its size, it pays to think of your data center in the same terms as a supercomputer when it comes to generating, storing or sharing data. By evaluating those factors, architects can design high-performance infrastructures that are ready for future advancements, even with off-the-shelf components.

#KingstonIsWithYou
