HPC clusters require a different approach to security

By Alan Gush

 

The massive power of HPC clusters built from thousands of nearly identical servers makes them ideal for tackling critical tasks, but their very purpose makes them high-value targets for sophisticated nation-state actors motivated to spy on that work and harvest the data behind it.

 

Attacks like this are innovative and adaptive, powered by teams of engineers and resources not typically available to everyday hackers. HPC hackers perform detailed analyses and devise ways to exploit vulnerabilities on a single cluster machine, knowing it can provide a gateway to the rest of an HPC cluster. When they get in, they don’t just smash-and-grab. They look to establish a beachhead for long-term benefit.

 

Unfortunately, traditional approaches that require agents, daemons and intrusion-detection tools to secure conventional servers don’t work well in HPC clusters designed for raw speed. Traditional tools usually add overhead and slow performance, and they don’t address the inherent vulnerability of having so many cloned machines that allow exploits to spread. A vulnerability on one is a vulnerability on all.

 

System diversity dramatically reduces risk

Slowing or halting that spread requires each machine to look different to malicious code and determined bad actors without compromising the work each node is doing. To do that, clusters need parallel performance, seamless integration and machine-level diversity that lets every server perform identically and produce the right results without extra overhead. Doing that manually is a tall order for cluster maintainers, but it can be done readily with polymorphism.

 

Polymorphism scrambles and diversifies machine code while maintaining absolute uniformity of all the semantics used in data-mining, analysis and other HPC use cases. All caches, libraries, registries, memory access, and other processes perform the same, but the systems on which they’re running are unique.

 

This uniqueness means sophisticated exploits begin and end at a single machine. Since file locations, memory addresses and everything else about the system are randomized, bad actors can’t easily move across a cluster. To succeed, they would need to develop new techniques to exploit every node. That’s time-consuming, costly and something they want to avoid.
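To make the idea concrete, here is a toy Python simulation. It is only a sketch and does not show how any real product scrambles binaries. It assumes an exploit works on a node only if the addresses the attacker learned on the first node match that node's layout. The symbol names, address ranges and function names (build_cluster, exploit_works, nodes_compromised) are all hypothetical.

```python
import random

def build_cluster(num_nodes, diversified, seed=0):
    """Return one toy layout (symbol name -> address) per node."""
    rng = random.Random(seed)
    # Hypothetical symbols an exploit might depend on.
    symbols = ["system", "execve", "gadget_pop_rdi"]
    stock = {name: 0x1000 + 0x100 * i for i, name in enumerate(symbols)}
    cluster = []
    for _ in range(num_nodes):
        if diversified:
            # Each node gets its own randomly chosen addresses.
            layout = {name: rng.randrange(0x1000, 0x100000, 0x10)
                      for name in symbols}
        else:
            # Every node is a clone of the stock image.
            layout = dict(stock)
        cluster.append(layout)
    return cluster

def exploit_works(layout, assumed_layout):
    """The toy exploit succeeds only if every assumed address is correct."""
    return all(layout[name] == addr for name, addr in assumed_layout.items())

def nodes_compromised(cluster):
    """The attacker studies node 0, then reuses that knowledge everywhere."""
    learned = cluster[0]
    return sum(exploit_works(node, learned) for node in cluster)

uniform = build_cluster(1000, diversified=False)
diverse = build_cluster(1000, diversified=True)

print(nodes_compromised(uniform))  # 1000: every clone falls to the same exploit
print(nodes_compromised(diverse))  # node 0, plus any node that happens to share its layout
```

In the cloned cluster, one working exploit reaches every node. In the diversified cluster, the same exploit is very unlikely to get past the node it was built on, which is the property the paragraph above describes.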

 

Security that’s preventive, not reactive

Solutions that use polymorphism to compile and scramble your code without changing your existing workflows allow you to create unique Linux OS and application images based on software and tools you know and love. But instead of relying on public binaries that can be readily exploited, each node in your cluster becomes an instance that doesn’t look like any other, and no longer offers insight into any other server in your cluster. Though machine performance is unaffected, exploits that target common vulnerabilities simply won’t work.

 

Instead of reacting to security problems as they happen, polymorphism is a preventive approach that can be applied to one machine or tens of thousands. This doesn’t just reduce the attack surface of an HPC cluster, which is small to begin with; it also increases the “fog of war” that can thwart sophisticated nation-state attackers.

 

Though many HPC clusters are air-gapped for security, attackers are innovative and adaptive, looking for code or security weaknesses in low-level systems to gain a foothold. Tracking those attempts with alerts alone just creates a flood of noise that’s difficult for cluster maintainers to act on. By preventing zero-day exploits and avoiding the Common Weakness Enumeration (CWE) and Common Vulnerabilities and Exposures (CVE) threats common to stock Linux distributions, maintainers can instead focus on truly pressing security issues.

 

This approach flips the script, forcing would-be attackers to constantly devise new exploits and change their approaches while cluster maintainers can confidently harden and tune security measures at a pace that makes sense. For example, cluster maintainers can diversify the image on every machine or on just some, or replace those randomized images daily, weekly or on the same cadence as their regular update schedule. This sort of flexibility makes attackers work hard, not HPC managers.
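As an illustration of choosing a rotation cadence, the Python sketch below spreads re-imaging across a period so that only part of the cluster is replaced on any given day. It is an assumption about how a maintainer might schedule the work, not a description of any vendor tooling. The node names and the functions rotation_slot and nodes_due are hypothetical.

```python
import hashlib

def rotation_slot(node, period_days):
    """Map a node name to a stable day within the rotation period."""
    digest = hashlib.sha256(node.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % period_days

def nodes_due(nodes, day_index, period_days):
    """Return the nodes whose image should be replaced on the given day."""
    today = day_index % period_days
    return [n for n in nodes if rotation_slot(n, period_days) == today]

# Hypothetical node names for a small cluster.
nodes = [f"node{i:04d}" for i in range(20)]

# Weekly cadence: each node is re-imaged once every seven days.
weekly = {day: nodes_due(nodes, day, 7) for day in range(7)}

# Daily cadence: a period of one day selects every node, every day.
daily = nodes_due(nodes, 0, 1)

for day, due in weekly.items():
    print(day, due)
```

Hashing the node name keeps each node's slot stable between runs, so the schedule can be recomputed anywhere without shared state. Changing period_days is all it takes to move from a weekly to a daily cadence.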

 

Polyverse makes polymorphism easy and scalable

Polyverse and Polymorphic Build Farm for Open Source enable you to take advantage of polymorphism by handling all the recompiling, code scrambling and deployment of your Linux OS and application images. Together, they allow you to take a preventive approach that hardens your security measures without impacting critical HPC performance. All this makes your HPC environment far less appealing to bad actors, increases their risk of discovery, and leads them to move on to other, less secure operations. 

 

Read more about how Polyverse can help you take on even the most sophisticated nation-state attackers and keep your HPC cluster, and all its valuable data, safe. Read the HPC Whitepaper.

The registered trademark Linux® is used pursuant to a sublicense from the Linux Foundation, the exclusive licensee of Linus Torvalds, owner of the mark on a worldwide basis.