Sponsored Feature. There are many ways to scale systems up and out, and that is as much a problem as it is a solution for distributed systems architects.
Once powered up, few distributed systems are asked to do one thing and one thing only, and they share this trait with the large NUMA shared memory systems that run back-office applications in the enterprise. Both types of machines do many things, often at the same time in parallel or in series, one after the other.
And that means that there is usually an impedance mismatch between the needs of a given application and the hardware available to run that application. This is an impossible situation, and for that we need an “impossible server”.
This server must be malleable so that it can adapt to changing workloads and their specific needs in terms of CPU and accelerator capacity, and CPU/accelerator ratio. At the same time, the impossible server allows the sharing of these resources to increase the utilization of these components over time so that the huge sums that organizations pay for clustered systems are not wasted.
“True hardware heterogeneity is hard to achieve without composability and without fabrics,” Alan Benjamin, co-founder and CEO of GigaIO, tells The Next Platform. “The world of running applications on general-purpose X86 processors is over. And while the Nvidia A100 is a great accelerator, it’s also not the answer to everything. If you’re building a static infrastructure, whatever you build is probably ideal for one application and either unbalanced for other applications or not really appropriate at all.”
However, if you are building an impossible server, one whose elements are composed on the fly from pools of custom CPUs, GPUs, FPGAs, ASICs, and so on, then any given collection of specific machines composed through PCI-Express fabrics needs to exist only for as long as it is needed, and when a job is completed its components can be handed back to the pool.
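To make the compose-and-release cycle concrete, here is a minimal Python sketch of a device pool. It is only an illustration: the `ResourcePool` class, its method names, and the device counts are hypothetical, and nothing here reflects how FabreX itself is implemented.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ResourcePool:
    """Free devices in a rack, counted by kind (hypothetical model)."""
    free: Counter = field(default_factory=Counter)

    def compose(self, request: dict[str, int]) -> dict[str, int]:
        """Reserve the requested devices, or raise if the pool is short."""
        short = {k: n for k, n in request.items() if self.free[k] < n}
        if short:
            raise RuntimeError(f"not enough free devices: {short}")
        self.free.subtract(request)
        return dict(request)

    def release(self, machine: dict[str, int]) -> None:
        """Hand a composed machine's devices back to the pool."""
        self.free.update(machine)


# Illustrative counts only: compose a 16-GPU node, then give it back.
pool = ResourcePool(Counter({"cpu": 8, "gpu": 16, "fpga": 4}))
node = pool.compose({"cpu": 2, "gpu": 16})
print(dict(pool.free))   # devices left while the job runs
pool.release(node)
print(dict(pool.free))   # pool is whole again after the job
```

The point of the sketch is that a composed machine is just a temporary claim on shared devices, so the same GPUs can back a very different configuration for the next job.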
As is the case with all-CPU and hybrid CPU-GPU supercomputers today, it is up to research institutes and enterprise customers to keep a queue full of jobs ready to run, and up to job schedulers to play Tetris, fitting jobs to unused resources in the cluster. To that end, GigaIO has built northbound integrations with popular HPC job schedulers, including the open source SLURM and PBS, as well as cluster configuration tools like Bright Computing’s Bright Cluster Manager (which is now part of Nvidia).
Another aspect of the impossible server is that it can be configured in ways not usually possible inside physical servers. For example, many jobs may need only two or four GPUs attached to a host with one or two CPUs, or possibly even four CPUs, depending on the serial processing and memory capacity needs of the host machine.
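The Tetris analogy can be sketched as a simple first-fit pass over a job queue. This is a deliberately naive illustration and not how SLURM or PBS actually schedule; the `schedule` function, the job names, and the resource counts are all hypothetical.

```python
def schedule(queue, free):
    """Start every queued job that fits in the remaining free resources.

    queue: list of (job_name, {kind: count}) in priority order.
    free:  {kind: count} of unallocated devices; not modified.
    Returns (started_job_names, remaining_free).
    """
    remaining = dict(free)
    started = []
    for name, need in queue:
        # A job fits only if every kind of device it needs is available.
        if all(remaining.get(k, 0) >= n for k, n in need.items()):
            for k, n in need.items():
                remaining[k] -= n
            started.append(name)
    return started, remaining


# Hypothetical queue: one GPU-heavy job, one CPU-only job, one small job.
queue = [
    ("train", {"cpu": 2, "gpu": 16}),
    ("cfd", {"cpu": 8}),
    ("infer", {"cpu": 1, "gpu": 2}),
]
started, left = schedule(queue, {"cpu": 10, "gpu": 16})
print(started, left)
```

With composable hardware, the shapes being packed are no longer fixed by how each physical server was built, which gives the scheduler more ways to fill the gaps.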
But what happens, as with AI training workloads, when the best configuration is to have sixteen GPU accelerators attached to a server? There are very few OEM machines that do this, and buying them can be very expensive.
But with a PCI-Express fabric such as GigaIO’s FabreX and its composability software, linking many GPUs or other PCI-Express accelerators to a single node is very simple and nearly instantaneous. In a future release of Bright Cluster Manager, scheduled for delivery in early 2022, the reconfiguration of hosts and accelerators will happen automatically. And quickly.
“In an optimal environment, where you’re starting from clean hosts, you can recompose a host and configure new GPUs as desired in five seconds per node,” says Matt Demas, CTO for global sales and vice president of sales engineering at GigaIO, who joined the company from composability rival Liqid in August.
“So if I’m doing a ten node cluster, it can be done in less than a minute. In some cases, depending on the scenario, you might need to restart the host, and you need to factor that into the equation.”
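The arithmetic behind that claim is easy to check. The sketch below assumes nodes are recomposed one after another at the quoted five seconds each; the function name and the optional reboot parameters are hypothetical, and the 120-second reboot figure is purely a placeholder, not a number from GigaIO.

```python
def recompose_seconds(nodes, per_node_s=5.0, reboots=0, reboot_s=0.0):
    """Estimate total recomposition time, assuming nodes are done in series.

    per_node_s is the five seconds per node quoted for clean hosts;
    reboots and reboot_s are hypothetical knobs for hosts that need a restart.
    """
    return nodes * per_node_s + reboots * reboot_s


clean = recompose_seconds(10)                      # ten clean hosts
with_reboots = recompose_seconds(10, reboots=2, reboot_s=120.0)
print(clean, with_reboots)
```

Ten clean hosts at five seconds each come to 50 seconds, which is consistent with “less than a minute,” while even a couple of host restarts would quickly dominate the total.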
The other interesting aspect of this “impossible server,” according to GigaIO, is the company’s ability to treat the entire rack as a single server today, because its IP focuses on extending PCI-Express as a routable network throughout the rack using DMA (Direct Memory Access). In effect, this creates a memory ring around all of the resources. With the advent of CXL, which GigaIO has supported from the start as a contributing member of the CXL Consortium, the reality of what we described in a previous article on PCI-Express and CXL is closer than ever: “In essence, the rack becomes the server. There are of course challenges, but having an entire rack addressable by load/store is very interesting indeed.”
Social organization of computing
The San Diego Supercomputer Center (SDSC) is a flagship and early GigaIO customer using FabreX in an HPC environment.
“What they want is for SDSC researchers to submit work through SLURM, like they already do, and they want to be able to control the cluster through Bright Cluster Manager, like they do now,” says Benjamin.
“We need to meet them where they want to be met, and we’ve been able to do that through our integrations with SLURM and Bright, which means researchers don’t have to change what they’re doing, but they can still add composability and increase usability and efficiency. We have other northbound integrations that we have done for other customers, such as for OpenStack in fintech, and we also integrate with Singularity and Kubernetes to provide hardware composability underneath containers.”
The idea here is that customers don’t need another window to manage the underlying composability, but instead get integrations into the job schedulers, cluster managers, and container platforms that they are probably already using. Some customers prefer to use their own tools and appreciate the open standards approach provided by a robust command-line interface and Redfish API integration for FabreX’s underlying composition.
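For readers who have not seen Redfish-style composition, the sketch below builds the kind of request body that the generic DMTF Redfish composition model describes, where a new system is requested by listing resource blocks. Whether FabreX exposes these exact paths is an assumption, and the block names and system name are hypothetical. The code only builds and prints the payload; it does not contact any service.

```python
import json

# Paths from the generic DMTF Redfish composition model; it is an
# assumption that a given FabreX deployment uses these same locations.
SYSTEMS_URI = "/redfish/v1/Systems"
BLOCKS_URI = "/redfish/v1/CompositionService/ResourceBlocks"


def compose_request(name, block_ids):
    """Build the JSON body asking for a system made of the given blocks."""
    links = [{"@odata.id": f"{BLOCKS_URI}/{b}"} for b in block_ids]
    return {"Name": name, "Links": {"ResourceBlocks": links}}


# Hypothetical block identifiers, for illustration only.
body = compose_request("ai-training-node", ["cpu-0", "gpu-0", "gpu-1"])
print("POST", SYSTEMS_URI)
print(json.dumps(body, indent=2))
```

Because the request is plain JSON over HTTP, a site’s own tooling can drive composition in the same way a scheduler or cluster manager integration would.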
“If we’ve done our job well, people won’t even know FabreX exists. It’s just something that happens transparently under applications and invisibly across cluster resources,” says Benjamin.
That would actually be ideal. And, ultimately, it’s a necessary thing for what Frank Wuerthwein, acting SDSC director, calls “the social organization of computing,” which he talked about during a Birds of a Feather session with GigaIO at the recent SC21 supercomputing conference.
Wuerthwein explained the three use cases that SDSC supports and his desire to use the same hardware to support different modes and stacks. The first is traditional HPC simulation and modeling under the control of SLURM, which submits HPC jobs in batch mode against the hardware. Then there is another system that uses containers to run scientific workflows and services on Kubernetes. And then there are bare-metal environments that let researchers run testbeds like Chameleon, CloudLab, and Fabric.
“These three ways of using the same hardware in different environments are fundamentally not interoperable, consistent, or easily co-existing,” says Wuerthwein. “And that poses a challenge for shops like SDSC because rather than buying three of each, we would much rather be able to boot, or whatever you want to call it, into one dynamically and reuse the hardware for all three.”
Diverse Hardware Environments
However, the diversity of use cases is only one determining factor. SDSC also has diversity across its hardware. Existing systems at SDSC include a variety of CPU host systems, with different architectures, core counts, and performance.
They include machines with six different types of Alveo FPGA accelerator cards from Xilinx and five different “Ampere” GPU accelerators from Nvidia, and that’s not counting older GPU cards or the AMD Instinct and Intel Xe HPC cards that the center plans to acquire in the future. On top of this, or perhaps rather in between, SDSC systems also have different types of SmartNICs and DPUs with FPGA or GPU accelerators.
“In an ideal world, we would like to be able to dynamically reconfigure hosts with different devices – FPGAs, GPUs, DPUs, SmartNICs, etc. – for a purpose desired by a scientific use case,” says Wuerthwein. “For this, we are preparing to use GigaIO in the new systems that we are integrating into the shop and which we will talk about extensively at SC22.”
The old and the new: two pools of resources
At Purdue University, a major US public research institution that is starting proof-of-concept trials with GigaIO’s FabreX, a new cluster comes online approximately every year. As Alex Younts, senior research engineer in the Information Technology Research Computing group, tells The Next Platform, Purdue “likes to refresh them.”
Over the past five years or so, the university has begun adding GPUs and other types of accelerators to its systems, ranging from the latest, most powerful devices to older versions “that we never seem to throw away,” as Younts put it.
“We have these two resource pools, with lots of CPUs and lots of GPUs, and the two don’t meet,” he says. “And our mission is not only to support research, but also teaching, and having strict resource pools for these two missions causes us problems because we cannot mix and match them. What we’re working on is having one of these annual community clusters, with maybe 500 or 1,000 nodes, and disaggregating all of our CPUs and GPUs at rack scale, giving us the ability to meet research and instruction needs based on Kubernetes container-based workflows and virtual graphical workstations to support engineering courses.”
This is exactly the kind of thing we expect to see in all academic supercomputing centers, and there are many thousands of them around the world that want to converge supercomputing and education on the same infrastructure. This is a big deal, with an immediate payoff for the institutions.
Sponsored by GigaIO