Systems Approach Since our recent articles have retraced the history and recounted the lessons of software-defined networking, it seems like time to do the same with the other sea change in networking over the past decade: network functions virtualization, or NFV.
Most people trace NFV back to the call to action [PDF] that several network operators presented at the Layer123 SDN & OpenFlow World Congress in October 2012.
My involvement began several months before that presentation, working with colleagues from BT, Intel, HP, Wind River, and TailF on a proof of concept that gave the idea enough credibility for those operators to put the NFV stake in the ground.
At the time, I was at the CDN software startup Verivue, which contributed what we believed to be the canonical Virtualized Network Function (VNF) – a scalable web cache. Don Clarke, who spearheaded the entire effort on behalf of BT, has an excellent 4-part history of the journey, of which my own direct experience was limited to part one.
Two things stand out from the proof of concept. The first is the obvious technical merit of co-locating an access network technology (such as a virtualized broadband gateway) and a cloud service (such as a scalable cache) on a rack of commodity servers. The proof of concept itself, as is often the case, was hamstrung by our having to cobble together existing software components that were developed for different environments. But it worked, and that was an impressive technical achievement.
The second is that the business minds in the room were hard at work building an ROI case to pair with the technical story. The value of agility and improved feature velocity was given lip service, but the cost savings were quantifiable, so they received most of the attention.
Operators viewed NFV as a way to run purpose-built appliances in virtual machines on commodity hardware, but that’s the easy part of the problem. Simply inserting a hypervisor between the appliance software and the underlying hardware can yield modest cost savings by enabling server consolidation, but it falls far short of the zero-touch management win that operators of modern data centers enjoy when deploying cloud services.
In practice, telecom operators still had to deal with N one-off VM configurations to operationalize N virtualized functions. The expectation that NFV would shift their operational challenge from caring for pets to herding cattle did not materialize; operators were left caring for virtualized pets.
Although it was widely advertised at first, NFV did not deliver on its promises
Streamlining operational processes is hard enough under the best of circumstances, but operators approached the problem with the burden of preserving their legacy operations and maintenance (O&M) practices (i.e. they actively avoided the changes that would enable streamlining). Essentially, carriers set out to build a telco cloud by adopting cloud technologies piecemeal (starting with hypervisors). As it turned out, however, NFV set in motion a second track of activity that is now resulting in a cloud-based telco. Let me explain the difference.
Looking at the NFV PoC in hindsight, it’s clear that setting up a small cluster of servers to demo a few VNFs sidestepped the real challenge, which is to repeat that process over time for an arbitrary number of VNFs. This is the problem of continuously integrating, continuously deploying, and continuously orchestrating cloud workloads, and it has spurred the development of a rich set of cloud-native tools, including Kubernetes, Helm, and Terraform.
Such tools were not generally available in 2012, even though they were emerging inside hyperscalers, and so the operators behind the NFV initiative began to (a) set up an ETSI-hosted standardization effort to catalyze the development of VNFs, and (b) retrofit their existing O&M mechanisms to support this new collection of VNFs. Without critiquing the NFV reference architecture point by point, it seems fair to say that wrapping a VNF in an element management system (EMS), as if it were just another hardware appliance, is a perfect example of how such an approach does not scale operations.
Meanwhile, the laudable goal of running virtualized functions on commodity hardware inspired a parallel effort that existed entirely outside the ETSI standards process: to create cloud-native implementations of access network technologies, which could then run side-by-side with other cloud-native workloads. This parallel track, which came to be known as Central Office Re-architected as a Datacenter (CORD), eventually led to Kubernetes-based implementations of multiple access technologies (e.g. PON/GPON and RAN). These access networks run as microservices that can be deployed by a Helm Chart on your preferred Kubernetes platform, typically running at the edge (e.g., Aether).
Again, with the benefit of hindsight, it’s interesting to revisit the two main arguments for NFV – lower costs and improved agility – and see how they were overtaken by events. On the cost side, it is clear that solving the operational challenge was a prerequisite to realizing any capital savings. What the cloud-native experience teaches us is that a well-defined CI/CD tool chain, and the means to easily extend the management plane to integrate new services over time, are the price of entry for taking advantage of cloud economics.
From an agility perspective, NFV’s approach was to support service chaining, a mechanism that allows customers to customize their connectivity by “chaining” a sequence of VNFs.
Since VNFs run in virtual machines, in theory it seemed plausible that one could programmatically interconnect a sequence of them. In practice, providing a general purpose service chaining mechanism has proven elusive. Indeed, feature customization is a difficult problem in general, but starting with the wrong abstractions (a bump-in-the-wire model based on an outdated device-centric worldview) makes it unsolvable.
The bump-in-the-wire model simply doesn’t fit the realities of building cloud services. The canonical CDN VNF is a great example. HTTP requests are not tunneled through a cache as though it were (virtually or physically) wired into the end-to-end chain; instead, a completely separate request redirection service that sits outside the data path dynamically directs HTTP GET messages to the nearest cache. (Ironically, this was true during the PoC, since the Verivue CDN was actually container-based and built on cloud-native principles, even though it predated Kubernetes.)
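To make the contrast with chaining concrete, here is a minimal sketch of request redirection. It is not the Verivue implementation: the cache hostnames, the one-dimensional "position" used as a stand-in for network distance, and the use of an HTTP 302 response are all assumptions made for illustration. The point is only that the redirector picks a cache and answers the client, and the content itself never flows through the redirector.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cache:
    name: str        # hypothetical cache hostname
    position: float  # hypothetical 1-D stand-in for network distance
    healthy: bool = True


# Hypothetical cache fleet; a real CDN would learn this from monitoring.
CACHES = [
    Cache("cache-east.example.net", 10.0),
    Cache("cache-central.example.net", 50.0),
    Cache("cache-west.example.net", 90.0),
]


def nearest_cache(client_position, caches=CACHES):
    """Return the closest healthy cache, or None if no cache is healthy."""
    healthy = [c for c in caches if c.healthy]
    if not healthy:
        return None
    return min(healthy, key=lambda c: abs(c.position - client_position))


def redirect(client_position, path, caches=CACHES):
    """Build the (status, headers) pair a redirector might send for a GET.

    The redirector only answers with a pointer to a cache; the client then
    fetches the object from that cache directly.
    """
    cache = nearest_cache(client_position, caches)
    if cache is None:
        return 503, {}
    return 302, {"Location": f"http://{cache.name}{path}"}
```

Calling `redirect(12.0, "/video/1.mp4")` with this hypothetical fleet selects the east cache. Adding or removing a cache changes only the list the redirector consults, not any wiring between functions, which is the property a chain of VNFs lacks.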
A firewall is another example: in a device-centric world, a firewall is a “middlebox” that can be inserted into a chain of services, but in the cloud, the equivalent access control functionality is distributed across virtual and physical switches.
When we look at the problem of service agility through the lens of current technology, a service mesh provides a better conceptual model for quickly delivering new features to customers, with connectivity as a service turning out to be just another cloud service.
But the biggest lesson of NFV is that operations should be treated as a first-class property of a cloud. The limited impact of NFV can be directly attributed to its proponents’ reluctance to refactor their operational model from the start. ®