🥐 Platform Weekly here! You know that feeling you get when you flip your pillow to the cold side in the middle of the night? Reading us is better than that.
Let's get bakin'
What makes K8s upgrades so challenging?
Text by Fawad Khaliq, Founder and CTO at Chkk
Memes by Luca Galante
Our last 100+ conversations with DevOps/SREs can be summed up in 4 nouns and 3 emotions: "Kubernetes Cluster Version Upgrades"… "Hard, Pain, Work".
Why are Kubernetes upgrades so challenging? Why isn't a Kubernetes upgrade as smooth as an iPhone upgrade? Here's what makes it hard and why DevOps/SREs find change management stressful:
1️⃣ Kubernetes isn't, and shouldn't be, vertically integrated.
K8s is designed for flexibility, and cloud providers work hard to ensure this flexibility isn't compromised.
💡 The solution is a cloud-owned k8s control plane (EKS, GKE, AKS, OKE…) with a few managed add-ons (e.g. CoreDNS, CNI…) and some guidance on how to build apps, while giving DevOps/SRE teams the flexibility to introduce new components, add-ons, and apps of their own.
The cost of this flexibility is that these DevOps/SRE teams must now own the lifecycle of the add-ons and the applications that run on top of the k8s infrastructure.
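To make that split concrete, here's a minimal sketch of the "cloud-managed vs. yours" boundary on EKS. It assumes boto3 is installed with AWS credentials configured, and the cluster name is a placeholder; the calls used (list_addons, describe_addon) are standard EKS APIs.

```python
# Sketch: list the add-ons that EKS manages for a cluster.
# Assumes boto3 + AWS credentials; "prod-cluster" is a placeholder name.
import boto3

eks = boto3.client("eks")
cluster = "prod-cluster"

for name in eks.list_addons(clusterName=cluster)["addons"]:
    addon = eks.describe_addon(clusterName=cluster, addonName=name)["addon"]
    print(f'{addon["addonName"]:<20} {addon["addonVersion"]:<20} {addon["status"]}')

# Anything running in the cluster that is NOT in this list (ingress
# controllers, service meshes, observability agents, your own apps, ...)
# is the DevOps/SRE team's responsibility to keep upgraded.
```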
2️⃣ You don't know what'll break before it breaks.
With so many moving pieces, it's hard to know if your running k8s components have incompatibilities or latent risks.
Many users rely on spreadsheets to track what they are running vs what they should be running, which is both painful and error-prone.
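As a rough alternative to the spreadsheet, here's a minimal sketch (assuming the official kubernetes Python client and a working kubeconfig) that pulls a live inventory of the control-plane version, node kubelet versions, and the images running in kube-system:

```python
# Sketch: snapshot what a cluster is actually running, instead of
# tracking it by hand in a spreadsheet. Assumes the `kubernetes`
# Python client is installed and ~/.kube/config points at the cluster.
from kubernetes import client, config

config.load_kube_config()

# Control-plane (API server) version
print("control plane:", client.VersionApi().get_code().git_version)

# Kubelet version per node -- version skew vs. the control plane is a
# common source of upgrade surprises.
for node in client.CoreV1Api().list_node().items:
    print("node:", node.metadata.name, node.status.node_info.kubelet_version)

# Images running in kube-system (CoreDNS, CNI, kube-proxy, ...)
for pod in client.CoreV1Api().list_namespaced_pod("kube-system").items:
    for c in pod.spec.containers:
        print("kube-system image:", c.image)
```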
We all know that "Not broken != working-as-it-should". Latent risks and unsupported versions can lurk for weeks or months before they cause impact.
What's needed here is to share collective knowledge across DevOps/SRE teams, so that if one team has encountered an upgrade risk, everyone else gets to avoid it without any extra work on their end.
Using spreadsheets to track what's running on k8s be like:
3️⃣ Getting an upgrade right takes a lot of time.
Deloitte's CIO survey estimates that 80% of DevOps/SRE time is spent on operations/maintenance, and only 20% on innovation.
I'm not surprised, as cooking up a "safe" upgrade plan is a huge time sink. You have to read an inordinate amount of text and code (release notes, GitHub issues/PRs, blogs, etc.) to really understand what's relevant to you vs what's not.
This can take weeks of effort, time that could've been spent on business-critical functions like architectural projects and infrastructure scaling/optimization.
Fawad is the Founder and CTO at Chkk, a company focused on eliminating operational risks through Collective Learning. Formerly, he was a technical lead for Amazon EKS and an early engineer at PLUMgrid, the creators of eBPF. You can follow him on Twitter @fawadkhaliq.
Read the full article here.
Is the era of microservices over?
Lambda and serverless were touted by AWS to be the future, but even their own engineers disagree.
Last week, an Amazon Prime Video case study stirred up some controversy when the team revealed they had reduced costs by 90% by moving from microservices back to a monolith: "Microservices and serverless components are tools that do work at high scale, but whether to use them over monolith has to be made on a case-by-case basis."
It's surprising to some because AWS frequently frames microservices and serverless architecture as the best way to modernize applications.
But it also isn't surprising (or, at least, it shouldn't be 😬) that some architectures work well for some businesses but not for others.
Amazon Prime Video's old architecture was based on AWS Lambda, which is good if you want to build services quickly. However, it wasn't cost-effective when running at high scale. Let's take the orchestration workflow, for example. Alex Xu succinctly explained that "AWS Step Functions charge users by state transitions and the orchestration performs multiple state transitions every second."
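To see why per-transition pricing bites at scale, here's a back-of-the-envelope sketch. The rate and workload numbers are illustrative only, not Prime Video's actual figures; check current AWS pricing before relying on them.

```python
# Illustrative arithmetic only -- not Prime Video's real numbers.
# Standard Step Functions pricing is on the order of $25 per million
# state transitions (verify against current AWS pricing).
price_per_transition = 25 / 1_000_000        # USD

transitions_per_second = 5                   # hypothetical workflow
seconds_per_month = 60 * 60 * 24 * 30

monthly_transitions = transitions_per_second * seconds_per_month
print(f"~{monthly_transitions:,.0f} transitions/month "
      f"=> ~${monthly_transitions * price_per_transition:,.0f}/month per workflow")
# ~12,960,000 transitions/month => ~$324/month per workflow,
# before multiplying by the number of streams being monitored.
```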
Furthermore, in the old architecture, intermediate data was stored in Amazon S3 before it was downloaded. High-volume downloads became très cher 💸.
A monolithic architecture is supposed to address these cost issues. From Alex Xu, again: "There are still 3 components, but the media converter and defect detector are deployed in the same process, saving the cost of passing data over the network."
And that's where the 90% cost reduction came from! Pretty neat, right?
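A toy sketch of that change (component names follow the article; the function bodies are placeholders): instead of writing intermediate frames to S3 and re-downloading them in another service, the converter's output is handed straight to the detector in the same process.

```python
# Toy illustration of the monolith-style data flow described above.
# Function names mirror the article's components; bodies are placeholders.

def media_converter(stream_chunk: bytes) -> list[bytes]:
    """Split a chunk of video into frames (placeholder)."""
    return [stream_chunk]  # pretend these are decoded frames

def defect_detector(frames: list[bytes]) -> list[str]:
    """Flag frames with audio/video defects (placeholder)."""
    return []  # pretend this runs the detection models

def analyze(stream_chunk: bytes) -> list[str]:
    frames = media_converter(stream_chunk)   # stays in memory --
    return defect_detector(frames)           # no S3 round trip, no network hop

print(analyze(b"\x00" * 1024))
```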
So the next time someone tells you "microservices good, monoliths bad" (or "monoliths good, microservices bad" for that matter), kindly send them this newsletter. And remember: your business should determine your architecture, not the other way around.
Have you joined the Platform Engineering Slack channel? If not, you're missing out. Join us to weigh in on some open questions:
- What is the best option for automation on a product deployed on an AWS Kubernetes cluster?
- How do you create a value proposition for your platform?
- How do you manage your repository data?
Stay crunchy 🥐
Luca