Cloud Native End User Lounge: Managing 600 clusters at CERN

About this event

Usage of Kubernetes and other Cloud Native projects (Prometheus, Fluentd, CoreDNS, Harbor, ...) at CERN has exploded over the last few years, with more than 600 clusters in total and thousands of nodes. During this time a lot of work was put into integrating with our identity, networking, monitoring and storage services. More recently the focus has been on improving the deployment and lifecycle of applications, with a strong emphasis on GitOps and tools like Flux and Argo CD.

Use cases in production include local services for CERN users (administrative applications, pension fund, internal and publicly visible websites, ...) where security and high availability are very important, as well as services for large-scale data analysis. The latter include interactive analysis using notebooks or remote environments, as well as batch systems.

The future still poses some big challenges: a rootless stack will help integrate High Performance Computing (HPC) resources; faster image distribution and container deployment will improve performance on large clusters and ease cluster autoscaling; and work on multi-cluster will greatly simplify our current setups for scaling out our on-premises capacity with external resources.