Keynote: Strimzi past, present and future (Kate Stanley & Paolo Patierno @ Red Hat) - River room |
Upgrade yourself to the Business Class (Jakub Scholz @ Red Hat) - River room | Upgrading your software and keeping it up to date is an important part of the software lifecycle. Yet we see many users running very old versions of Strimzi and Apache Kafka, and often - by their own account - using them to run “critical infrastructure”. This talk will explain why staying up to date matters, not only for security reasons but also, for example, to be able to get help when you need it. It will also show that staying up to date is not as hard as many users think, by walking through the various ways of upgrading Strimzi and demoing some of them. |
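The Kafka upgrade path the talk covers is driven declaratively: after upgrading the operator, bumping the version fields in the Kafka custom resource triggers the rolling update. A minimal sketch (the cluster name and version numbers are illustrative, not from the talk):

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster   # illustrative name
spec:
  kafka:
    # Bump this after upgrading the Strimzi operator to a release
    # that supports the target Kafka version.
    version: 3.7.0
    config:
      # For ZooKeeper-based clusters, left at the old version until all
      # brokers run the new binaries, then raised in a second step.
      inter.broker.protocol.version: "3.6"
```

Applying the change lets the operator roll the brokers one by one, which is one of the upgrade flows the session demos.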
Exploring Strimzi’s implementation of rolling updates (Gantigmaa Selenge @ Red Hat) - Waterfall room | Managing rolling updates is a crucial aspect of Kafka operations. In a production environment, it’s essential to ensure that rolling updates cause minimal disruption, avoiding any impact on the cluster’s availability and on client applications. Automating this process with Kubernetes alone is not straightforward, as Kubernetes has no knowledge of Kafka clusters or of how to maximise their availability during updates. Strimzi has a component known as KafkaRoller, which coordinates rolling restarts and reconfiguration of Kafka nodes. In this session, we will delve into KafkaRoller’s decision-making process and the safety assessment it performs during rolling updates, leveraging various data sources such as Kafka metrics, the Admin API and the Kubernetes API. |
Transition to Apache Kafka on Kubernetes with Strimzi (Steffen Wirenfeldt Karlsson @ Maersk) - River room | This is a classic migration case study (past, present and future) at scale: a worldwide company transitioning from Confluent Platform and Confluent Cloud to self-managed Apache Kafka on Kubernetes using Strimzi.
At Maersk, we have been architecting, designing and implementing our 3rd-generation Event Streaming Platform. This platform is based on Kubernetes in Azure and uses Strimzi to operate Apache Kafka at large scale and with high reliability, segregating data based on isolated use cases. Our 2nd generation was based on on-premises Confluent Platform and Confluent Cloud, and this presentation is the story of this migration and the reasoning behind it.
Furthermore, we will go into detail on how we monitor (Grafana, Prometheus), alert (GoAlert and alerts as code), operate and provide self-service solutions on top of Strimzi to enable business-critical applications at Maersk, implemented in Go using the GitOps deployment model with Flux and Kustomizations, among others.
Finally, if time allows, we will end with a demo of an open-source self-service tool for monitoring and exploring the cluster, with much-requested features such as topic message browsing and configuring and restarting connectors. |
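The GitOps model mentioned above can be sketched with a Flux Kustomization that continuously reconciles Strimzi resources from a Git repository (all names and paths here are hypothetical, not Maersk's actual setup):

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: kafka-platform        # hypothetical name
  namespace: flux-system
spec:
  interval: 10m               # how often Flux re-reconciles
  path: ./clusters/prod/kafka # hypothetical path in the repo
  prune: true                 # delete resources removed from Git
  sourceRef:
    kind: GitRepository
    name: platform-config     # hypothetical Git source
```

With this pattern, Kafka, KafkaTopic and KafkaConnect manifests live in Git, and Flux applies any change automatically, which is what enables the self-service model described in the abstract.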
Modernizing Nubank's Kafka Platform For The Next 10 Years Starting at 1 Trillion Messages per Month (Julio Turolla, Roni Silva @ Nubank) - Waterfall room | Nubank is a leading digital bank and one of the world's largest fintech companies. With over 90 million customers, Nubank has relied on Kafka as the cornerstone of all its operations since Day 1, handling more than 1 trillion messages per month across over 50 clusters in three countries. In Brazil alone, Nubank's Kafka clusters process 30% of the country's instant transfer transactions.
After a decade of operating Kafka on raw EC2 instances, regular operations began to suffer from scalability challenges. Cluster upgrades, security enhancements and storage management became a nightmare without a proper automation platform.
In this session, Nubank engineers will delve into the architecture and the process of live migrating to a Strimzi setup that manages 1 trillion messages per month. We will discuss the reasons behind choosing Strimzi, the strategies employed to minimize the impact of failures through a cell-based architecture, and the detailed mechanism of topic-by-topic live migration. Additionally, we will share the intricate architectural details developed to ensure a successful deployment.
We will also explore Kafka Flavors, a cluster specialization strategy designed to optimize the management of workloads with varying requirements, and how topics are allocated to specific clusters. |
Enhancing Kafka Topic Management in Kubernetes with the Unidirectional Topic Operator (Federico Valeri @ Red Hat) - River room | Apache Kafka stands as a critical element in real-time data streaming, yet managing topics within Kubernetes environments often presents challenges. Conventional manual methods frequently lead to errors and operational hurdles. This presentation unveils the Unidirectional Topic Operator (UTO) for Strimzi, offering Kafka administrators a seamless method to streamline topic management via Kubernetes custom resources. Distinguishing itself from its predecessor, the Bidirectional Topic Operator (BTO), the UTO enhances scalability and extends support to Kafka clusters operating in KRaft mode. By adopting a declarative approach, administrators can specify topic configurations, leaving intricate details to the operator's management. The UTO employs a unidirectional reconciliation process from Kubernetes to Kafka, effectively mitigating complexities and ensuring consistent cluster states. We will delve into migration strategies, scalability enhancements, and operational considerations, positioning the UTO as a significant advancement in Kafka topic management within Kubernetes environments. |
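The declarative approach described above comes down to a KafkaTopic custom resource like the following sketch (topic name and values are illustrative), which the Unidirectional Topic Operator reconciles one-way from Kubernetes into the Kafka cluster:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: orders                      # illustrative topic name
  labels:
    strimzi.io/cluster: my-cluster  # binds the topic to a Kafka CR
spec:
  partitions: 12
  replicas: 3
  config:
    retention.ms: 604800000         # 7 days
    cleanup.policy: delete
```

The administrator declares the desired state; the operator handles creation, configuration changes, and drift correction in Kafka, which is the "unidirectional" flow that distinguishes the UTO from the BTO.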
Partial Multi-Tenancy on Kafka using Strimzi at LittleHorse (Colt McNealy @ LittleHorse) - Waterfall room | Managing a Kafka platform that serves multiple different applications within an organization (what we call "partial multi-tenancy") is a daunting task. How do you create topics? How do you enforce the principle of least privilege? How do you mitigate the noisy neighbor problem? The good news is that Strimzi has kube-native answers to all of these questions.
Apache Kafka is the core dependency for the LittleHorse workflow engine. In LittleHorse Cloud, we use Strimzi (of course) to manage the Kafka clusters that back our LittleHorse Clusters. In particular, the "LH Cloud Enterprise" product uses Strimzi CRDs to enable partial multi-tenancy, in which multiple LittleHorse clusters share a single Kafka cluster.
We will cover how the KafkaTopic CRD simplifies the provisioning of new LH Clusters, how we use the KafkaUser CRD to enable tenant isolation, and how we use the KafkaRebalance CRD to take advantage of Cruise Control.
Lastly, we will discuss some up-and-coming Strimzi features such as KRaft support with KafkaNodePools and topic replication factor changes with the Unidirectional Topic Operator. These new Strimzi features will allow us to deliver some great new functionality to our users, and we are very excited to see them cross the finish line! |
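Tenant isolation via the KafkaUser CRD, as described above, can be sketched roughly as follows (the tenant and cluster names are hypothetical, not LittleHorse's actual configuration):

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaUser
metadata:
  name: tenant-a                    # hypothetical tenant
  labels:
    strimzi.io/cluster: my-cluster  # hypothetical cluster name
spec:
  authentication:
    type: tls                       # mTLS client certificate
  authorization:
    type: simple                    # Kafka ACL-based authorization
    acls:
      # Least privilege: the tenant may only touch topics
      # under its own prefix.
      - resource:
          type: topic
          name: tenant-a.
          patternType: prefix
        operations:
          - Read
          - Write
          - Describe
```

Prefix-scoped ACLs like these are one kube-native answer to the least-privilege question the abstract raises; quotas on the same KafkaUser can address the noisy-neighbor problem.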
How to survive managing Strimzi in an enterprise (Richard Bosch @ Axual) - River room | You are in a platform team responsible for Kafka (Strimzi), the digital nervous system of a large enterprise. You have been so successful in advocating for Kafka and inspiring the business that teams start queuing up at your desk to get connected. Before you know it, you are supporting dozens of development teams with their topics and connectors.
Then suddenly something happens in production. Load on a topic is dropping. A consuming application starts lagging on another topic. Meanwhile, developer John Doe contacts you about his very important topic that he wants to promote from Acceptance to Production. You want to help them all, if only you could. Who is the data owner of these topics? Which applications are producing and consuming? Which teams own those applications? And, more importantly, how can you reach them?
That’s when you realize an overview of the enterprise’s streaming landscape, including all topics and connected applications, is missing. Moreover, you want a simple administration tool that makes it easy to define and deploy topics and administer contact information for data & application owners so you can get in touch quickly.
Doing all this tedious functional maintenance yourself doesn’t necessarily make you happy. You want to empower others so they can administer their own topics and applications in an easy way. This allows for better onboarding of newcomers on the platform. And it helps the organization advance its event-driven architecture by providing insight and stimulating reuse of existing topics.
In this talk, I will share experiences and insights gathered while running an event streaming platform for a Tier-1 bank in the Netherlands, which eventually led to the development of a noteworthy enterprise topic management solution. |
Support Tiered Storage in Strimzi-operated Kafka (Lixin Yao, Vrishali Bhor, Bo Gao @ Apple) - Waterfall room | In the realm of real-time streaming applications, Kafka is commonly chosen as the datastore. This demands fast and expensive storage at scale. However, the need for high-performance storage volumes often translates to steep costs, hindering wider adoption. To tackle this challenge, we have integrated the tiered storage feature introduced in KIP-405 into our Strimzi-operated Kafka. With tiered storage and our custom remote storage plugin, we provide a cost-effective solution that allows data to be stored on cheaper storage, enabling extended retention periods and efficient backfilling of past data. This innovation lets our streaming applications benefit without the strain of hefty storage expenses.
In this session, we'll show the audience how we use Strimzi-operated Kafka to build end-to-end streaming applications, while also touching on the behind-the-scenes work that reduced costs enough to make this affordable for wider adoption. We will share our experience of integrating the tiered storage feature with the Strimzi Kafka operator and Strimzi-operated Kafka clusters, and share our proposal to support the tiered storage feature in Strimzi natively. Moreover, we will share our journey of optimizing the performance of our remote storage manager implementation and the valuable insights gained along the way. |
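For orientation, native support of the kind proposed in the talk could expose KIP-405 through the Kafka custom resource roughly as follows. This is a sketch assuming Strimzi's custom `tieredStorage` API; the plugin class, paths, and settings are illustrative, not Apple's actual implementation:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster                        # illustrative name
spec:
  kafka:
    tieredStorage:
      type: custom
      remoteStorageManager:
        # Hypothetical plugin; KIP-405 allows any RemoteStorageManager
        # implementation to be plugged in.
        className: com.example.kafka.S3RemoteStorageManager
        classPath: /opt/kafka/plugins/rsm/*
        config:
          storage.bucket.name: my-bucket  # forwarded to the plugin
    config:
      remote.log.storage.system.enable: true
```

With such a setup, closed log segments are offloaded to the remote tier while recent data stays on local disks, which is what decouples retention length from expensive local storage.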