ABSTRACT OF THE TALK
ETL/ELT on Kubernetes is currently an unsolved problem. Many approaches are vying to become the de facto method, but none is a clear winner. Because the cloud-native landscape is built for deploying Dockerized, open-source software, many closed-source solutions fall flat and don't mesh with the trajectory of the community.
Airbyte is an open-source ETL/ELT tool that fits well with the cloud-native landscape and aims to enable your stateful workloads on Kubernetes. Previously, I talked about a theoretical deployment on Kubernetes and the nuances of deploying an ETL/ELT pipeline in such an environment. Now, I'm following that up with how we actually implemented that strategy as we launched our K8s beta. I'll also dive into some of the nitty-gritty details we needed to figure out to get this all working... stuff that isn't really found online!
Overall, this will be a rare chance to do a retrospective on what we planned our architecture to look like, followed by development insights from solidifying the final implementation.
KEY TAKE-AWAYS FROM THE TALK
- Quick overview of Airbyte and open-source ETL/ELT [5 minutes]
- Why run your ETL/ELT in K8s? [3 minutes]
- A quick recap on the previous talk (what we thought the architecture would look like) [5 minutes]
- Display the actual architecture and implementation [10 minutes]
-> Talk about how to communicate with K8s pods over STDOUT and STDIN pipes
-> Describe the parent-child process termination strategy
-> Describe the persistence layer/strategy and config storage
- Quick demo of an Airbyte deployment on K8s [10 minutes]