DoK Talks #111 - Scheduled Scaling with Dask and Argo Workflows

Data on Kubernetes

Jan 18, 2022, 5:00 – 6:00 PM

Virtual event

Severin Ryberg - Senior Infrastructure Architect, ACCURE Battery Intelligence

About this event

https://go.dok.community/slack

https://dok.community/

ABSTRACT OF THE TALK

Complex computational workloads in Python are a common sight these days, especially in the context of processing large and complex datasets. Battle-hardened modules such as NumPy, Pandas, and Scikit-Learn can perform low-level tasks, while tools like Dask make it easy to parallelize these workloads across distributed computational environments. Meanwhile, Argo Workflows offers a Kubernetes-native solution for provisioning cloud resources in Kubernetes and triggering workflows on a regular schedule. Being Kubernetes-native, Argo Workflows also meshes nicely with other Kubernetes tools. This talk discusses the combination of these two worlds by showcasing a setup for Argo-managed workflows that schedule and automatically scale out Dask-powered data pipelines in Python.

BIO

Former academic in the field of renewable energy simulation and energy systems analysis. Currently responsible for architecting and maintaining the cloud and data strategy at ACCURE Battery Intelligence.

KEY TAKE-AWAYS FROM THE TALK

Argo Workflows + Dask is a nice combination for data-processing pipelines. There are a few "gotchas" to look out for, but it is nevertheless a generally applicable and powerful combination.

https://github.com/sevberg

Speaker

  • Severin Ryberg

    ACCURE Battery Intelligence

    Senior Infrastructure Architect

Host

  • Bart Farrell

    Data on Kubernetes

    Community Builder

Organizers

  • Paul Au

    Constantia.io

    Community Manager

  • Melissa Logan

    Organizer

  • Diogenese Topper

    Data on Kubernetes Community

    Organizer

  • Kristi Tan

    Head of Marketing
