3:00 PM | Opening Remarks | Join us as Chase Christensen, one of the Chairs of the Kubeflow Outreach Committee, opens the day with a warm welcome from the Kubeflow community. He’ll introduce Kubeflow, how to get involved, where to share your ideas, and how to connect with the community. We’ll also walk through the day’s schedule and highlight the exciting talks and discussions ahead. |
3:05 PM | Bringing Kubeflow Training Local: SDK-Driven “Local-exec” Mode | Kubeflow’s Python SDK makes it easy to define and submit training jobs to remote clusters, but developing, debugging, and iterating on your training code still often requires a full Kubernetes round-trip. In this session you’ll learn how the SDK’s new local_exec execution mode lets you run your training job locally on your own machine before submitting it to Kubernetes. |
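The core idea of a local-exec mode can be sketched in a few lines: the same training function either runs in-process or gets submitted to a cluster, depending on the backend you choose. This is an illustrative sketch only; the `train` and `run` names are hypothetical stand-ins, not the SDK’s actual API.

```python
def train(epochs: int, lr: float) -> dict:
    """A toy training function of the kind you would hand to the SDK."""
    loss = 1.0
    for _ in range(epochs):
        loss *= (1 - lr)  # pretend each epoch shrinks the loss
    return {"final_loss": loss}


def run(fn, *, backend: str, **kwargs):
    """Dispatch the same function to a local or a cluster backend.

    'local' calls the function in-process, which is the essence of a
    local-exec mode; cluster submission is stubbed out in this sketch.
    """
    if backend == "local":
        return fn(**kwargs)
    raise NotImplementedError("cluster submission requires a real cluster")


# Iterate locally first; switch the backend when the code is ready.
result = run(train, backend="local", epochs=10, lr=0.1)
```

Because the training function is unchanged between backends, anything you debug locally carries over directly to the cluster run.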
3:20 PM | Transition Time | Take five to grab a snack, refill your coffee, or send that quick email while our next speaker gets set up. |
3:25 PM | LLM Inference in Production with Kubernetes and Kubeflow | Large Language Models (LLMs) are powerful, but deploying them reliably, cost-effectively, and at scale in production is a different challenge altogether. In this session, we’ll walk through how to operationalize LLM inference using Kubeflow on Kubernetes, leveraging open-source and cloud-native tools to build resilient, scalable, and observable GenAI infrastructure. |
3:40 PM | Transition Time | Take a few minutes to stretch, refuel, or catch up on messages while we get ready for the next session. |
3:45 PM | Streamline LLM Fine-Tuning on Kubernetes with Kubeflow LLM Trainer | Fine-tuning LLMs on Kubernetes is challenging for data scientists due to complex Kubernetes configurations, diverse fine-tuning techniques, and different distributed strategies like data and model parallelism.
It’s crucial to hide the complex infrastructure configurations from users, and allow them to gracefully shift among diverse models, datasets, fine-tuning techniques and distributed strategies.
This talk will introduce Kubeflow LLM Trainer, a tool that leverages pre-configured blueprints and flexible configuration overrides to streamline the LLM fine-tuning lifecycle on Kubernetes.
Shao Wang (Kubeflow WG Training/AutoML) will demonstrate how Kubeflow LLM Trainer integrates with multiple fine-tuning techniques and distributed strategies, while offering a simple yet flexible Python API.
Attendees will see how LLMs can be fine-tuned on Kubernetes with just a single line of code, highlighting how the Kubeflow LLM Trainer streamlines, simplifies, and scales LLM fine-tuning on Kubernetes. |
4:15 PM | Break | We’re taking a 15-minute break! Use this time to grab a bite, take a walk, or recharge before we jump back into the next session. |
4:30 PM | Kubeflow for Enabling AI-Powered Drug Discovery and Development at AstraZeneca | AstraZeneca’s robust AI platform, Azimuth—their first enterprise cloud-native machine learning platform—relies heavily on Kubeflow to power scalable and efficient AI workflows. In this session, we’ll explore how Kubeflow supports diverse AI use cases, with each project operating in its own dedicated namespace and persistent volumes ensuring durable data storage.
We’ll cover how to enable cross-namespace volume access, build custom Kubeflow notebook images with VS Code and other editors, and use self-hosted GitHub runners to trigger pipelines. You’ll also see how integrations with tools like Grafana, ArgoCD, and Argo Workflows enhance the platform’s functionality.
To maintain security and compliance, custom image governance is enforced through Kyverno policies. Finally, we’ll introduce the GreenOps framework—a set of practices focused on building sustainable AI solutions.
Join us for an in-depth look at how Kubeflow powers enterprise-scale AI at AstraZeneca. |
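As a taste of the image-governance approach mentioned above, a Kyverno policy can restrict workloads to images from an approved registry. This is a minimal illustrative sketch; the policy name and registry are assumptions, not AstraZeneca’s actual configuration.

```yaml
# Hedged sketch of Kyverno-based custom image governance.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-notebook-images   # hypothetical policy name
spec:
  validationFailureAction: Enforce
  rules:
    - name: allowed-registries
      match:
        any:
          - resources:
              kinds: ["Pod"]
      validate:
        message: "Images must come from the approved internal registry."
        pattern:
          spec:
            containers:
              - image: "registry.example.com/*"   # placeholder registry
```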
5:00 PM | Transition Time | We’ll get started in just a few minutes—take five to stretch, top off your drink, or get settled before the next session begins. |
5:05 PM | Spark Operator - Feature Engineering with Spark on Kubeflow | Real-world ML rarely deals with clean tables—more often, it involves messy inputs like PDFs, scanned documents, images, ZIP files, and data from enterprise warehouses.
In this session, we’ll explore how to transform that diverse data into model-ready features using Apache Spark with the Kubeflow Spark Operator, all orchestrated through Kubeflow Pipelines.
We’ll walk through how this approach bridges a previous gap in Kubeflow: extracting actionable insights from massive volumes of raw data—hundreds of terabytes—using fully open-source tools and technologies.
Target Audience: Data and ML engineers with basic Spark or Kubernetes experience. |
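For orientation, a feature-engineering job of the kind discussed is declared to the Kubeflow Spark Operator as a SparkApplication resource. The image, script path, and sizing below are placeholders, not values from the talk.

```yaml
# Minimal illustrative SparkApplication for the Kubeflow Spark Operator.
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: feature-engineering        # hypothetical job name
spec:
  type: Python
  mode: cluster
  image: spark:3.5.0               # placeholder image
  mainApplicationFile: local:///opt/jobs/extract_features.py  # placeholder path
  sparkVersion: "3.5.0"
  driver:
    cores: 1
    memory: 2g
  executor:
    instances: 4                   # scale out for hundreds of terabytes
    cores: 2
    memory: 4g
```

A Kubeflow Pipelines step can then create and monitor this resource, tying the Spark transformation into the rest of the ML workflow.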
5:35 PM | Transition Time | We’re taking five! Step away for a moment while we prepare for the upcoming session. |
5:40 PM | Simplifying Generative AI Model Training on Kubernetes using Helm Charts | Training generative AI models on Kubernetes offers a wide range of frameworks, tools, and orchestration options. While this diversity fuels innovation, it also introduces significant complexity.
In this talk, we present a Helm-based approach that simplifies AI model training using Kubeflow Training Operators. This method abstracts much of the underlying complexity while preserving flexibility in choosing training technologies.
Our solution is accelerator-agnostic and provides a consistent YAML interface across various training frameworks. We’ll also introduce a new Kubeflow Pipeline component that enables the construction of complex, end-to-end training workflows using Helm charts.
Through real-world examples, we’ll showcase training pipelines using Accelerate, Ray Train + Lightning, and NVIDIA’s NeMo-Megatron libraries. We’ll also demonstrate automatic scaling of accelerator infrastructure using Karpenter. |
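To illustrate what a consistent YAML interface across frameworks might look like, here is a hypothetical values.yaml sketch; the key names are assumptions about such a chart, not the actual chart presented in the talk.

```yaml
# Hedged sketch: one values shape, many training backends.
training:
  framework: accelerate            # or: ray-lightning, nemo-megatron
  image: ghcr.io/example/trainer:latest   # placeholder image
  workers: 4
  acceleratorsPerWorker: 8
  acceleratorType: nvidia.com/gpu  # accelerator-agnostic: swap the resource name
  command: ["accelerate", "launch", "train.py"]
```

Keeping the framework choice to a single key is what lets the chart abstract the underlying Training Operator manifests while remaining flexible.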
6:10 PM | Closing Remarks | Join Valentina Rodriguez Sosa, one of the Chairs of the Kubeflow Outreach Committee and Principal Architect at Red Hat, as she closes out the day and shares a heartfelt farewell—for now. She’ll highlight upcoming events, calls to action, and ways you can stay involved in the Kubeflow community. |