2 Talks: Accelerated ML using G/TPUs and a talk from Aptomi

Google Building MAT1 - 1184 North Mathilda Avenue, Sunnyvale
Thu, Apr 19, 2018, 6:00 PM (PDT)

About this event

Hi Kubernauts! Welcome to the 4th edition of Bay Area Kubernetes in 2018!

Google - MAT1 Building

1184 N. Mathilda Avenue

Sunnyvale, CA

Parking is available around the building after hours

FOOD @ 6pm

Sliders, Veggie Burgers, Samosas, and more!


TITLE: Accelerated Machine Learning using GPUs/TPUs and Kubernetes
SPEAKERS: Yang Guo, Software Engineer; Vishnu Kannan, Senior Software Engineer; Google

Machine learning (ML) is a diverse workload with compute requirements that vary across the spectrum. Kubernetes provides flexible APIs that allow for managing ML workloads at scale. This talk will explore the variety of hardware options that are currently supported by Kubernetes, along with their performance characteristics based on commonly used ML models. Support for Google’s ultra-high-performance Tensor Processing Units (TPUs) will be covered as well. The TPU is an all-new ML accelerator that powers various Google products, including Google Search and Google Translate. The presenters will then share some of the limitations of Kubernetes itself and suggest best practices. Workload portability is one of the key strengths of Kubernetes, and this talk will highlight that for machine learning workloads moving not just across clouds, but also across hardware.

Yang Guo is a Software Engineer at Google. Prior to working on Kubernetes and Google Kubernetes Engine (GKE), he worked on RPC infrastructure and security. He received an MS in Computer Science from the University of Southern California.

Vishnu Kannan is a Senior Software Engineer at Google. Vishnu received his Masters in ECE from Georgia Tech. He has been a systems engineer ever since he graduated. He hacked on the Linux Kernel for a couple of years at Cisco. He then worked on Borg at Google. He is currently focused on Open Source Containers, spending most of his time on Kubernetes.


TITLE: Aptomi - application delivery engine for Kubernetes
SPEAKERS: Roman Alekseenkov, Aptomi; Andres Vega, Cisco
PROJECT: https://github.com/Aptomi/aptomi/

Aptomi (https://github.com/Aptomi/aptomi/) is an open-source project that simplifies the roll-out and operation of container-based applications on Kubernetes. It introduces a service-centric abstraction, allowing dev teams to compose applications from multiple connected components (components can be packaged via Helm, k8s YAMLs, ksonnet, or defined in any other Kubernetes-friendly way).

Once a service is defined, you can run it across multiple environments (dev, stage, prod) and k8s clusters, and control its lifecycle and updates. Service owners can efficiently manage multiple instances of their service without having to deal with multiple copies of YAMLs and lower-level component configuration.

Aptomi also provides contextualized visibility for application owners, allowing them to visualize dependencies and the impact of changes. When you have hundreds or thousands of containers running on k8s, it may be difficult to understand which applications they belong to, who owns them, why they were created, what is no longer in use, and the impact of changes. Aptomi solves that problem with its UI for contextualized visibility.

Roman Alekseenkov is a former VP of Engineering at Mirantis, where he ran cloud infrastructure engineering, enabling Mirantis’ customers to run VMs on OpenStack and containers on Kubernetes. Before that, he built a number of products for various customers, from networking (Cisco) to social media analytics (Attensity). ACM ICPC, TopCoder and Google Code Jam finalist. MS in Applied Math & Computer Science.

Andres Vega is an engineering product manager at Cisco, primarily focused on the intersection of cloud and data center infrastructure with open source projects such as the Linux kernel, Linux containers, and container orchestration frameworks like Kubernetes. He seeks to drive innovation in open, secure, and programmable infrastructure to help solve complex distributed system problems.


Thursday, Apr 19
6:00 PM - 8:30 PM (PDT)

