Navigating the Processing Unit Landscape in Kubernetes for AI Use Cases

Pitch

Explain the differences between CPUs, GPUs, and TPUs for running LLM and ML workloads: why you would choose one over another, and where each one makes sense.

Description

With the emergence of LLMs (Large Language Models) and other Machine Learning (ML) workloads running on Kubernetes, gone are the days when the CPU alone was enough. AI and ML workloads are best served by specialized processing units. While CPUs excel at sequential work, AI and ML require a different approach to processing information: a highly parallel one. In Kubernetes, that means GPUs (Graphics Processing Units) and TPUs (Tensor Processing Units). This talk introduces each type of processing unit, explains what it is good at, and shows how to use it well in Kubernetes.
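
As a concrete taste of the Kubernetes side, a GPU is typically requested through an extended resource in the pod spec. A minimal sketch, assuming the NVIDIA device plugin is installed on the node (the pod and image names here are hypothetical):

    apiVersion: v1
    kind: Pod
    metadata:
      name: llm-inference                 # hypothetical pod name
    spec:
      containers:
      - name: model-server
        image: registry.example.com/model-server:latest   # hypothetical image
        resources:
          limits:
            nvidia.com/gpu: 1             # schedules the pod onto a node with a free GPU

TPUs follow the same extended-resource pattern (for example, google.com/tpu on GKE); the talk walks through both in more depth.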