As organizations move toward building and operating their own AI infrastructure, managing large-scale GPU environments becomes both an opportunity and a challenge. This session shares real-world lessons from running a production AI stack powered by a 13-GPU fleet, covering how teams design, deploy, and optimize infrastructure to support demanding AI workloads and large language models. From workload orchestration and performance tuning to cost management and system reliability, the discussion will highlight practical insights gained while operating GPU clusters in a live environment. Attendees will learn what it takes to maintain stability, scale efficiently, and ensure consistent performance when running AI systems in production.