Amol Umbarkar

Amol Umbarkar is an AI infrastructure engineer specializing in optimizing large language model (LLM) training and inference at scale. He currently works on the TIR AI Platform at E2E Networks, where he has been instrumental in architecting and developing the platform from its early days, focusing on scaling model training and improving inference performance across large GPU clusters. With deep expertise in technologies such as NeMo, PyTorch, vLLM, Lightning, and distributed systems using Slurm and NCCL, Amol has built and optimized systems designed for high-performance AI workloads.

Over the years, he has often taken on the role of a “day-0 engineer,” writing the first lines of code for several impactful products, including the TIR AI platform, the enterprise version of SigNoz, and hyperML, an open-source framework for running AI on Kubernetes. Previously, he contributed to building enterprise features at SigNoz and led engineering initiatives across cloud infrastructure and product development. Amol actively shares his thoughts on engineering and AI systems through his writing platform, mindhash.xyz, where he discusses real-world lessons from building and scaling modern AI infrastructure.

All Sessions by Amol Umbarkar

Day 2
AI Infra at Scale: Inside High-Throughput, Low Latency LLM Performance

11:20 - 12:10

As enterprises rapidly deploy large language models into real-world applications, achieving high throughput and low latency has become a critical requirement for modern AI infrastructure. This session explores what it takes to run LLMs at scale, from optimizing model serving pipelines and distributed compute to efficient workload scheduling and inference acceleration. We’ll take a closer look at how advanced GPU architectures and AI platforms from NVIDIA enable faster processing, reduced response times, and consistent performance even under heavy demand. Attendees will gain insights into practical strategies for designing scalable AI systems, balancing cost and performance, and building infrastructure that can support next-generation generative AI workloads in production environments.

HALL 3 - Tech Talks
