Staff ML Engineer · Field Guides

LLM Inference &
ML Infrastructure

Long-form technical field guides for Staff and Principal engineers building production AI systems. Written from first principles, grounded in real deployment constraints.

Led engineering on AWS SageMaker from its 2017 launch through eight years of scale, driving AI inference infrastructure for thousands of enterprise customers. Founded Vipas.AI, an AI inference marketplace that reached 25K daily visitors and received a VC term sheet. Currently leading LLM inference optimization and GenAI platform engineering at scale. Holds a USPTO-pending patent in dynamic hierarchical storage and GPU optimization for LLM serving.

107 pages · Guide 1
9 chapters
13-step sizing algorithm
17+ years ML infrastructure

Field Guides

Each guide is a complete treatment of a production ML systems topic: not a survey, not a tutorial, but a decision framework you can apply directly to real deployments.

Field Guide · v1.0
Sizing LLM Inference Systems at Scale
A complete framework for GPU capacity planning: workload characterization, memory budgeting, roofline analysis, quantization, parallelism, batching, KV cache optimization, and a 13-step sizing algorithm. Written for engineers who need to produce a GPU count and cost estimate they can stand behind.
📄 107 pages · ◎ 9 chapters · 👤 Staff / Principal ML Engineers
Read the guide →
Field Guide · In Progress
Agentic Systems in Production
Production architecture for multi-agent LLM systems: orchestration patterns, tool reliability, memory and state management, latency budgets, failure modes, observability, and cost control. The guide that bridges research prototypes and production deployments.
📄 100+ pages · ◎ 10 chapters · 👤 Staff / Principal ML Engineers