Staff ML Engineer · Field Guides

LLM Inference &
ML Infrastructure

Long-form technical field guides for Staff and Principal engineers building production AI systems. Written from first principles, grounded in real deployment constraints.

Led engineering on AWS SageMaker from its 2017 launch through eight years of scale, driving AI inference infrastructure for thousands of enterprise customers. Founded Vipas.AI, an AI inference marketplace that reached 25K daily visitors and received a VC term sheet. Currently leading LLM inference optimization and GenAI platform engineering at scale. Holds a USPTO-pending patent in dynamic hierarchical storage and GPU optimization for LLM serving.

107 pages · Guide 1
9 chapters
13-step sizing algorithm
17+ years ML infrastructure

Field Guides

Each guide is a complete treatment of a production ML systems topic: not a survey, not a tutorial, but a decision framework you can apply directly to real deployments.

Field Guide · v1.0
Sizing LLM Inference Systems at Scale
A complete framework for GPU capacity planning: workload characterization, memory budgeting, roofline analysis, quantization, parallelism, batching, KV cache optimization, and a 13-step sizing algorithm. Written for engineers who need to produce a GPU count and cost estimate they can stand behind.
📄 107 pages · ◎ 9 chapters · 👤 Staff / Principal ML Engineers
Read the guide →
Field Guide · In Progress
Agentic Systems in Production
Production architecture for multi-agent LLM systems: orchestration patterns, tool reliability, memory and state management, latency budgets, failure modes, observability, and cost control. The guide that bridges research prototypes and production deployments.
📄 100+ pages · ◎ 10 chapters · 👤 Staff / Principal ML Engineers