Engineering notes

Systems engineering, AI, cloud, DevOps. Writing about production infrastructure, LLMs, and observability.

Top posts

May 13, 2026 Production-grade

Data Pipelines for LLM Training and Fine-Tuning

Cleaning, deduplication, instruction formatting, tokenization choices, and dataset hygiene for supervised fine-tuning and preference tuning—emphasizing data quality as the dominant lever.

ml data fine-tuning 6 min

May 10, 2026 Advanced

Streaming LLM Systems and Token-Level Response Design

How partial decoding and streaming protocols shape UX, back-end buffering, and client rendering—without coupling to any single provider’s wire format.

streaming llm ux 6 min

May 7, 2026 Production-grade

Building Real-Time Conversational AI Systems

WebSocket and HTTP streaming architectures, session memory, cancellation, and interruption handling for low-latency chat—where network, orchestration, and model tiers all shape the experience.

real-time websockets streaming 6 min

May 4, 2026 Production-grade

API Safety Design for AI Agents

Rate limits, permissioning, tool sandboxing, and execution boundaries for agent-facing APIs—where the agent runtime is a new class of client that amplifies abuse patterns.

api-design security agents 6 min

May 2, 2026 Production-grade

Secure RAG Systems and Prompt Injection Prevention

How untrusted documents and web pages become indirect injection channels into retrieval pipelines—and how engineers harden ingest, retrieval, and tool boundaries without pretending RAG eliminates adversarial text.

rag security prompt-injection 6 min

Apr 30, 2026 Production-grade

Safety Layers in Production LLM Systems

Prompt injection defenses, output filters, policy enforcement, and sandboxing patterns layered as defense in depth, because no single layer catches every abuse case.

llm safety security 6 min

Apr 24, 2026 Production-grade

Evaluation Frameworks for LLM Applications at Scale

Golden datasets, regression suites, LLM-as-judge patterns, and offline versus online evaluation loops—emphasizing measurement discipline over benchmark theater.

evals llm testing 6 min

Apr 19, 2026 Production-grade

LLM Observability in Production Systems

Prompt and completion logging, distributed tracing, token accounting, feedback capture, and debugging pipelines for LLMOps—balanced with privacy, retention, and cost.

llmops observability logging 6 min

Apr 17, 2026 Production-grade

Memory Systems for LLM Agents — Short-Term vs Long-Term Memory

Episodic buffers, summarization, retrieval-augmented memory, and persistence patterns for agents—separating conversation state from durable knowledge stores.

agents memory rag 6 min