About

I'm Dmytro Chaban. I run LLMx out of Berlin.

About a decade of software, the last few years almost entirely on LLM systems — agent pipelines, MCP integration, prompt and skill tooling, evals, and the failure modes you only see in production. Most of what's on this site comes out of work I do for paying clients, then turn into a benchmark, a tool, or a writeup once the lessons stop being specific to one codebase.

What I've built that you can use

  • LLM Misinformation Resistance Benchmark — 39 frontier and open-weight models tested against 32 adversarial prompts. First-party scoring, methodology published. The benchmark that decided most of the model recommendations elsewhere on this site.
  • Prompt Studio — desktop IDE for prompt engineering with version control, matrix model comparison, agent testing, and BYOK. Local-first; GDPR; no vendor lock-in.
  • n8n Workflow Diff — visual diff for n8n workflows. Free, in-browser, handles AI/LangChain node types n8n's enterprise diff doesn't.
  • The blog — pillar posts on frontier coding LLMs, open-weight options, the cheapest-tier-by-band breakdown, AI browsers, and other things I've benchmarked.

What I do for clients

Most of my engagements look like one of these:

  • Agent pipelines that survive production — taking a prototype that works in a clean demo and making it actually finish a 5-step ticket-to-PR run when the inputs are messy. The misinformation benchmark exists because client work kept surfacing the same kind of silent failure.
  • MCP integration — building custom MCP servers, connecting agents to internal tools and data, designing the context layer.
  • Prompt and skill infrastructure — turning ad-hoc prompts into versioned, testable, team-reviewable artifacts. Prompt Studio came out of needing this enough times.
  • n8n / make.com / agent orchestration — automation workflows where the AI part actually has to be reliable, not a demo.

If any of that is your problem too, get in touch.

Stack

Python and TypeScript mostly. FastAPI / Next.js / React on the application side. Anthropic Claude / OpenAI / Google as the model providers most of the time, with DeepSeek, Kimi, and GLM in the budget tier. Vector stores: pgvector, Pinecone, Weaviate, or Chroma, depending on the constraint. Deployment on GCP and Cloudflare.

Contact

Based in Berlin, Germany. Working across time zones. Open to consulting, technical due-diligence calls, and benchmark collaborations.