About
I'm Dmytro Chaban. I run LLMx out of Berlin.
About a decade in software, the last few years almost entirely on LLM systems: agent pipelines, MCP integration, prompt and skill tooling, evals, and the failure modes you only see in production. Most of what's on this site starts as work I do for paying clients and becomes a benchmark, a tool, or a writeup once the lessons stop being specific to one codebase.
What I've built that you can use
- LLM Misinformation Resistance Benchmark — 39 frontier and open-weight models tested against 32 adversarial prompts. First-party scoring, methodology published. The benchmark that decided most of the model recommendations elsewhere on this site.
- Prompt Studio — desktop IDE for prompt engineering with version control, matrix model comparison, agent testing, and BYOK (bring your own key). Local-first; GDPR; no vendor lock-in.
- n8n Workflow Diff — visual diff for n8n workflows. Free, in-browser, handles AI/LangChain node types n8n's enterprise diff doesn't.
- The blog — pillar posts on frontier coding LLMs, open-weight options, the cheapest-tier-by-band breakdown, AI browsers, and other things I've benchmarked.
What I do for clients
Most of my engagements look like one of these:
- Agent pipelines that survive production — taking a prototype that works in a clean demo and making it actually finish a 5-step ticket-to-PR run when the inputs are messy. The misinformation benchmark exists because client work kept surfacing the same kind of silent failure.
- MCP integration — building custom MCP servers, connecting agents to internal tools and data, designing the context layer.
- Prompt and skill infrastructure — turning ad-hoc prompts into versioned, testable, team-reviewable artifacts. Prompt Studio came out of needing this enough times.
- n8n / make.com / agent orchestration — automation workflows where the AI part actually has to be reliable, not a demo.
If any of that is your problem too, get in touch.
Stack
Python and TypeScript mostly. FastAPI / Next.js / React on the application side. Anthropic Claude / OpenAI / Google as the model providers most of the time, with DeepSeek, Kimi, and GLM in the budget tier. Vector stores: pgvector, Pinecone, Weaviate, or Chroma, depending on the constraint. Deployment on GCP and Cloudflare.
Contact
- Email: dmytro@llmx.tech
- LinkedIn: linkedin.com/in/dmytro--ch
- GitHub: github.com/dmi3coder (personal) · github.com/llmx-tech (LLMx org)
Based in Berlin, Germany. Working across timezones. Open to consulting, technical due-diligence calls, and benchmark collaborations.