Qwen 3.6 27B is the sweet spot for local development
Qwen 3.6 27B is finally a model smart enough for coding on a MacBook or an NVIDIA RTX GPU, running with llama.cpp and OpenCode.
Independent evaluation and training for the AI agent ecosystem. Real-world complexity through simulation environments where agents face multi-hour tasks.
Large-scale RL datasets with tuned difficulty distributions. Cheat-proof reward functions. Teach skills scarce in public data (e.g. dependency hell, distributed system debugging).
Measure quality and uncover blind spots. Pick optimal models, tune prompts in a fast-changing world. Benchmark against competitors. Win deals and deliver on performance promises.
Independent verification of what actually works. Design processes based on real capabilities, not marketing hype. ROI-driven deployment decisions. Move from FOMO to measurable P&L impact.
Explore our research on AI agents, benchmarking, and evaluation
An independent audit of agentic scaffolding and harnesses. We analyze how agent workflows, codebase documentation, and test verification impact performance compared to raw base models like GPT-5.4, Gemini 3.1 Pro, and Claude Code.
Decompiling Chromatron, the classic laser puzzle game, from Windows XP and PowerPC executables into Rust. A pixel-perfect port using Claude Code, Opus 4.6, Cursor, GPT-5.2-Codex, and Ghidra.