Engineering

Lessons from Our Routing Layer

Not all tasks belong to the same model. How we built a dispatcher that picks the right one.

Emily Osei

Parley

Senior Technical Writer

Modern AI systems rarely rely on a single model anymore. In production environments, the real constraint is not intelligence but efficiency: cost, latency, reliability, and predictability. This is why most serious systems evolve toward multi-model architectures.

We learned this the hard way while building a routing layer that dynamically assigns tasks across a fleet of models. The intuition was simple: small models should handle simple tasks, large models should handle complex reasoning, and everything in between should be optimized for cost-performance tradeoffs.
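To make that starting intuition concrete, here is a minimal sketch of a size-based router. Everything in it is an assumption made for illustration: the tier names, the `needs_reasoning` flag (imagined as coming from some upstream classifier), and the prompt-length threshold are hypothetical and not taken from our production system.

```python
from dataclasses import dataclass

# Hypothetical tier labels, not real model identifiers.
SMALL, MEDIUM, LARGE = "small", "medium", "large"


@dataclass(frozen=True)
class Task:
    prompt: str
    needs_reasoning: bool  # assumed to be set by an upstream classifier


def naive_route(task: Task, long_prompt_chars: int = 2000) -> str:
    """Map apparent difficulty straight to model size."""
    if task.needs_reasoning:
        return LARGE   # complex reasoning goes to the largest tier
    if len(task.prompt) > long_prompt_chars:
        return MEDIUM  # long but simple input lands in the middle tier
    return SMALL       # everything else goes to the cheapest tier
```

A rule like this is easy to reason about, but it encodes the assumption that apparent complexity maps directly to model size.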

In practice, the system quickly became more nuanced.

Some tasks that look “hard” are actually format-constrained and are solved better by smaller models. Other tasks that look “simple” require contextual reasoning that only larger models can reliably handle. This breaks the naive assumption that complexity maps directly to model size.

Our routing layer evolved into a decision system that considers:

input complexity signals
required output structure
latency budget
historical success rates per model
cost sensitivity per request type
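
One way to combine the five signals above is to filter out models that cannot meet the hard constraints and then rank the rest. The sketch below is an illustration under stated assumptions, not our actual dispatcher: the field names, the 0 to 1 scales, the fallback rule, and the scoring formula (success rate minus weighted cost) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ModelProfile:
    name: str
    capability: float        # 0..1, rough ceiling on input complexity handled
    structured_output: bool  # reliably emits strict output formats
    p95_latency_ms: float
    success_rate: float      # historical success rate, 0..1
    cost_per_call: float     # arbitrary cost units


@dataclass(frozen=True)
class Request:
    complexity: float        # 0..1 input complexity signal
    needs_structure: bool    # required output structure
    latency_budget_ms: float
    cost_weight: float       # cost sensitivity for this request type


def route(request: Request, models: Sequence[ModelProfile]) -> ModelProfile:
    """Filter on hard constraints, then rank by success rate minus weighted cost."""
    eligible = [
        m for m in models
        if m.p95_latency_ms <= request.latency_budget_ms
        and (m.structured_output or not request.needs_structure)
        and m.capability >= request.complexity
    ]
    if not eligible:
        # Assumed fallback: no model fits, so use the most capable one.
        return max(models, key=lambda m: m.capability)
    return max(
        eligible,
        key=lambda m: m.success_rate - request.cost_weight * m.cost_per_call,
    )
```

In this sketch, latency, structure, and capability act as hard filters, while historical success and cost trade off in a single score. A real system might treat some of these as soft penalties instead.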

One of the most important realizations was that routing errors are more expensive than weak model outputs. A slightly worse answer from the right model is almost always better than a strong answer from the wrong model.

Over time, we also discovered that routing is not static. It is a learning system. We continuously retrain our routing heuristics based on observed failures, drift in model behavior, and changes in cost-performance curves across providers.
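
As a rough illustration of how observed outcomes can feed back into routing, the sketch below tracks a per-model, per-request-type success rate with an exponential moving average, so recent failures and drift outweigh old history. The exponential moving average is a stand-in technique chosen for brevity, not a description of our retraining process, and the class name, `alpha`, and `prior` are hypothetical.

```python
from collections import defaultdict


class SuccessTracker:
    """Exponentially weighted success rate per (model, request type)."""

    def __init__(self, alpha: float = 0.1, prior: float = 0.5) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha  # higher alpha reacts faster to recent outcomes
        self._rates = defaultdict(lambda: prior)  # unseen pairs start at the prior

    def record(self, model: str, request_type: str, succeeded: bool) -> None:
        key = (model, request_type)
        observed = 1.0 if succeeded else 0.0
        # Move the estimate a fraction alpha toward the latest outcome.
        self._rates[key] += self.alpha * (observed - self._rates[key])

    def rate(self, model: str, request_type: str) -> float:
        return self._rates[(model, request_type)]
```

A router could read `rate(...)` as its historical success signal, which means the ranking shifts as a model's behavior drifts.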

Ultimately, the system stops behaving like “choosing a model” and starts behaving like “coordinating intelligence.” That shift is where most of the real gains come from.

Engineering

Building Real-Time Agent Memory

May 28, 2026

Engineering

Why We Replaced Webhooks

Nov 6, 2025

Engineering

Scaling Our Agent Pipeline

Oct 9, 2025

Related posts

Building Real-Time Agent Memory

Why We Replaced Webhooks

Scaling Our Agent Pipeline