How Ezra is built

Ezra is a Slack-native AI agent that connects to affiliate platforms via their APIs. Here's how the architecture works — the stack, the design decisions, and the hard problems we're solving.

The stack

Slack

Primary interface. Bot API for DMs and channels, Block Kit for approval cards and interactive elements. OAuth for workspace install. Ezra lives where your team already works — no new tab, no new login.
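
As a rough illustration of what an approval card could look like, the sketch below builds a Block Kit payload with Approve and Dismiss buttons. The block and button structure follows Slack's Block Kit format; the function name, the `action_id` strings, and the `suggestion_id` value are hypothetical and not taken from Ezra's codebase.

```python
import json
from typing import Dict, List


def build_approval_card(suggestion_id: str, summary: str) -> List[Dict]:
    """Return Block Kit blocks describing one suggestion with two buttons.

    `suggestion_id` and the action_id strings are illustrative names only.
    """
    return [
        {
            # A section block carrying the human-readable suggestion text.
            "type": "section",
            "text": {"type": "mrkdwn", "text": f"*Suggested action*\n{summary}"},
        },
        {
            # An actions block holding the interactive buttons.
            "type": "actions",
            "block_id": f"suggestion:{suggestion_id}",
            "elements": [
                {
                    "type": "button",
                    "text": {"type": "plain_text", "text": "Approve"},
                    "style": "primary",
                    "action_id": "approve_suggestion",
                    "value": suggestion_id,
                },
                {
                    "type": "button",
                    "text": {"type": "plain_text", "text": "Dismiss"},
                    "action_id": "dismiss_suggestion",
                    "value": suggestion_id,
                },
            ],
        },
    ]


if __name__ == "__main__":
    # Example input only; the partner name is made up.
    print(json.dumps(build_approval_card("sug_123", "Pause partner Example Partner"), indent=2))
```

When a button is clicked, Slack sends an interaction payload that includes the clicked element's `action_id` and `value`, which is how a handler could tell which suggestion was approved.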

Fly.io

Python FastAPI backend hosted on Fly.io. It handles Slack webhooks, LLM routing, and platform API calls. Fly.io lets us deploy globally with low latency and a simple deploy process.
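
A backend that receives Slack webhooks would normally check that each request really came from Slack. The sketch below follows Slack's documented signing scheme (an HMAC-SHA256 over `v0:<timestamp>:<body>`, compared against the `X-Slack-Signature` header) using only the standard library. The function name and the five-minute replay window are assumptions for illustration; nothing here is taken from Ezra's actual code.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_slack_signature(
    signing_secret: str,
    timestamp: str,          # value of the X-Slack-Request-Timestamp header
    body: bytes,             # raw request body, before any JSON parsing
    signature: str,          # value of the X-Slack-Signature header
    now: Optional[float] = None,
    max_age_seconds: int = 300,  # assumed replay window
) -> bool:
    """Return True only if the signature matches and the request is recent."""
    now = time.time() if now is None else now
    try:
        ts = int(timestamp)
    except ValueError:
        return False
    # Reject stale requests to limit replay attacks.
    if abs(now - ts) > max_age_seconds:
        return False
    basestring = b"v0:" + timestamp.encode() + b":" + body
    digest = hmac.new(signing_secret.encode(), basestring, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest("v0=" + digest, signature)
```

In a FastAPI route this check would run against the raw request body before the payload is parsed, and the route would return a 4xx response when it fails.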

Anthropic Claude

The reasoning engine. Sonnet for most operations, Haiku for classification, Opus for complex analysis. Per-intent model routing to balance cost and quality — not every question deserves the most expensive model.

Platform adapters

Abstraction layer over Impact.com, Everflow, Tune, and Trcker APIs. Read and suggest operations are supported on all four platforms. Each adapter normalizes partner, conversion, and performance data into a common schema so Ezra can reason about your program regardless of which platform you're on.
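
To make the idea of a common schema concrete, here is a minimal sketch of two adapters mapping differently shaped payloads onto one `Conversion` record. Everything in it is hypothetical: the adapter classes, the raw field names (`payout_cents`, `commission`, and so on), and the schema itself are invented for illustration and do not reflect any real platform's API or Ezra's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal
from typing import Dict, Protocol


@dataclass(frozen=True)
class Conversion:
    """Hypothetical common schema for one conversion."""
    platform: str
    partner_id: str
    amount: Decimal
    currency: str
    occurred_at: datetime  # timezone-aware, normalized to UTC


class PlatformAdapter(Protocol):
    def normalize_conversion(self, raw: Dict) -> Conversion: ...


class ExampleAdapterA:
    """Made-up platform: integer cents and Unix epoch seconds."""

    def normalize_conversion(self, raw: Dict) -> Conversion:
        return Conversion(
            platform="example_a",
            partner_id=str(raw["affiliate_id"]),
            amount=Decimal(raw["payout_cents"]) / 100,
            currency=raw["currency"].upper(),
            occurred_at=datetime.fromtimestamp(raw["ts"], tz=timezone.utc),
        )


class ExampleAdapterB:
    """Made-up platform: decimal strings and ISO 8601 timestamps with offset."""

    def normalize_conversion(self, raw: Dict) -> Conversion:
        return Conversion(
            platform="example_b",
            partner_id=str(raw["partner"]["id"]),
            amount=Decimal(raw["commission"]),
            currency=raw["commission_currency"].upper(),
            occurred_at=datetime.fromisoformat(raw["created"]).astimezone(timezone.utc),
        )
```

Using `Decimal` rather than floats for money and storing every timestamp in UTC are design choices of this sketch; they keep comparisons across platforms exact.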

Postgres

Workspace configs, conversation history, API credentials (encrypted), and a full audit trail. Boring and good.

The manager-in-the-loop architecture

Every action flows through suggest → approve → execute. This isn't just a UX choice — it's a technical constraint baked into the adapter layer. Write operations require an explicit approval token from the Slack interaction payload. There's no code path that bypasses it.

Ezra will pull data, analyze trends, and draft recommendations on its own. But changing a commission rate, pausing a partner, or sending an outreach message? That requires you to tap "Approve" in Slack first. Every time.
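
One way to enforce suggest → approve → execute in code is to make every write go through a gate that refuses to run without a valid token. The sketch below assumes a token is minted only by the handler for a verified "Approve" click and is bound to one suggestion by an HMAC signature. The names `ApprovalToken`, `issue_token`, and `WriteGate` are hypothetical, and the real token format is not described in this document.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Callable, Optional, Set, TypeVar

T = TypeVar("T")


class ApprovalRequired(Exception):
    """Raised when a write is attempted without a valid approval token."""


@dataclass(frozen=True)
class ApprovalToken:
    suggestion_id: str
    approver_id: str
    signature: str


def _sign(secret: bytes, suggestion_id: str, approver_id: str) -> str:
    message = f"{suggestion_id}:{approver_id}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def issue_token(secret: bytes, suggestion_id: str, approver_id: str) -> ApprovalToken:
    """Assumed to be called only after a verified Slack 'Approve' interaction."""
    return ApprovalToken(suggestion_id, approver_id, _sign(secret, suggestion_id, approver_id))


class WriteGate:
    """Runs a write action only when given a valid, unused token for it."""

    def __init__(self, secret: bytes) -> None:
        self._secret = secret
        self._executed: Set[str] = set()

    def execute(self, suggestion_id: str, token: Optional[ApprovalToken],
                action: Callable[[], T]) -> T:
        if token is None or token.suggestion_id != suggestion_id:
            raise ApprovalRequired("no approval for this suggestion")
        expected = _sign(self._secret, token.suggestion_id, token.approver_id)
        if not hmac.compare_digest(expected, token.signature):
            raise ApprovalRequired("approval token signature mismatch")
        if suggestion_id in self._executed:
            raise ApprovalRequired("suggestion already executed")
        self._executed.add(suggestion_id)
        return action()
```

Read operations would not pass through the gate at all, which matches the split described above: Ezra analyzes freely but cannot change anything without a click.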

The hard problems

Platform API inconsistency

Impact.com, Everflow, Tune, and Trcker all model affiliates, conversions, and commissions differently. The normalization layer is where most of the complexity lives. Same concept, four different schemas, four different auth flows, four different rate limits.

LLM routing

Not every question needs the most expensive model. Ezra classifies intent first, then routes to Haiku (quick lookups), Sonnet (analysis), or Opus (complex reasoning). Misrouting wastes money or produces bad answers. Getting the classifier right is an ongoing calibration problem.
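
The routing step described above can be sketched as a lookup from classified intent to model tier, with a fallback when the classifier is unsure. The intent names, the confidence threshold, and the choice of Sonnet as the fallback are assumptions of this sketch (the fallback follows the "Sonnet for most operations" rule of thumb). The values are tier labels, not real model identifiers.

```python
from enum import Enum
from typing import Optional


class Intent(Enum):
    # Hypothetical intent labels produced by the classifier.
    LOOKUP = "lookup"
    ANALYSIS = "analysis"
    COMPLEX_REASONING = "complex_reasoning"


# Tier labels only; a real deployment would map these to concrete model IDs.
MODEL_TIER_BY_INTENT = {
    Intent.LOOKUP: "haiku",
    Intent.ANALYSIS: "sonnet",
    Intent.COMPLEX_REASONING: "opus",
}

DEFAULT_TIER = "sonnet"


def route(intent: Optional[Intent], confidence: float, min_confidence: float = 0.7) -> str:
    """Pick a model tier; fall back to the default when classification is weak."""
    if intent is None or confidence < min_confidence:
        return DEFAULT_TIER
    return MODEL_TIER_BY_INTENT.get(intent, DEFAULT_TIER)
```

Falling back to the middle tier on low confidence trades a little cost for fewer bad answers; tuning `min_confidence` is one place the calibration work would show up.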

Morning briefings

Pulling overnight data across timezones, computing meaningful deltas against 7-day averages, and presenting it in a scannable Slack message at exactly 8am local time. Sounds simple. Isn't.
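
Two pieces of the briefing are easy to get subtly wrong: finding the next 8am in each workspace's own timezone, and computing a delta against a trailing average. A minimal sketch of both follows. The function names are hypothetical, and the timezone is passed as a `tzinfo` object, which in practice would likely be a `zoneinfo.ZoneInfo` built from the workspace's configured zone name.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo
from statistics import mean
from typing import List, Optional


def next_briefing_utc(now_utc: datetime, tz: tzinfo,
                      local_time: time = time(8, 0)) -> datetime:
    """Return the next occurrence of `local_time` in `tz`, expressed in UTC."""
    local_now = now_utc.astimezone(tz)
    candidate = datetime.combine(local_now.date(), local_time, tzinfo=tz)
    if candidate <= local_now:
        # Step by calendar date, then re-attach the zone, so a DST change
        # between today and tomorrow still yields 8am on the local clock.
        tomorrow = local_now.date() + timedelta(days=1)
        candidate = datetime.combine(tomorrow, local_time, tzinfo=tz)
    return candidate.astimezone(timezone.utc)


def delta_vs_average(latest: float, previous_days: List[float]) -> Optional[float]:
    """Percent change of `latest` against the mean of `previous_days`.

    Returns None when there is no usable baseline (no history or a zero mean).
    """
    if not previous_days:
        return None
    baseline = mean(previous_days)
    if baseline == 0:
        return None
    return (latest - baseline) * 100 / baseline
```

Returning `None` for a missing or zero baseline lets the message formatter say "no baseline yet" instead of showing a misleading percentage.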

Conversation memory

Ezra needs to remember your program's patterns — "this partner always spikes on Fridays" — without retaining data longer than necessary. Memory is auditable and deletable on demand.
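
A small sketch of what "auditable and deletable" could mean in code: each remembered note carries an expiry, expired notes are never returned, and both writes and deletions leave an audit entry. This is an in-memory stand-in for what would presumably be Postgres tables; the class names and the retention period passed in are assumptions, since the document does not state a retention length.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass(frozen=True)
class MemoryItem:
    workspace_id: str
    note: str
    expires_at: datetime


class MemoryStore:
    """In-memory illustration of time-limited, deletable memory."""

    def __init__(self, retention: timedelta) -> None:
        self._retention = retention
        self._items: List[MemoryItem] = []
        # (timestamp, workspace_id, event) tuples; a stand-in for an audit table.
        self.audit_log: List[Tuple[datetime, str, str]] = []

    def remember(self, workspace_id: str, note: str, now: datetime) -> None:
        self._items.append(MemoryItem(workspace_id, note, now + self._retention))
        self.audit_log.append((now, workspace_id, "remember"))

    def recall(self, workspace_id: str, now: datetime) -> List[str]:
        # Expired notes are filtered out even if they have not been purged yet.
        return [i.note for i in self._items
                if i.workspace_id == workspace_id and i.expires_at > now]

    def purge_expired(self, now: datetime) -> int:
        before = len(self._items)
        self._items = [i for i in self._items if i.expires_at > now]
        return before - len(self._items)

    def forget_workspace(self, workspace_id: str, now: datetime) -> int:
        """Delete everything remembered for one workspace, on demand."""
        before = len(self._items)
        self._items = [i for i in self._items if i.workspace_id != workspace_id]
        self.audit_log.append((now, workspace_id, "forget_workspace"))
        return before - len(self._items)
```

Filtering on expiry at read time means a late purge job cannot cause stale notes to leak into a conversation.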

Security and privacy

API credentials encrypted at rest (AES-256)
Claude API in no-training mode — your data never trains a model
Slack's permissions model respected — Ezra only accesses channels it's invited to
"Delete my data" wipes credentials, conversation history, and cached data within 24 hours
Full audit trail for every action Ezra suggests and every approval you give