What Does It Cost to Run an AI Agent? (And How Does It Work?)

A friend messaged me after reading my last post about Flint. His mission: build himself a Flint. His first question: “Explain why it’s costing you $1k/month to run Flint.”

Fair question. And one I keep getting from people who know what we’re doing with AI agents at Paid Memberships Pro. So let me break it down.

What Flint Actually Does All Day

First, context. Flint isn’t a chatbot I talk to occasionally. He’s a chatbot I talk to constantly. He’s embedded in our team’s workflow. Fifteen people on our PMPro team interact with him daily through Slack. On a typical day, that’s about 50 message threads covering:

  • Code review — He reviews select pull requests on our GitHub repos, catches bugs, suggests improvements, and drafts fixes.
  • Marketing — He audits blog post titles, builds interlinking recommendations, drafts content, and does research for blog posts.
  • Support — He generates summaries of escalated support tickets to help the team coordinate, then answers follow-up questions in the Slack thread.
  • Research — “Watch this YouTube video and tell me what we should apply to our business.” Real analysis, not summaries.
  • Development — He writes code, ships features, runs overnight autonomous work sessions, and manages infrastructure.

This is all worth much more than $1k/month. We have paid outside agencies much more than this for help with research, marketing, dev, and support.

Two Ways I Use Claude (and Why They Cost Different Amounts)

Here’s where it gets technical — and where the cost question actually lives.

1. Claude Code (the dev tool)

When I’m working with Flint — pair programming, building features, debugging together in real time — that mostly goes through Claude Code, Anthropic’s developer tool. We have a business plan with Anthropic and pay about $100/mo for five or so of our team members to use the Max plans with Claude Code. I do quite a bit of work in Claude Code and rarely hit the usage limits; whether you do will depend on how much you automate and whether you lean heavily on vision/image features.

One thing people don’t realize: Claude Code subscriptions are subsidized by Anthropic right now. If you pay $20/month and use it heavily, serving you actually costs Anthropic something like $200/month. For our $100/mo, we’re really consuming something like $1-2k/mo (or more) in tokens.

Why does Anthropic do this? A couple of reasons, I think. Primarily, they’re getting the training data. They spent tons of money having developers bootstrap the learning for the early versions of Claude Code. Now they have millions of professional programmers using Claude Code all day to build real-world things, and their models can use that data to get even better at coding.

On top of that, Anthropic and the other leading model companies expect 10x improvements in price-performance over the next 1-2 years, which means future tokens should cost them less. Or they could always raise prices later, once we’re addicted to the flow.

Again, $100 per month sounds like a lot compared to other SaaS tools we buy for our employees. But compared to what we pay in employee salaries or contractor fees, it’s not much at all… even if we were paying full price for those tokens.

2. The API (Flint’s autonomous runtime)

So if Claude Code is $100/mo, why am I also spending $50/day in API costs on top of that?

When Flint responds in Slack, runs a skill, does a code review, or handles any of those 50 daily threads — that’s a completely separate call stack. I built a custom harness that calls the Anthropic API directly. There’s no Claude Code in that loop. It’s just code talking to an API endpoint, streaming tokens back, running tools.

I built it this way because it gives me full control: custom tool definitions, fine-grained token limits, abort conditions, per-user permission checks. When 15 people are talking to your agent all day, you need guardrails that Claude Code’s interactive mode doesn’t provide.
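The guardrails in that harness can be sketched roughly like this. This is a minimal illustration, not the production code — the class, tool names, and limits are all hypothetical stand-ins for the per-run token budgets, abort conditions, and per-user permission checks described above.

```python
import time

# Hypothetical admin-only tools; in a real harness this list would come
# from configuration, not a hard-coded set.
ADMIN_TOOLS = {"deploy", "manage_infra"}

class BudgetExceeded(Exception):
    pass

class AgentRun:
    """Tracks one agent invocation against its token and time budget."""

    def __init__(self, user, is_admin=False, max_tokens=50_000, max_seconds=300):
        self.user = user
        self.is_admin = is_admin
        self.max_tokens = max_tokens
        self.deadline = time.monotonic() + max_seconds
        self.tokens_used = 0

    def charge(self, tokens):
        # Abort conditions: hard per-run token budget and wall-clock limit.
        self.tokens_used += tokens
        if self.tokens_used > self.max_tokens:
            raise BudgetExceeded(f"{self.user} exceeded {self.max_tokens} tokens")
        if time.monotonic() > self.deadline:
            raise BudgetExceeded(f"{self.user} exceeded the time limit")

    def can_use_tool(self, tool_name):
        # Per-user permission check: admin-only tools are gated.
        return self.is_admin or tool_name not in ADMIN_TOOLS
```

The loop that streams tokens from the API would call `charge()` after each model turn and check `can_use_tool()` before executing any tool the model requests.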

The tradeoff? API pricing. Every Slack message, every skill run, every memory recall costs real API credits. No subscription discount. No subsidy.

The $1k/Month Breakdown

We’re currently running about $50/day in API costs, trending upward. Here’s the thing about optimization: every time we make Flint more efficient, we find more things for him to do. The bill doesn’t go down — the value goes up.

At roughly 50 threads per day, that’s about $1 per message thread on average. Some threads are cheap (a quick lookup, a memory recall). Some are expensive (a full code review with codebase search, or a skill that runs for several minutes doing deep analysis).
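To make the cheap-vs-expensive spread concrete, here is a back-of-the-envelope cost model. The per-million-token rates below are illustrative Sonnet-class list prices, not figures from the post, and the token counts are made up:

```python
# Illustrative per-million-token rates (roughly Sonnet-class pricing);
# check current pricing before relying on these numbers.
INPUT_PER_MTOK = 3.00    # dollars per million input tokens
OUTPUT_PER_MTOK = 15.00  # dollars per million output tokens

def thread_cost(input_tokens, output_tokens):
    """Estimate the API cost of one message thread in dollars."""
    return (input_tokens / 1e6) * INPUT_PER_MTOK + (output_tokens / 1e6) * OUTPUT_PER_MTOK

# A quick lookup vs. a code review that pulls in lots of codebase context:
cheap = thread_cost(5_000, 1_000)       # small prompt, short answer
review = thread_cost(250_000, 20_000)   # large context, long output
```

With these assumed numbers the lookup lands around three cents and the review around a dollar, which is why the average settles near $1/thread even though individual threads vary widely.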

For context: that $1/thread is replacing work we’d otherwise pay third parties significantly more for. A $5k/month marketing agency, a support contractor at $25/hour, a freelance developer at $100+/hour for code review. The ROI math is straightforward.

How I Could Make It Cheaper

There are real options here, each with tradeoffs.

Option 1: Route API calls through Claude Code’s pipe mode

Claude Code has a headless mode (claude -p) that takes a prompt and returns output. In theory, I could route Flint’s API calls through this instead, and they’d count against my Claude Code subscription instead of API credits. Since the Max plan is effectively unlimited, this would dramatically reduce the bill.

The problem is security and isolation. When you invoke claude -p, it loads your full Claude Code environment by default — your project files, settings, tool permissions. That means Flint running a Slack skill would inherit the same permissions I have when I’m actively building. The guardrails I’ve built to prevent Flint from running too long, spending too many tokens, or doing things only admins should do — all of that gets complicated.

It’s solvable. You can pass isolated settings files, use --no-config, scope the tool permissions down. But it requires care, and I haven’t finished working it out.
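An isolated invocation might be assembled like this. This is a sketch of the idea, not a tested recipe: `claude -p` and `--settings` reflect the headless usage described above, but the exact flags and settings schema should be verified against the current Claude Code docs before relying on them.

```python
import json
import os
import tempfile

def build_headless_command(prompt, allowed_tools):
    """Build a `claude -p` invocation that loads a scoped-down settings file
    instead of the developer's full environment. Illustrative only."""
    settings = {
        # Allow only the tools this skill actually needs.
        "permissions": {"allow": allowed_tools},
    }
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w") as f:
        json.dump(settings, f)
    return ["claude", "-p", prompt, "--settings", path]

cmd = build_headless_command("Summarize this escalated ticket", ["Read", "Grep"])
# Hand `cmd` to subprocess.run(...) once the isolation story is worked out.
```

The point is that every Slack-triggered run would get its own throwaway settings file rather than inheriting the interactive session's permissions.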

Option 2: Run cheaper models where it doesn’t matter

Not every call needs Claude Sonnet. A lot of what Flint does is relatively mechanical: recall a memory, classify a message, do a simple lookup. Anthropic’s Haiku model handles that at a fraction of the cost.

I’ve also been experimenting with routing through other providers — models that are 4-5x cheaper per token. The tradeoff is personality and reliability. The responses are correct, but they don’t quite sound like Flint. And different models handle tool calls slightly differently, so a format that Claude parses perfectly might produce malformed JSON from another provider. You have to test every code path.

One model I’ve been using a lot is GLM-5 from Z.ai. It’s very good and well balanced: it costs less than Haiku and sits somewhere between Sonnet and Opus in creativity and coding quality. It’s an awesome no-nonsense model that just gets work done, and it’s my leading candidate for replacing parts of Flint’s system to save money. I assume other, even less expensive open models will come along that I can swap in as well.
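The routing decision described above can be sketched as a simple dispatch function. The task labels and model identifiers here are illustrative assumptions, not the actual production routing table:

```python
# Mechanical tasks that don't need the flagship model; illustrative labels.
CHEAP_TASKS = {"memory_recall", "classification", "lookup"}

def pick_model(task_type, needs_flint_voice=False):
    """Route a call to the cheapest model that can handle it."""
    if needs_flint_voice:
        # Personality-sensitive replies stay on the primary model.
        return "claude-sonnet"
    if task_type in CHEAP_TASKS:
        # Mechanical work runs on a cheaper small model.
        return "claude-haiku"
    # Everything else is a candidate for a cheaper third-party model,
    # but only on code paths whose tool-call formats have been tested.
    return "third-party-candidate"
```

In practice the hard part isn’t this function — it’s the per-path testing, because each downstream model formats tool calls slightly differently.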

Option 3: Be smarter about what needs AI at all

Some of what Flint does could be a simple database query or a grep command. Not every task needs an LLM. The wins from pushing more work into deterministic code paths (search indexes, cached responses, pre-computed results) are real and have zero marginal cost.
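A minimal version of "deterministic first" is a cache check before any model call — the pattern, not Flint's actual code:

```python
# Answer from a cache when possible; only fall back to a paid LLM call.
cache = {}

def answer(question, llm_call):
    if question in cache:
        return cache[question]        # zero marginal cost
    result = llm_call(question)       # paid API call, only when needed
    cache[question] = result
    return result

# Stand-in for a real model call, so the caching behavior is visible.
calls = []
def fake_llm(q):
    calls.append(q)
    return f"answer to {q}"

answer("What is our refund policy?", fake_llm)
answer("What is our refund policy?", fake_llm)  # served from cache
```

The same shape generalizes to search indexes and pre-computed results: a deterministic lookup first, the LLM only as a fallback.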

What About Everyone Else?

I know a lot of people want to build their own version of this. The honest truth: if you’re not a programmer or a hardcore DIY web tinkerer, you might have trouble running things in Claude Code. That said, I am hearing stories of folks who have never coded getting great things done with it. Still, using Claude’s desktop app “Cowork” and waiting for Anthropic to ship more of the agentic features may be the way to go.

In the meantime, if you are technical, I wrote a gist for getting started with Claude Code identity, soul, and memory. Point Claude Code at it and you’ll get a lot of the benefit of “Flint-type” behavior without the full infrastructure.

I’m currently trying to launch a second agent for our other business, LifterLMS. If I can nail that down and manage two well, the next step is figuring out how to launch and manage a bunch — maybe even as a service for other businesses.

So if you are interested in that, subscribe here to stay up to date on what I’m doing with these AI agents.