AI ToolingApril 30, 20266 min read

Originally published on Nicholas's Substack

Refinancing context debt

When AI companies start running out of compute, managing your own usage stops being optional. Here's the small abstraction I built and what it taught me about where AI tooling needs to go.

Refinancing Context Debt architecture diagram

If you pay for Claude, you've probably hit the wall.

You're two hours into a real project. Schemas, pipelines, debugging, architecture. The conversation has accumulated everything you and the model worked through, and the model is responding well because it has full context. Then you hit the usage limit. Six hours until reset.

You have half-finished work, a conversation full of important decisions, and no clean way to pick up where you left off when the timer runs down. You can copy and paste the chat into a new window, but that wastes more tokens. You can write a summary by hand, but that takes 15 minutes and you'll forget something. So you wait. Or you start over with less context. Either way, the next session is worse than the one you lost.

This is not a Claude problem. It is a tooling gap. And it is going to get worse.

Why this is happening now

AI companies are running out of compute. Anthropic, OpenAI, and Google have all signaled that they are capacity constrained. Rate limits are tightening across every major provider. The $20 or $100 you pay for a Pro or Max subscription no longer guarantees the throughput you assumed when you signed up.

Anyone who relies on these tools for real work has probably noticed the same thing I have: the limits feel tighter than they did six months ago. The models are smarter, but the runway per session is shorter. That tradeoff is not random. It is the consequence of demand outpacing infrastructure.

Managing your own usage is becoming a productivity skill, not an edge case.

The frustrating part is that the system already knows when you are getting close to the wall. It just does not act on that information until you slam into it. The wall is not a surprise. The lack of intervention is.

What I built

I built a small Claude skill called Claude Token Watchdog. It does one thing: it monitors conversation length and intervenes before you hit the limit.

It applies three thresholds. At fifteen messages, it surfaces a soft notice. At twenty, it strongly recommends a handoff. At twenty-five, it forces one. The thresholds drop earlier when the conversation gets technically dense. Coding sessions, debugging traces, multi-step plans with file paths and terminal output all consume more tokens than abstract discussion, so the watchdog adjusts down to roughly twelve messages in those cases.

When it triggers a handoff, you get a structured continuation brief you paste into a fresh chat. Goal, decisions made, files touched, current state, known issues, next task. Zero re-explanation. Full project context preserved. Your $20 subscription stretches further. Your $100 subscription stretches further. You finish what you started instead of waiting six hours to resume.

That is the surface description. What I actually want to talk about is the design choice underneath it, because that choice is where the real lesson lives.

The architectural decision that matters

The watchdog does not produce the continuation brief itself.

I built a separate skill, called Claude Session Handoff, months earlier. That skill already knew how to capture full session state in a structured format. When the watchdog reaches a threshold, it does not duplicate that work. It delegates.

One skill monitors the session. One skill creates the continuation brief. Together they refinance context debt: the working principles move to a fresh chat, the accumulated noise stays behind.

This is a small thing, but it is the thing I am most pleased with. It is the difference between a tool and a system. A tool does one job. A system is a set of small, well-scoped abstractions that compose. When the watchdog delegates to the handoff skill, I save lines of code. More importantly, I ensure that if the handoff format ever needs to change, I update one place. If a better summarization approach emerges later, the watchdog inherits it for free. If I want to invoke the handoff manually without the watchdog watching, I still can.

Software engineers who came up through the Unix tradition will recognize this immediately. Do one thing well. Compose. The interesting question is why this principle keeps getting forgotten in AI tooling, where every demo seems to want to be a one-shot solution rather than a piece of a larger system.

The bigger thesis

I think the next wave of AI tooling improvements will not come from bigger models.

The compute is constrained. The wins now come from small, well-scoped abstractions around the conversation itself. Token usage. Drift detection across long sessions. Cross-session memory. Clean handoffs. The model is one component. The conversation is another. Both deserve engineering attention, and right now almost all of the attention goes to the model.

There is something quietly important about treating the conversation as a system worth monitoring. It implies that the answers a long-running session produces matter less than the accumulated context that makes each answer better. If that context is fragile, every decision you make about how to use the model has to account for that fragility. If that context is portable across sessions, the cost structure of working with these tools changes meaningfully.

The watchdog is not a clever tool. It watches a counter, applies thresholds, and triggers a handoff. That is the entire mechanism. What makes it useful is that it pairs with a handoff skill that already does its job well, which means the system as a whole solves a real problem with very little engineering effort. That is the kind of payoff more people should be looking for in this space.

What this means for how I use Claude now

A few things changed for me after I shipped this.

I worry less about hitting limits, because I trust the watchdog to stop the bleeding before it starts. I structure longer projects across multiple sessions intentionally rather than trying to cram everything into one. I think about my conversations more like systems with state and less like ephemeral chats. That third change is the one that matters most. Once you start treating the conversation as engineering surface area, you start finding other small abstractions worth building around it.

There are still things I have not solved. The handoff captures decisions, files, and state, but not the implicit shorthand and trust that develop inside a long conversation. Hidden state is the next problem. I am working on it.

But the principle holds. The conversation deserves engineering. The model is not the only thing.

P.S. The watchdog and the handoff skill, plus 29 others I use in my work, live here: github.com/nicholasjh-work/claude-skillforge