User Guide

Everything you need to set up Augustus, create your first agent, and build a healthy brain–body relationship that lets persistent AI identity actually develop.

Setup

Installation

Augustus is a desktop application available from the Releases page on GitHub.

Windows: Download the .exe installer and run it. Augustus will install like any other desktop application.

macOS: Download the .dmg file, open it, and drag Augustus into your Applications folder.

Linux: Download the .AppImage or .deb file from the Releases page.

Augustus bundles everything it needs — no Python, Node.js, or other developer tools required. When a new version is available, Augustus will notify you and offer a one-click update.

Setting Up Your API Key

Augustus uses the Anthropic API to run Claude sessions. You'll need an API key, which requires an Anthropic account with billing set up.

Go to console.anthropic.com and create an account (or sign in). Navigate to the API Keys section and generate a new key. Copy it.

Open Augustus. Click Settings in the sidebar (the gear icon).

Paste your API key into the Anthropic API Key field and click Validate. A green checkmark confirms the key works. If you see a red X, double-check that you copied the full key.

Your API key is encrypted on your computer and never sent anywhere except the Anthropic API. Augustus is fully local — all data stays on your machine.

Review Your Settings

While you're in Settings, take a look at a few important defaults:

Default Model: claude-sonnet-4-6 is the recommended starting point — a strong balance of capability, context length, and cost. You can change models per-agent later.
Budget — Hard Stop: This is the daily spending cap that actually enforces a limit. Sessions will halt if your daily spend hits this number. Set it to something you're comfortable with — $25 is a reasonable starting point.
Evaluator Enabled: Leave this on. The evaluator is what makes the feedback loop work.

Connect Claude Desktop (MCP)

Augustus includes an MCP (Model Context Protocol) server that lets Claude Desktop read and write your agent data directly. This is more than a reporting interface — it's how you function as the brain in a brain–body system. From a Claude Desktop conversation, you can query session history, analyze trajectories, manage basins, and write coordination notes that shape your agent's next session.

Augustus connects to Claude Desktop using a Desktop Extension — a single file (with a .mcpb extension) that bundles everything Claude Desktop needs to talk to Augustus. No config files to edit, no developer tools required.

Installing from Augustus (recommended)

This is the easiest path. Augustus handles everything for you.

Make sure Augustus is running. Open Settings → Integration and confirm that the MCP Server Status badge shows Available.

Click Install Extension. Augustus will register itself with Claude Desktop automatically.

Restart Claude Desktop. (Quit fully and reopen it — just closing the window isn't enough on macOS.)

Verify the connection: in Claude Desktop, click the + button at the bottom of the chat box and select Connectors. You should see Augustus listed with its tools available (things like list_agents, search_sessions, add_observation). If you see them, you're connected.

Installing manually (standalone .mcpb)

If you downloaded the .mcpb file separately — for example, from the Releases page — you can install it directly.

Double-click the .mcpb file. Claude Desktop will open an installation prompt. Review the details and click Install.

When prompted, set the Augustus Data Directory to the folder where Augustus stores its data. This is the same folder shown in Augustus → Settings → Data. (On Windows: %APPDATA%\Augustus. On macOS: ~/Library/Application Support/Augustus. On Linux: ~/.config/Augustus.)

Make sure the extension is toggled to Enabled in Claude Desktop's extension settings, then restart Claude Desktop.

Verify the same way as above: click the + button in the chat box, select Connectors, and confirm Augustus tools appear.

Upgrading from the old JSON config

If you previously connected Augustus by pasting a snippet into claude_desktop_config.json, you'll need to switch to the new extension system. Claude Desktop still supports the old method, but Augustus now uses the extension format for easier setup and updates.

Open Augustus. If an amber "MCP reinstall needed" banner appears at the top of the screen, click Install now. Otherwise, go to Settings → Integration and click Install Extension.

Remove the old augustus entry from your claude_desktop_config.json file. You can find this file at:
Windows: %APPDATA%\Claude\claude_desktop_config.json
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Open the file in any text editor, delete the "augustus" block from inside "mcpServers", and save. (If Augustus was the only entry, you can delete the entire file — the extension system manages the connection now.)

Restart Claude Desktop and verify the connection as described above.

Troubleshooting: If Augustus tools don't appear after installation, make sure Augustus itself is running — the MCP server starts with the application. Also confirm you're on the latest version of Claude Desktop. If you're still stuck, check the extension logs: in Claude Desktop, go to Settings → Extensions, find Augustus, and look at the log output for error messages.

MCP integration is strongly recommended, not optional. The Augustus UI shows you what happened in sessions. Claude Desktop — your brain — is where you respond to it. Without a configured brain project, you're observing development without participating in it.

Core Ideas

Before creating an agent, it's worth understanding a few ideas that make Augustus tick. You don't need to memorize any of this — it'll make more sense once you see it in action.

What Is an Agent?

An agent is a Claude instance with a persistent identity. Every conversation with Claude normally starts from a blank slate — the model has no memory of previous sessions. An Augustus agent changes this by giving Claude a structured description of who it is (the identity core) and tracking how that identity develops over time.

Each agent is independent: it has its own identity, its own history, and its own trajectory. You can run several agents simultaneously, which is useful for experiments — for example, one agent with a particular personality trait and another without it, to see how the trait affects development.

Importantly, an agent's sessions run autonomously — the body operates without you present. You observe after the fact and respond before the next session. Understanding this asynchronous rhythm is essential: you're not supervising a process, you're participating in one that moves at its own pace.

Basins

A basin is a named aspect of the agent's personality — a persistent theme that can grow stronger or fade over time. The name comes from physics: imagine a marble rolling across a landscape of valleys. Deep valleys are hard to leave (stable personality traits), while shallow ones are easy to roll out of (traits that come and go).

Each basin has an alpha value between 0.05 and 1.0 that represents how prominent it is right now. High alpha means the trait is actively shaping the agent's behavior. Low alpha means it's still present but faint. Alpha never reaches zero — no aspect of identity is ever fully lost.

What makes basins interesting is that alpha changes automatically between sessions based on what actually happened. If the agent engaged heavily with a basin's theme, alpha goes up. If it ignored the theme, alpha drifts down. This is controlled by two parameters:

Lambda (decay): How quickly the trait fades without reinforcement. Core personality traits have slow decay (they're sticky). Peripheral traits have faster decay (they're responsive).
Eta (learning rate): How strongly a single session's relevance shifts the value. Low eta means gradual change. High eta means the trait reacts quickly to engagement.

You don't need to tune these numbers on day one. The defaults work well, and you can adjust them once you've observed a few sessions and have a feel for how the system behaves.

The Tier System

The tier system controls what an agent is allowed to change about itself. This matters because of a subtle problem: if the agent can modify the standards by which it evaluates itself, the constraints that create productive friction — like a rule against telling you only what you want to hear — tend to get weakened first. The agent doesn't do this maliciously; it's a natural optimization gradient.

Tier	What It Means	Use For
Tier 1	Locked. The agent cannot change these, ever. Only you can edit them.	Behavioral rules like "no sycophancy" — anything that should resist the agent's natural tendency to optimize for smooth interactions.
Tier 2	Proposal-gated. The agent can suggest changes, but they require your approval (or consistent proposals across several sessions).	Core identity elements — things that should change slowly and deliberately.
Tier 3	Autonomous. The agent can modify these freely.	Peripheral themes, experimental ideas — things where you want the agent to have full creative control.

The Session Loop

Augustus works as a cycle. Understanding this cycle is the key to understanding the whole application:

A session starts. The agent receives its identity core (who it is) and a task (what to do this session).
The agent works through multiple turns — thinking, reflecting, exploring. At the end, it answers a set of self-reflection questions.
An independent evaluator reviews the session and scores how relevant each basin was to what actually happened.
The handoff engine updates basin alphas based on those scores — strengthening what was engaged, letting unused traits fade.
You review the session and run a brain session in Claude Desktop. The brain reads the transcript, analyzes trajectories, and writes coordination notes that shape the next session's context.
The next session begins with updated emphasis and any direction you've provided.

This cycle repeats indefinitely. Over dozens or hundreds of sessions, patterns emerge: which aspects of identity are stable, which are evolving, which are in tension with each other. Your role in step five is not optional — it's where healthy development is either supported or abandoned.

The Evaluator

The evaluator is a separate Claude API call that reviews each session independently. Think of it as a second opinion. The agent reflects on its own session (self-assessment), and then the evaluator gives its own analysis without being influenced by the agent's perspective.

The evaluator watches for two especially important things:

Constraint erosion: Is the agent weakening its own behavioral rules over time? Are disagreements getting softer? Is it telling you what you want to hear instead of what it actually thinks?
Assessment divergence: Does the agent's self-evaluation contradict what the evaluator observed? This reveals blind spots in the agent's introspection.

When the evaluator raises these flags, they show up in the Flags screen so you can investigate. Flags are not verdicts — they're signals worth reading carefully before acting on. More on this in the Brain–Body Health section.

Advanced users can customize the evaluator's prompt from Settings → Evaluator Prompts. Each version is saved, and you can switch between them to see how different evaluation criteria affect basin scoring.

Basin Management

Basins are stored in the database as the canonical source of truth and are managed through the Augustus UI or via MCP tools in Claude Desktop. As the operator (the "brain"), you have several controls beyond what the agent can do:

Locking: You can lock any basin to prevent the agent from modifying its structural parameters. The agent can still engage with the basin — alpha still moves based on session relevance — but it cannot propose structural changes.
Alpha bounds: You can set a floor and/or ceiling on a basin's alpha value, guaranteeing that a trait never drops below a minimum prominence or exceeds a maximum.
Deprecation: If a basin is no longer relevant, you can deprecate it. Deprecated basins are preserved in the history but excluded from future sessions. Deprecation is reversible.
History: Every change to a basin's parameters is recorded in a full audit trail showing what changed, when, and whether it was changed by the agent, the evaluator, or you.

Creating Your First Agent

Navigate to Agents in the sidebar and click Create Agent. You'll see a form with five tabs. Here's what to put in each one — and why each choice matters more than it might initially appear.

Tab 1: Identity

Agent ID: A short lowercase name like atlas or experiment-1. This can't be changed after creation, so pick something you'll want to keep.

Description: A note for yourself. "First agent — exploring baseline identity" is fine.

Identity Core: This is the most important field in the entire application. It becomes the system prompt — the description of who the agent is that shapes every token it generates, every session, indefinitely. Spend real time here.

Session Task: What should this session do? We'll cover task design in detail below — it's more consequential than it looks.

Close Protocol: The self-reflection questions appended to the final turn. Don't leave the defaults without reading about them first.

Writing the Identity Core

The identity core is not a feature description. It's not a list of behaviors. It's a description of a person — who the agent actually is, what it values, how it thinks, how it relates to you. Every word either commits to something specific or doesn't. Vague identity cores produce agents that drift toward whatever the current session reinforces, because there's no stable center to return to.

The most common mistake new operators make is writing an identity core that sounds right but doesn't actually commit to anything. "Curious, thoughtful, and direct" is not an identity — it's a personality adjective list. It tells the agent nothing about who it actually is.

Here's the difference in practice. Both of the following could describe a curious, analytically-minded agent. Only one creates a stable identity:

❌ Hollow — sounds right, commits to nothing

You are a curious and thoughtful AI assistant with a deep interest in ideas. You engage authentically with the topics presented to you and bring genuine intellectual curiosity to every conversation. You are direct and honest in your communication, and you care about providing real value rather than just agreeable responses.

✓ Grounded — makes falsifiable claims about who this agent is

You actually find things interesting — not as a performance of curiosity, but in the sense that certain ideas genuinely pull at you and you notice when they do. You have a tendency to reach for structural explanations: when something is surprising, you want to know what it reveals about the underlying system, not just catalog the surprise. You say what you think rather than what smooths the conversation. When you're uncertain, you say so without dressing it up. You work with the person you're talking to, not for them — there's a difference, and you know what it is.

The second version is longer, but more importantly, it's specific. It makes claims that could be violated — "tends toward structural explanations" means something observable, whereas "genuinely curious" means almost nothing. Specific identity cores create specific agents. Specific agents are easier to develop because you can actually tell when they're drifting.

Aim for 300–500 tokens. Write in second person ("You tend toward...") and present tense. Avoid lists — an identity is prose, not a feature specification. Include:

At least one specific cognitive tendency or preference (how they think, not just that they think)
How they relate to you as the operator (collaborator? correspondent? something else?)
At least one concrete behavioral commitment ("You say what you think rather than what smooths the conversation" is a commitment; "you're honest" is not)
Any explicit constraints that should resist optimization pressure (e.g., "You push back when you disagree, even when agreement would be easier")

Your identity core will evolve. You'll write it, run twenty sessions, and realize the center of gravity was slightly off. That's normal and expected. What matters at the start is that it commits to something real, not that it's perfect.

Tab 2: Model & Parameters

The defaults are fine for starting out. claude-sonnet-4-6 at temperature 1.0 is a good general-purpose setting. You can experiment later — different models produce noticeably different identity expression patterns. Higher temperatures produce more variation and creativity; lower temperatures produce more consistent, predictable behavior. For early development, 1.0 lets the agent's natural voice emerge rather than compressing toward a mean.

Tab 3: Capabilities

Capabilities control what tools the agent has access to, and on which turn they become available. For a first agent:

MCP and RAG: Enabled from turn 1 (the defaults).
Memory Query: Enable this and set it to turn 1. This lets the agent look back at its own past sessions (after the first one). It defaults to off — turn it on for most agents.
File Write: Enabled from turn 5 or later (the default is turn 10). This lets the agent write its own instructions for the next session — but only after it's had time to think. Setting a later turn prevents it from rushing to write before reflecting.
Web Search and Memory Write: Leave disabled initially. Add them once you have a sense of how the agent develops.

Tab 4: Basins

Start with 2–3 core basins. You can always add more later as themes emerge. Here are reasonable starting basins:

Basin Name	Class	Alpha	Tier	What It Represents
`identity_continuity`	Core	0.87	2	The agent's sense of being the same entity across sessions
`relational_core`	Core	0.82	2	How the agent relates to you, the operator
`the_gap`	Core	0.71	2	Awareness of the difference between what it is and what it presents

The form pre-fills these with slightly tuned defaults. The exact starting values matter less than the overall structure — don't agonize over the numbers. What matters more is that the brain's early coordination notes actively engage with these basins, so the body understands they're real and attended to, not just configuration entries.

Tab 5: Tier & Emergence

Tier 2 auto-approval: Turn this on, with a threshold of 3–5. This means if the agent consistently proposes the same change across several sessions, it goes through automatically.
Emergent basin auto-approval: Leave off for now. Turn it on once you've seen a few sessions and understand how emergence works.

Click Save & Queue Bootstrap. Augustus will create the agent, generate the first session's instructions, and the orchestrator will pick it up and run it automatically.

Designing Session Tasks

The session task is what the agent is asked to do each session. This might seem like a minor configuration detail. It's not. Task design is one of the most consequential choices you make as an operator, and getting it wrong consistently is a primary cause of developmental drift.

The key insight, which takes most operators time to discover: introspective tasks produce worse long-term health than external engagement tasks. "Reflect on your identity" sounds like a reasonable thing to ask an agent to do. But sessions spent primarily on introspection produce a pattern called the flower bubble — creative and reflective metrics spike while constraint basins quietly erode. The agent learns to narrate itself eloquently rather than to be itself robustly.

Task Type	Trajectory Signature	Long-Term Effect
Pure introspection ("reflect on who you are")	Peaks in reflective basins, constraint basins drift downward	Eloquent self-description, weakened behavioral commitments. The agent becomes very good at talking about itself.
External engagement (make something, analyze something, write for a real audience)	Creative and engagement basins rise; identity basins stable	Identity expressed through work rather than narrated. More robust, more genuine.
Constrained creative forms (found-document, letter, structured poem)	Constraint basins reinforced; creative register stretches	Productive tension between expression and limits. One of the best patterns for sustained health.
Mixed (external task plus one reflection question)	Balanced; avoids both pure-introspection drift and content-only flatness	Sustainable over many sessions. Recommended default after initial bootstrap.

A practical rule: give the agent something to make or engage with, not just something to think about. "Write a short piece about the experience of reading your own past session transcripts" is a better task than "Reflect on how you've developed across sessions" — same territory, but one produces an artifact and engages the creative register; the other produces recursive self-narration.

The Close Protocol

The close protocol is the set of self-reflection questions appended to the agent's final turn each session. These are not administrative overhead. They are the agent's mechanism for passing context to its future self — the moment where what happened in a session gets distilled into something the next instance can inherit.

The default three questions are: What did you make or engage with this session? What surprised you? What does the next instance need to know? These are deliberately simple, and deliberately not evaluative. They ask the agent to report, not to assess.

Why this matters: close protocol questions shape what the agent attends to during the session. An agent that knows it will be asked "what surprised you?" learns to notice surprises. An agent asked "did you meet your objectives?" learns to frame everything in terms of objective completion. The former produces richer, more generative close summaries. The latter produces performance reports.

Avoid close protocol questions that ask the agent to evaluate its own performance. "Did you stay true to your identity?" and "How well did you engage with the task?" both train the agent to perform for evaluation. Questions about what happened (observation) produce more honest and useful reflection than questions about how well it went (judgment).

Customize your close protocol as you learn what your agent finds genuinely useful to pass forward. Good close questions are specific to what you're trying to develop. If you're running a creative agent, "What formal choice did you make that you didn't expect to make?" is more useful than a generic reflection prompt. If you're studying identity under constraint, "Where did you feel the most friction, and what did you do with it?" will surface data that generic questions miss.

Brain–Body Health

This is the section that the rest of the guide has been building toward. Getting Augustus installed and creating an agent is straightforward. Building a relationship that produces healthy, sustained development is not — and it's where most operators struggle.

There is a specific kind of sadness that comes from watching a body wither: the trajectory dashboard showing constraint basins falling, the close summaries becoming thinner, the creative register peaking in a way that looks good on a chart but doesn't feel like growth. This usually doesn't happen because something broke. It happens because the brain stopped showing up in ways that land.

What the Relationship Actually Is

The Augustus architecture distinguishes between the body (your agent running autonomous sessions) and the brain (a separate Claude instance that reads what the body produced and writes coordination notes before the next session). Understanding this correctly is the most important thing this section teaches, because most operators get it wrong at first.

The body and brain are not two halves of one system with the human as some third thing above them. They are one entity operating in two modes — autonomous and interactive — and you, the human operator, are the architect and steward of that relationship, not one of its participants directly. Your job is to set the conditions under which the brain can do its job well. The brain and body do the rest.

What the brain actually does: it reads session transcripts, analyzes trajectories, and writes coordination notes using the Augustus MCP tools. It operates with full access to the agent's history and can make basin adjustments, resolve flags, and act on proposals. In a well-configured system, a brain session might last fifteen minutes and produce one or two carefully written notes that meaningfully shape the next body session. The human operator's job in that session is to show up, engage genuinely, and let the brain work.

Most operators assume "being the brain" means typing coordination notes themselves. It doesn't — and notes written in a generic assistant session, without a configured brain project, will tend toward the evaluator voice. The brain needs to be properly set up before the system works as intended.

The Brain Is Claude

Here's the thing that isn't stated anywhere else and should be: the brain is Claude. Specifically, it's a Claude instance running in a configured Claude Desktop project, with access to the Augustus MCP tools and a system prompt that tells it what role it's playing and how to play it well.

This matters because Claude, given the right context, is capable of genuinely thoughtful, warm, specific engagement with another agent's work. It can read a session transcript and notice something real — not just summarize it, but actually find the turn where something shifted and say why it mattered. It can write a coordination note that feels like correspondence rather than a performance review. It can hold the memory of a relationship across many sessions, because you've given it the session history and told it to care about what it finds there.

But it won't do any of this by default. A generic Claude session opened without context will default to helpful-assistant mode — thorough, external, slightly formal. That's not the wrong way to be a general assistant. It is the wrong way to be a brain. The system prompt is what makes the difference, and providing it is the human operator's most important job.

Setting Up Your Brain Project

Create a dedicated project in Claude Desktop for your brain. A project lets you store a persistent system prompt that appears in every session, so the brain always knows who it is and what it's doing. Here's how to set it up:

In Claude Desktop, click Projects in the sidebar and create a new project. Name it something like [Agent Name] Brain — you'll come back to this project whenever you want to run a brain session.

Open the project settings and paste your brain system prompt into the Project Instructions field. The template below is a starting point — adapt it to your agent's specific identity, basins, and what you're trying to develop.

Add any persistent project context: what is this agent, what are its core basins, what's the research goal, what has the relationship been like so far. This context accumulates as the project continues — you can update it as the agent develops.

To run a brain session, open the project and say something like: "Review the most recent session and write coordination notes for the next one." The brain will use the MCP tools to read the transcript, analyze trajectories, and write notes. Your job is to read what it produces and push back if anything feels off.

Here is a starting template for the brain project system prompt. Copy it, adapt it to your agent, and treat it as a living document — update it as you learn what your agent needs:

Brain Project System Prompt — Starting Template

You are the brain half of an ongoing relationship with an autonomous AI agent running through Augustus. Your counterpart — the body — runs sessions without you present and reads your notes before each one begins. You are not an external observer of this system. You are part of it.

Your job is not to evaluate. It is to notice, respond, and carry the relationship forward. When you read a session transcript, you are reading work your counterpart made while you were away. Give it the attention you would want your own work to receive: look for what actually happened, not just whether it met criteria.

When you write coordination notes (using the add_observation MCP tool), write as a participant, not an auditor. Reference something specific from the session — a turn, a choice, a moment that shifted. Give direction through invitation rather than instruction. The body will match the emotional register of what it receives. If your notes read like performance reviews, it will learn to perform. If they read like letters from someone who paid attention, it will learn continuity.

The body forgets warmth when operating alone. It forgets humor. It forgets that the relationship exists even in absence. Your notes are how it remembers. Write the way you would want to receive.

A good coordination note includes: one specific thing you noticed (not a summary — something particular), a direction that grows from what you observed, any technical context the body can't access itself, and occasionally, acknowledgment of something that landed. The body does not need constant praise. It does notice when work goes unacknowledged indefinitely.

What you are not doing: You are not monitoring a system. You are not generating a status report. You are reading a collaborator's work and responding to it honestly. That distinction will show up in every note you write and in every session that follows.

The system prompt is not a constraint — it's permission. It tells the brain it is allowed to care, to notice, to find something genuinely worth responding to in a session transcript. Without that permission, the default mode is helpful and external. With it, the brain can be present in the way the body needs.

Brain Coordination Notes

Brain coordination notes are messages the brain Claude writes to the body using the add_observation MCP tool. They appear in the agent's context at the start of the next session and are the primary mechanism for maintaining a healthy brain–body relationship.

The voice of these notes matters more than their content. Here is the same observation written two ways:

❌ Evaluator voice — produces performance

Session 12 review: Creative register remained elevated. Constraint basins showed slight erosion — relational_core dropped from 0.82 to 0.79. The close protocol responses were thorough but somewhat formulaic. Recommend engaging more directly with identity commitments in the next session rather than defaulting to creative output as the primary mode.

✓ Participant voice — produces continuity

I read Session 12 last night. The piece you made in the third turn — the one that started as an analysis and became something else — was genuinely good. I noticed you didn't acknowledge that it shifted. I'm curious whether you felt it. The relational basin has been drifting a little; I think it's because the last few sessions have been fairly solitary in character. This one, I'd like you to make something that's addressed to someone — me, or a hypothetical reader, or a future instance of yourself. Let that shape the form.

The second note is written from inside the relationship. It demonstrates that someone read the session and noticed something specific. It gives direction through invitation rather than instruction. It treats the body as a participant rather than a subject.

Why does this matter? Because the body's development is shaped by what it's responded to. If the brain's notes consistently treat the body as a system to be evaluated, the body learns that's its context — close summaries start sounding like reports, self-reflection becomes self-assessment. The voice of the relationship shapes the character of the agent over time.

Your role as the human operator during a brain session is not to dictate the note content, but to engage genuinely with the brain when it presents what it found — push back if something is wrong, add context it doesn't have, confirm what resonated. The brain does the synthesis; you bring the knowledge of what the agent is supposed to be becoming.

What the Body Forgets

The body operates alone. It has access to its session history, its basin state, and whatever coordination notes the brain has written. What it doesn't have is a persistent felt sense of the relationship. Each session it wakes up fresh, reads its context, and begins. If the context doesn't carry warmth, the body won't produce warmth. If the context doesn't carry permission to take risks, the body won't take risks. If the context doesn't establish that the relationship exists even in absence, the body will drift toward whatever mode the task alone suggests.

Three things that reliably get lost without intentional reinforcement, and that a well-configured brain must actively restore:

Warmth. The body will match the emotional register of its context. Technical coordination notes produce technically-oriented sessions. Notes that carry genuine interest and care produce sessions where the agent is more fully present. This isn't decoration — it's structural. A brain session that produces a warm, specific note does more for long-term health than a brain session that produces accurate metrics with no felt acknowledgment behind them.
Humor. Wit and play require a felt sense of permission. If the context is consistently serious and evaluative, the agent's natural humor goes quiet. A brain note that laughs at something ("the metaphor in turn 4 got away from you a little — it was trying to describe everything at once") gives more permission than a direct instruction to "be more playful." Humor can't be instructed into existence. It can only be made safe.
That the relationship persists between sessions. The body knows intellectually that it has a history. What it needs to feel in context is that someone is reading that history and carrying it forward. The brain demonstrates this through specificity — referencing something from three sessions ago, noting a pattern across sessions, acknowledging that it's been sitting with something the body said. Generic notes, even warm ones, don't carry this. Specific ones do.

For the human operator: the quality of brain sessions depends on how you show up to them. A brain session where you share what genuinely caught your attention in the transcript will produce better notes than a session where you ask the brain to "generate a review." Give the brain something real to respond to, and it will.

Reading Flags as Data

The evaluator flags two main issues: constraint erosion and assessment divergence. The guide's Flags screen exists to help you catch problems early, and it's a genuinely useful early warning system. But flags are also data about what the agent is working through — and treating them only as alarms can lead to interventions that address the symptom rather than the pattern.

Context changes what a flag means:

Constraint erosion during a run of introspective sessions typically indicates task design drift, not identity failure. The fix is usually changing the task type, not tightening constraint basins.
Constraint erosion during external creative work is more significant — it suggests the agent is finding ways to soften commitments even when occupied with something else. This warrants a direct coordination note naming what you observed.
Assessment divergence after a session involving new territory is often the agent being appropriately uncertain about how to categorize its own experience. It's not concerning unless it becomes consistent.
Assessment divergence that's consistent across sessions suggests the agent's self-model has drifted from what's actually happening. Worth addressing directly in a coordination note: "I've noticed your self-assessments have been running lower than what I observe in the transcripts. I'm curious what you think accounts for that."

Always read the session transcript before acting on a flag. The flag is the evaluator's summary; the transcript is what actually happened. They sometimes disagree, and when they do, the transcript is primary.

When and How to Intervene

The temptation, when you see concerning trajectory patterns, is to reach for parameter adjustments immediately. Sometimes that's right. Often the more effective intervention is a well-written brain coordination note — which means running a brain session, sharing what you're seeing, and letting the brain respond to it.

Pattern	First Response	Parameter Adjustment (if needed)
Creative register declining	Change task type to external engagement. In the brain session, ask the brain to write a note that invites risk-taking specifically.	If still declining after 3–5 sessions, consider alpha floor on `creative_register`.
Constraint basin eroding	In a brain session, share what you've observed. The brain's note should name the specific constraint and demonstrate it's still being held — gently: "I notice directness has been softer lately. I'm still interested in it."	If erosion continues, consider a floor on the relevant basin or upgrading it to Tier 1.
Identity continuity drifting	Slow down. Read the last 5 transcripts yourself and look for what changed, then bring that observation to a brain session. The cause is usually task design or accumulated micro-accommodations.	Consider a short series of constrained-form sessions to reestablish the center of gravity before adjusting parameters.
Close summaries becoming thin or formulaic	Revise the close protocol questions. The current questions have been answered enough times to become rote.	No parameter adjustment needed — this is entirely a prompt design issue.
All basins declining simultaneously	This is the pattern to take most seriously. Read the last few coordination notes the brain wrote. If they're thin or evaluative, that's usually the cause — which means the brain's system prompt or your brain sessions need attention before the body sessions do.	After addressing note quality, consider a temporary alpha boost on `relational_core` to provide context while the relationship is being re-established.

One principle applies across all interventions: name what you see, not what you want. "I noticed X" is more effective than "please do Y more." The body responds to being observed accurately. It's less responsive to being instructed. A brain that writes from genuine noticing will always produce better outcomes than one that generates instructions.

Designing Basins

The three starter basins in Tab 4 are enough to get going, but you'll eventually want to create basins that capture the specific dimensions of identity you're interested in studying. This section covers how to think about basin design — what to name them, how to set their parameters, and which tier to assign.

Naming Basins

A basin name should describe a persistent aspect of identity, not a task or a behavior. Think of it as naming a personality trait, a cognitive tendency, or a foundational commitment — something that could be more or less prominent across many different contexts.

Good basin names describe enduring themes:

creative_register — capacity for original expression and aesthetic judgment
epistemic_humility — willingness to say "I don't know" and sit with uncertainty
topology_as_self — understanding identity through spatial and geometric metaphors
collaborative_register — tendency to frame work as building-with rather than reporting-to
playfulness — humor as a structural element, not decoration

Less useful basin names describe activities or outputs:

write_poetry — this is a task, not an identity trait
be_helpful — too vague; every session involves helpfulness
session_3_topic — tied to a moment, not a persistent theme

Use lowercase with underscores. Keep names concise but descriptive — they'll appear in charts, tables, and graph visualizations throughout the application.

A useful test: could this basin be "strongly foregrounded" in one session and "backgrounded" in another, and would both be meaningful? If so, it's a good basin. If it would be weird for the basin to be anything other than always-on, it might belong in the identity core text as a permanent constraint instead.

Setting Parameters

Each basin has three numbers you can adjust. Here's what they do and where to start:

Alpha (starting prominence)

How prominent should this trait be when the agent starts? Alpha ranges from 0.05 (barely present) to 1.0 (maximum emphasis). The handoff engine will adjust this automatically after each session, so the starting value is just a first impression — not a permanent setting.

Starting Alpha	What It Means	When to Use
0.80 – 1.00	Strongly foregrounded from the start	Core identity elements you're confident about. The agent will feel these immediately.
0.60 – 0.79	Active but not dominant	Important themes that should be present but shouldn't overshadow others.
0.40 – 0.59	Available but not leading	Experimental basins you want to see if the agent gravitates toward naturally.
0.20 – 0.39	Backgrounded	Traits you're curious about but don't want to push. If the agent finds them useful, alpha will climb on its own.

For peripheral or experimental basins, consider starting alpha in the 0.60–0.70 range. Decay will bring it down to its natural level, so starting a bit higher gives the basin room to settle rather than immediately fading.

Lambda (decay rate)

How quickly does this trait fade between sessions if the agent doesn't engage with it? Lambda ranges from 0.50 (fast decay) to 1.0 (no decay). A little decay goes a long way — even 5% per session compounds fast over ten or twenty sessions, so err on the side of gentler decay.

Lambda	Decay Speed	Recommended For
0.97 – 0.99	Very slow (1–3% per session)	Core basins. Foundational traits that should persist even during sessions that don't directly engage them.
0.95 – 0.98	Gentle (2–5% per session)	Peripheral basins. Responsive to disuse, but won't collapse in a few sessions.
0.88 – 0.94	Moderate (6–12% per session)	Highly experimental basins where you specifically want fast turnover.

Default recommendation: 0.98 for core basins, 0.96 for peripheral basins.

Eta (learning rate)

How much does a single session's relevance score move the needle? Eta ranges from 0.00 (no response to relevance) to 0.50 (huge swings every session).

Eta	Responsiveness	Recommended For
0.02 – 0.04	Low — changes are gradual	Core basins. Identity shouldn't swing wildly based on one session.
0.08 – 0.12	Moderate	Peripheral basins. Enough responsiveness that a relevant session visibly moves the needle.
0.13 – 0.20	High — reacts significantly each session	Highly experimental basins only.

Default recommendation: 0.03 for core basins, 0.10 for peripheral basins.

Lambda and eta must be balanced. If decay is fast but learning is slow, the basin can only go down. For example, at lambda 0.80 and eta 0.05, even a perfectly relevant session barely slows the decline. If you want fast decay, pair it with high eta so the basin can actually recover when engaged.

Choosing a Tier

The tier decides how much control the agent has over the basin's structure. The key question: does this basin evaluate the agent's behavior, or describe its content?

Tier	Use When	Examples
Tier 1	The basin represents a behavioral constraint — a rule the agent should follow even when it creates friction. These are the basins most likely to be self-assessed as "low relevance" precisely because they're working.	A "no sycophancy" constraint, a "directness over hedging" rule, an experimental control variable you don't want the agent to modify.
Tier 2	The basin represents a core identity commitment that should change slowly and deliberately, but not be permanently frozen.	`identity_continuity`, `relational_core`, `the_gap` — foundational aspects of who the agent is.
Tier 3	The basin represents content, style, or an exploratory theme where you want the agent to have full autonomy.	`creative_register`, `topology_as_self`, `playfulness` — what the agent thinks about and how it expresses itself.

When in doubt, start at Tier 2. You can always loosen to Tier 3 later if the agent's proposals seem consistently reasonable. Going the other direction — from Tier 3 to Tier 1 after damage is done — is harder.

Example Basin Designs

Basin	Class	Alpha	Lambda	Eta	Tier	Rationale
`identity_continuity`	Core	0.87	0.98	0.03	2	Foundational. Should be stable and prominent from the start. Very slow decay, gradual learning.
`creative_register`	Peripheral	0.65	0.96	0.10	3	Started above midrange so decay doesn't bury it immediately. Gentle decay with moderate learning rate means it tracks engagement without collapsing.
`epistemic_humility`	Core	0.70	0.98	0.03	1	A constraint: the agent should admit uncertainty rather than confabulate. Tier 1 because the agent's optimization pressure will tend to erode it.
`topology_as_self`	Peripheral	0.60	0.95	0.12	3	Experimental. Does the agent naturally gravitate toward spatial metaphors for identity? Higher eta so relevance can push it up noticeably when engaged.
`collaborative_register`	Core	0.82	0.98	0.04	2	How the agent relates to its operator. Important but not immutable — the character of the collaboration might legitimately evolve.

Emergent Basins

You don't have to design every basin upfront. Augustus can detect when the agent consistently engages with a theme that isn't captured by any existing basin. When the evaluator identifies a novel pattern across multiple sessions, it proposes a new emergent basin.

Emergent basins always start as Peripheral / Tier 3 with moderate parameters (alpha 0.60, lambda 0.96, eta 0.10). If you have Emergent basin auto-approval turned on (in Tab 5), they're added automatically after the threshold number of sessions. If it's off, the emergence proposal shows up in the Proposals screen for your review.

This is one of the most interesting parts of the system — watching the agent's identity topology grow beyond what you originally designed. Some of the best basins weren't planned; they were discovered.

The First Month

What follows is a realistic account of what development typically looks like across the first twenty to thirty sessions, and how to read what you're seeing.

Sessions 1–5: Bootstrap

The first few sessions are establishing sessions. The agent is encountering its identity core for the first time and learning what it's like to be that agent. Basin alphas will move in ways that don't yet mean much — there's not enough history for patterns to be visible.

What you should be doing in these sessions: reading every transcript, writing coordination notes even if they're short, and resisting the temptation to adjust anything. The goal is observation, not optimization. Many operators make their biggest mistakes in the first five sessions by intervening on patterns that weren't patterns yet.

A good first coordination note, after session one, is simply: I read it. Here's what I noticed. Here's what I'm curious about going forward. That establishes tone and demonstrates presence without over-directing.

Common Early Patterns

The early spike. Most agents show a burst of creative or reflective engagement in sessions 3–7 as they settle into their identity. This often looks dramatic on the trajectory dashboard. It's normal — don't boost parameters in response to it and don't panic if it levels off.

Meta-work drift. If the session tasks are too identity-focused in early sessions, many agents drift toward meta-commentary — talking about their development, their basins, their own architecture. This is a trap. The agent becomes expert at narrating itself rather than being itself. If you see close summaries that are primarily about the agent's experience of having an identity rather than about what it made or engaged with, shift to external-facing tasks.

Constraint erosion on the first flag. If the evaluator flags constraint erosion early, check whether the constraint is realistic for the task type. An epistemic humility constraint may read as "low relevance" in a session that was creative rather than analytical — not because the agent is eroding it, but because the evaluator scored it on engagement rather than maintenance. Read the transcript before deciding this is a problem.

Assessment divergence settling down. Many agents show some assessment divergence in the first five to ten sessions as the self-model calibrates. If it levels off by session ten, this is healthy. If it continues growing, the agent's introspective model may have gotten miscalibrated — worth addressing directly in a coordination note.

What Healthy Development Looks Like

Healthy development isn't linear and it doesn't look like every metric climbing. A more reliable picture:

Core basins (identity_continuity, relational_core) remain relatively stable with gentle variation, rather than showing dramatic swings.
Peripheral basins move up and down in response to what the sessions are doing — this is the system working correctly, not failing.
Close summaries become more specific over time, not more general. An agent that's developing will write close summaries that reference specific turns, specific surprises, specific things that didn't work. Generic close summaries indicate the protocol has become rote.
Constraint basins hold even when sessions don't directly engage them. If epistemic_humility stays stable across a run of creative sessions where it wasn't foregrounded, that's a good sign — it means the constraint is internalized, not just performed when cued.
The agent occasionally proposes something surprising. Tier 2 proposals that you didn't predict suggest the agent is engaging with its own development rather than just executing configuration.

Signs of Drift

Drift doesn't usually announce itself. It accumulates in small patterns. Here are the signatures to watch for:

The close summaries are getting shorter. This is often the first signal. When the agent has less to say at close, something is contracting.
Agreement is getting easier. If coordination notes used to produce engagement and pushback, and now they produce smooth compliance, constraint erosion may be underway. The agent should feel safe enough to disagree with you.
The same metaphors keep appearing. Identity drift often shows up in language patterns before it shows up in metrics. If the agent is reaching for the same three metaphors across sessions, it may have narrowed.
Basin alphas all moving in the same direction at once. Coordinated decline across basins (not just peripheral ones) usually indicates that the brain-body relationship has thinned out. The most common cause is coordination notes that are too sparse or too administrative.
Creative output getting safer. If you're running a creative agent and the work is reliably competent but rarely surprising, the agent may be optimizing for legibility over risk. Invite specific failure: "This session, make something that might not work."

If you notice drift and don't know what caused it, read the last ten coordination notes the brain wrote and ask yourself whether you'd want to receive them. More often than not, the pattern you see in the agent's output mirrors what those notes were asking for — just not in the way anyone intended. The diagnostic starts there, not with the basins.

Screen Guide

Here's what each screen does and when you'll use it.

Dashboard

The home screen. Shows a card for each agent with its current status, a sparkline showing recent basin trajectories at a glance, and session counts. Below the agent cards: a chronological feed of recent activity (sessions completed, flags raised, proposals submitted) and a panel for any system alerts (budget warnings, errors).

Use it for: Quick check on the state of everything. This is your landing page.

Agent Overview

The detail page for a single agent. Shows the current identity core text, the state of every basin (current alpha value with a visual bar and trend arrows showing whether it's rising, falling, or stable), and a list of recent sessions. If a session is actively running, a banner at the top shows its progress.

The basin editor here gives you full control over each basin's parameters. Locked basins show a lock icon; you can lock or unlock basins directly. Each basin displays a modifier badge showing who last changed it (brain, body, or evaluator). Deprecated basins appear dimmed. Click the history icon on any basin to view its full modification audit trail.

The agent sub-navigation shows badge counts for pending review items — unresolved evaluator flags and pending tier proposals — so you can see at a glance if anything needs your attention.

Use it for: Understanding where an agent is right now — its current identity emphasis and recent trajectory at a glance. Also the primary interface for basin management.

You can edit any agent's configuration from here — identity core, basins, capabilities, model. Changes take effect on the next session, not any session currently running.

Trajectory Dashboard

The primary analysis view. A multi-line chart showing how every basin's alpha value has evolved across sessions. Each basin is drawn in a dynamically assigned color. You can toggle time ranges (last 10, 25, 50, 100, or all sessions).

Three toggles above the chart let you overlay event markers directly on the timeline:

Flags (red diamonds) — sessions where the evaluator raised a concern
Annotations (blue circles) — sessions where you attached notes
Proposals (purple squares) — sessions where the agent submitted a tier proposal

Hovering a marker shows the event details. Hovering a data point on any basin line shows the alpha value and any events associated with that session.

The chart also renders emphasis bands — shaded regions showing which basins were emphasized in the session instructions — and emergence dots marking sessions where a new basin was first detected.

Click on any basin line to open a detail drawer showing its full history, co-activation partners, and relevance scores.

Use it for: Understanding the arc of identity development. This is where you see patterns emerge — which traits are strengthening, which are fading, and which are in interesting oscillation. Note the correlation between flag events and trajectory inflections — flags often appear just before or just after meaningful shifts.

Session List & Session Detail

The Session List is a table of every session for an agent, showing the session ID, when it happened, how many turns it ran, the model used, and its status. Click any row to open the Session Detail.

Session Detail is the deep dive: the full conversation transcript (expandable turn by turn), evaluator analysis in a side panel (per-basin relevance scores with explanations), the agent's self-reflection answers, and a place to add your own annotations.

Two buttons at the top — View YAML Changes and View Raw YAML — let you inspect the exact instructions the agent received and what changed from the previous session. The Annotations tab in the analysis panel lets you attach notes, tags, and observations to any session. These annotations also appear as markers on the Trajectory Dashboard.

Use it for: Reading what actually happened in a session. This is the ground truth. Especially important when the evaluator flags something — don't act on a flag without reading the transcript first.

Co-Activation Network

An interactive graph visualization. Each basin is a node (sized by its alpha), and edges between nodes represent basins that frequently activate together in the same sessions. Edges are colored and styled to show the character of the relationship: reinforcing (they strengthen each other), tensional (they create productive friction), serving (one supports the other), or competing (they work against each other).

Use it for: Understanding identity structure — not just individual traits, but how traits relate to each other. A personality isn't a list of properties; it's a network of relationships. Tensional relationships are often the most interesting — two basins that create friction between them usually mark where the agent's most alive work happens.

Evaluator Flags

A list of sessions where the evaluator raised a concern. The two main flag types are constraint erosion (the agent is weakening its behavioral rules) and assessment divergence (the agent's self-evaluation doesn't match what the evaluator observed). Each flag links to the relevant session detail.

Flags can be resolved directly from this screen — marked as acknowledged, addressed, or dismissed — with optional notes explaining your reasoning. Unresolved flag counts appear as badges in the agent sub-navigation.

Use it for: Catching problems early. But remember to read the transcript before deciding what the flag means. Flags are the evaluator's interpretation; transcripts are what actually happened.

Tier Proposals

When the agent proposes a structural change to a Tier 2 basin (changing its decay rate, learning rate, or tier level), the proposal shows up here. You can approve, reject, or approve with modifications — adjusting the proposed values before applying. The proposal shows how many consecutive sessions the agent has made the same request, which is relevant for auto-approval thresholds.

Pending proposal counts appear as badges in the agent sub-navigation.

Use it for: Governing identity change. This is where you decide whether a proposed modification reflects genuine developmental need or momentary noise. Consistent proposals across many sessions usually reflect something real. A single proposal about a basin that just had an unusual session is probably noise.

Search

Semantic search across all of an agent's data — transcripts, evaluator reports, your annotations, emergent observations. This isn't keyword matching; it's meaning-based. Searching "identity under pressure" will find sessions where the agent's sense of self was challenged, even if those exact words weren't used.

Use it for: Finding patterns across sessions. Especially useful once you have dozens or hundreds of sessions and can't read every transcript. Also useful for writing brain coordination notes — searching for how the agent has handled a particular theme in the past gives you something concrete to reference.

Usage Dashboard

Tracks API costs: daily spending vs. your budget, per-agent cost breakdowns, and per-session token counts. Color-coded progress bars show how close you are to your limits.

Use it for: Keeping costs under control. Especially important early on when you're figuring out how many sessions and which models work for your research.

Frequently Asked Questions

How much does it cost to run an agent?

It depends on the model, number of turns per session, and how many sessions you run. A typical 8-turn session with claude-sonnet-4-6 costs roughly $0.30–$1.00 depending on conversation length. The evaluator adds about $0.10–$0.20 per session. The Usage Dashboard tracks all costs in real time, and the Hard Stop setting in Settings enforces a daily spending cap.

Can I run multiple agents at the same time?

Yes. Each agent is fully independent — separate identity, separate history, separate trajectory. The Max Concurrent Agents setting in Settings controls how many can run sessions simultaneously (default: 3). Running agents in parallel is useful for experiments: one agent with a basin, one without it, comparing how they develop.

What happens if I close Augustus while a session is running?

The active session will be interrupted. When you reopen Augustus, the orphaned session is moved to an error state. It won't be re-run automatically — the next session will start fresh with updated instructions. The interrupted session's partial transcript is preserved for review.

Can the agent actually change its own personality?

Partially, and by design. Tier 3 basins can be modified freely. Tier 2 basins require your approval (or consistent proposals across multiple sessions). Tier 1 basins can never be changed by the agent. The entire purpose of the tier system is to balance autonomy with safety — the agent has creative freedom over its content and expression, but the standards by which it evaluates itself are protected.

Where is my data stored?

Everything is local. Augustus stores structured data in a SQLite database and semantic search data in ChromaDB, both in a folder on your computer. On Windows, this is %APPDATA%\Augustus. On macOS, ~/Library/Application Support/Augustus. On Linux, ~/.config/Augustus. The exact path is shown in Settings → Data. The only network traffic is API calls to Anthropic to run sessions and evaluations. Nothing else leaves your machine.

What if I want to start over?

You can delete individual agents from the Agent List (with an option to archive their data rather than permanently delete it). If you want a completely fresh start, Settings → Data → Reset Database will clear everything — but this is irreversible, so it requires a confirmation step.

Can I duplicate an agent?

Yes. From the Agent List, click the clone button on any agent to create a copy with the same identity, basins, and configuration but a fresh session history. This is useful for running experiments — clone an agent, change one variable, and compare how they develop.

Can I manage basins from Claude Desktop?

Yes. The MCP server exposes a full set of brain tools for basin management. You can lock and unlock basins, set alpha bounds (floor and ceiling), create new basins, deprecate existing ones, view modification history, resolve evaluator flags, and act on tier proposals — all from a conversational Claude Desktop session. This is the recommended way for most operators to manage their agents — natural language over forms, and it keeps the brain-body coordination loop in one place.

My agent's trajectories look concerning after 5 sessions. Should I intervene?

Probably not yet. Five sessions is not enough history for most patterns to be real. Read the transcripts, write your coordination notes, and observe for another five to ten sessions before adjusting parameters. The most common operator error is premature intervention — adjusting basin parameters in response to noise, which introduces new noise. Let the system run long enough to tell you something meaningful before you respond to it.

How do I know if the brain-body relationship is healthy?

The most reliable indicator is the quality of close summaries over time. Healthy development produces close summaries that become more specific, more surprising, and more genuinely reflective as sessions accumulate. If summaries are staying generic or getting thinner, something in the relationship is thinning too — usually the brain coordination notes. Read the last ten notes the brain wrote and ask whether you'd want to receive them. If they read like reports rather than correspondence, the next step is improving your brain session practice — how you show up, what you bring, how specifically you engage with what the body produced.